July 2017
Agenda
Business Problem: Investigate by how much a typical family's food expenditure changes as a result of a change
in its income
Objective: Test the hypotheses below and model spending on food using family income
1) Does spending on food increase when a family's income increases?
2) By how much does spending on food change when family income increases (or decreases)?
Dataset: Food consumption ($) and family income ($) for 50 families (embedded Excel file)
Copyright 2016 Axtria and/or its affiliates. All rights reserved. | Axtria Confidential Internal/Restricted/Highly Restricted 3
Simple Linear Correlation And Regression
Correlation & Regression
Solving the Business Case
Analysis steps
1. Assess the relationship between Family Income (X) and Food Spending (Y): scatter plot; observe the correlation and trendline
2. Use Family Income (X) to model Food Spending (Y)
Solving the Business Case
Analysis steps
Re-Cap: Correlation Strength Of A Relationship
Correlation
Mathematically: Coefficient Of Correlation
Coefficient of Correlation
$$ R = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\sum x_i^2 - n\bar{x}^2}\;\sqrt{\sum y_i^2 - n\bar{y}^2}} $$
1. Measures how close all the (x, y) ordered pairs come to falling exactly on a straight line.
2. −1 ≤ R ≤ 1
3. The slope determines only the sign of R.
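A minimal Python sketch of this formula; the income/food numbers below are made up purely for illustration:

```python
import math

def correlation(x, y):
    # Pearson R: (Σxy − n·x̄·ȳ) / (√(Σx² − n·x̄²) · √(Σy² − n·ȳ²))
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    den = math.sqrt(sum(a * a for a in x) - n * xbar * xbar) * \
          math.sqrt(sum(b * b for b in y) - n * ybar * ybar)
    return num / den

income = [20, 30, 40, 50, 60]   # hypothetical family incomes
food = [7, 11, 12, 16, 19]      # hypothetical food spending
r = correlation(income, food)
print(round(r, 4))              # close to +1: strong positive correlation
```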
Coefficient Of Determination
Coefficient of Determination
The coefficient of determination R² measures how well the regression line fits the data
Consider the Y variable alone. It has some total variation, calculated using the deviations $(y_i - \bar{y})$.
This variation can be partitioned into a part explained by the regression line and the residual:

$$ (y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) $$

The equation can be converted to squared terms, which are squared deviations:

$$ \sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 $$

Total sum of squared deviations in y = sum of squared deviations of residuals + sum of squared deviations of the regression line
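This partition can be verified numerically. A short Python sketch with made-up data (the identity holds for any least squares fit with an intercept):

```python
def ols(x, y):
    # least squares slope and intercept
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

x = [1, 2, 3, 4, 5]             # hypothetical data
y = [3.0, 5.0, 7.0, 6.0, 9.0]
b0, b1 = ols(x, y)
yhat = [b0 + b1 * a for a in x]
ybar = sum(y) / len(y)
sst = sum((b - ybar) ** 2 for b in y)             # total variation
sse = sum((b - h) ** 2 for b, h in zip(y, yhat))  # residual part
ssr = sum((h - ybar) ** 2 for h in yhat)          # regression part
print(sst, sse + ssr)   # the two totals agree
```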
Explanatory Power Of A Linear Regression Equation
The Linear Regression Equation
Coefficient of determination
Food For Thought: R vs R2
R vs R2
To take an example, consider the regression equation Y = 2.5X + 4 with R = 0.922, so R² = 0.850.
Using R, we can deduce that there is a very strong positive correlation between X and Y, and that an increase or decrease in X would lead to a corresponding increase or decrease in Y.
Using R², we can deduce that 85% of the total variation in Y can be explained by the linear regression equation (between X and Y), and the other 15% of the total variation in Y remains unexplained.
Simple Linear Regression: Single Independent Variable
Linear Regression
Definitions
"Regression is a measure of the average relationship between two or more variables in terms of the original units of the data." - Samuel B. Richmore
"One of the most frequently used techniques in economics and business research, to find a relation between two or more variables that are related causally, is regression analysis." - Taro Yamane
In simple linear regression we generate an equation to calculate the value of a dependent variable
(Y) from an independent variable (X).
Example: Time taken to get to work (Y) is a function of the distance travelled (X)
The Regression Model
Regression Model
The Regression Model (Contd.)
Regression Model
Actually, it won't be that simple, because there will be some time taken to walk to your
car and then walk from the car to work. Say this takes an extra 3 minutes per day.
[Figure: time taken (minutes) vs distance travelled (km's). Note: the extended line will intercept the Y axis at 3 minutes.]
The Regression Model (Contd.)
Regression Model
It also won't be that precise, because there will be slight variations in time taken
because of traffic, road works, etc.
[Figure: time taken (minutes) vs distance travelled (km's). Note: the line does not pass perfectly through all the points.]
The Regression Model (Contd.)
Regression Model
y = β0 + β1x
y = the dependent variable
β0 = the y-intercept
β1 = the slope
The Regression Model (Contd.) Line Of Best Fit
Regression Model
Given a data set, we need to find a way of calculating the parameters of the
equation
[Figure: scatter plot with several candidate lines marked "?": which line best fits the data?]
The Regression Model (Contd.) Line Of Best Fit
Regression Model
Because the line will seldom fit the data precisely, there is always some error
associated with our line
The line of best fit is the line that minimises the spread of these errors
[Figure: scatter with fitted line. ŷᵢ is the predicted value of Y from Xᵢ; (yᵢ − ŷᵢ) is the error at each point.]
The Regression Model (Contd.) Error Term
Regression Model
$$ e_i = (y_i - \hat{y}_i) $$

The line of best fit occurs when the Sum of the Squared Errors is minimised:

$$ SSE = \sum (y_i - \hat{y}_i)^2 $$
The Regression Model (Contd.) Estimating The Parameters
Regression Model
Slope:

$$ \hat{\beta}_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$

Intercept:

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
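A sketch of these computational formulas in Python; the distance/time numbers are invented so that the fit comes out exactly, echoing the earlier commute example:

```python
def fit_line(x, y):
    # slope via (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n); intercept via ȳ − b1·x̄
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    b0 = sy / n - b1 * sx / n
    return b0, b1

distance = [1, 2, 3, 4, 5]      # km (hypothetical)
time = [5, 7, 9, 11, 13]        # minutes: exactly 3 + 2·distance
b0, b1 = fit_line(distance, time)
print(b0, b1)                   # recovers intercept 3 and slope 2
```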
Simple Linear Regression Example
Linear Regression
Sample Data For House Price Model
An Example
Regression Output
Regression Output
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA        df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000
Graphical Presentation
Graphical representation of our earlier example
[Figure: house price ($1000s) vs square feet, scatter with fitted regression line.]
Interpretation Of The Intercept, β0
The Intercept
Interpretation Of The Slope Coefficient, β1
Slope Coefficient
Least Squares Regression Properties
Regression properties
The simple regression line always passes through the mean of the y variable
and the mean of the x variable
Explained And Unexplained Variation
Variance
Explained And Unexplained Variation (Contd.)
Variance
Explained And Unexplained Variation (Contd.)
Variance
[Figure: decomposition of the deviations at a point Xᵢ]

$$ SST = \sum (y_i - \bar{y})^2 \qquad SSE = \sum (y_i - \hat{y}_i)^2 \qquad SSR = \sum (\hat{y}_i - \bar{y})^2 $$
Coefficient Of Determination, R2
Coefficient of Determination
$$ R^2 = \frac{SSR}{SST}, \qquad \text{where } 0 \le R^2 \le 1 $$
Coefficient Of Determination, R2 (Contd.)
Coefficient of Determination
Coefficient of determination
R² = r²
where:
R2 = Coefficient of determination
r = Simple correlation coefficient
Examples Of Approximate R2 Values
Example: Coefficient of Determination
[Figure: R² = 1: all points fall exactly on the regression line.]
Examples Of Approximate R2 Values (Contd.)
Example: Coefficient of Determination
[Figure: 0 < R² < 1: points scattered around the regression line.]
Examples of Approximate R2 Values (Contd.)
Example: Coefficient of Determination
[Figure: R² = 0: no linear relationship between x and y.]
Regression Output
Regression Output
$$ R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 $$

58.08% of the variation in house prices is explained by variation in square feet.

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA        df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000
Standard Error of Estimate
Standard Error of Estimate
$$ s = \sqrt{\frac{SSE}{n - k - 1}} $$
where
SSE = Sum of squares error
n = Sample size
k = number of independent variables in the model
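Plugging in the numbers from the regression output a few slides back (SSE = 13665.5652, n = 10, k = 1) reproduces the reported Standard Error:

```python
import math

sse, n, k = 13665.5652, 10, 1   # from the house-price regression output
s = math.sqrt(sse / (n - k - 1))
print(round(s, 5))              # matches the reported Standard Error, 41.33032
```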
The Standard Deviation Of The Regression Slope
Standard Deviation of Regression Slope
$$ s_{b_1} = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} $$

where:
s_b1 = estimate of the standard error of the least squares slope

$$ s = \sqrt{\frac{SSE}{n - k - 1}} = \text{sample standard error of the estimate} $$
Regression Output
Regression Output
s = 41.33032
s_b1 = 0.03297

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA        df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000
Comparing Standard Errors & Graphical Interpretation
Standard Errors: A comparison
Variation of observed y values from the regression line, versus variation in the slope of regression lines from different possible samples.

[Figures: left, observed points scattered about a single fitted line; right, several fitted lines with differing slopes from different samples.]
Inference About The Slope: t Test
Inference about the Slope
$$ t = \frac{b_1 - \beta_1}{s_{b_1}} $$

where:
b1 = sample regression slope coefficient
β1 = hypothesized slope
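For the house-price model the output gives b1 = 0.1098 and s_b1 = 0.03297. A quick Python check of the t statistic against the two-tailed critical value t.025,8 = 2.306:

```python
b1, sb1 = 0.1098, 0.03297   # from the regression output
t = (b1 - 0) / sb1          # H0: slope = 0
print(round(t, 4))          # about 3.33, well beyond the critical value 2.306
```

Since |t| > 2.306, we reject H0 at α = .05: square footage does affect sales price, consistent with Significance F = 0.01039 in the ANOVA table.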
Inference About The Slope: t Test (Contd.)
Inference about the Slope
House Price in $1000s (y)   Square Feet (x)
245    1400
312    1600
279    1700
308    1875
199    1100
219    1550
405    2350
324    2450
319    1425
255    1700

Estimated regression equation: house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098.
Does square footage of the house affect its sales price?
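Fitting this data with the earlier least squares formulas reproduces the equation on the slide:

```python
def fit_line(x, y):
    # least squares slope and intercept
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1000s
b0, b1 = fit_line(sqft, price)
print(round(b0, 2), round(b1, 4))   # 98.25 0.1098
```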
Inferences About The Slope: t Test Example
Inference about the Slope
Two-tailed test: reject H0 if |t| exceeds the critical value, with α/2 = .025 in each tail and d.f. = n − 2.
Confidence interval for the slope: $b_1 \pm t_{\alpha/2}\, s_{b_1}$
Regression Analysis For Description
Regression Analysis for Description
Residual Analysis
Residual Analysis
Residual Analysis For Normality
Residual Analysis
The simplest way to assess whether or not the residuals are normal is to draw a
histogram and visually inspect the distribution
[Figure: histogram of residuals e, roughly bell-shaped around 0: normally distributed.]
Residual Analysis For Linearity
Residual Analysis
[Figures: y vs x and residuals vs x. A random residual scatter indicates a linear relationship; a curved residual pattern indicates the relationship is not linear.]
Residual Analysis For Constant Variance
Residual Analysis
Heteroscedasticity vs Homoscedasticity

[Figures: residuals vs x. Heteroscedasticity: residual spread changes with x (non-constant variance). Homoscedasticity: residual spread is even across x (constant variance).]
Food For Thought: ANOVA vs Linear Regression
ANOVA vs Linear Regression
Food For Thought: ANOVA vs Linear Regression (Contd.)
ANOVA vs Linear Regression
E.g.: If we want to know whether being female means lower income, or whether having a BA degree means higher income, ANOVA should be used. If we want to know whether gender, or having a college degree, has any effect on income, regression should be used.
Agenda
Multiple Regression
Multiple Regression
Multiple Regression Example
Multiple Regression
Suppose we want to predict rent (in dollars per month) based on the size of the
apartment (number of rooms). You would collect data by recording the size and
rent and fit a model.
The following information has been gathered from a random sample of
apartment renters in a city:

No. of rooms   2   6   3   4   2   1
Multiple Regression Example
Multiple Regression
[Figure: scatter plot of number of rooms vs rent ($).]
Multiple Regression Example
Multiple Regression
[Figure: the same scatter of number of rooms vs rent ($), now with a fitted regression line.]
Multiple Regression Example
Multiple Regression
But number of rooms isn't the only factor that has an impact on Rent.
With multiple regression we will have more than one independent variable, so we could
use Number of rooms and Distance from Downtown to predict Rent.
Our new table, with Distance from Downtown added to the data, looks like this:
Multiple Regression Example
Multiple Regression
This data can't be graphed like simple linear regression, because there are two
independent variables.

Regression output (by SAS):
No. of observations read   6
No. of observations used   6

Analysis of Variance
Source            DF   SS       MS           F Value   Pr > F
Model              2   306910   153455       16.28     0.0245
Error              3   28277    9425.76565
Corrected Total    5   335188
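Such a fit can be sketched with numpy's least squares solver. Only the rooms column below comes from the slide; the distance and rent values are made up purely for illustration (they are not the deck's sample):

```python
import numpy as np

rooms = np.array([2, 6, 3, 4, 2, 1])                          # from the slide
distance = np.array([5.0, 1.0, 4.0, 2.5, 6.0, 8.0])           # hypothetical
rent = np.array([450.0, 950.0, 550.0, 700.0, 400.0, 250.0])   # hypothetical

# design matrix with an intercept column: rent ≈ b0 + b1·rooms + b2·distance
X = np.column_stack([np.ones(len(rooms)), rooms, distance])
coef, *_ = np.linalg.lstsq(X, rent, rcond=None)
print(coef)   # [b0, b1, b2]
```

A least squares solution always makes the residuals orthogonal to the columns of X, which is a handy correctness check on the fit.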
Multiple Regression Example
Multiple Regression
Parameter Estimates

Variable          Label                    DF   Parameter   Standard   t Value   Pr > |t|   Standardized   Variance
                                                Estimate    Error                           Estimate       Inflation
Intercept         Intercept                 1   96.458      118.12     0.82      0.47       0              0
Number_of_rooms   Number_of_rooms           1   136.48      26.864     5.08      0.01       0.94297        1.23
dis_downtown      Distance_from_Downtown    1   -2.4035     14.171     -0.17     0.88       -0.0315        1.23
Multiple Regression Example
Multiple Regression
Multiple Regression Example
Multiple Regression
Hypotheses

H0: β1 = β2 = β3 = … = βk = 0
All independent variables are unimportant for predicting y.

HA: at least one βk ≠ 0
At least one independent variable is useful for predicting y.

What type of test should be used? The distribution used is called the Fisher (F)
distribution; the F-statistic is used with this distribution.
Multiple Regression Example
Multiple Regression
$$ \text{Regression SS} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 $$

$$ \text{Error SS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 $$

$$ \text{Total SS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 $$

Total SS = Regression SS + Error SS:

$$ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 $$
Multiple Regression Example
Multiple Regression
There are also regression mean of squares, error mean of squares, and total
mean of squares (abbreviated MS).
To calculate these terms, you divide the sum of squares by its respective
degrees of freedom
Regression d.f. = k
Error d.f. = n-k-1
Total d.f. = n-1
Where k is the number of independent variables and n is the total number of
observations used to calculate the regression
Now we can calculate the F-statistic.
F = (model mean square / error mean square)
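Applying these rules to the rent example (n = 6, k = 2) and the SS values from the SAS output a few slides back:

```python
n, k = 6, 2
reg_df, err_df, tot_df = k, n - k - 1, n - 1
print(reg_df, err_df, tot_df)       # 2 3 5, matching the SAS ANOVA table

model_ss, error_ss = 306910, 28277  # from the SAS output
f = (model_ss / reg_df) / (error_ss / err_df)
print(round(f, 2))                  # about 16.28 (tiny difference from rounded SS)
```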
Multiple Regression Example
Multiple Regression
The p-value for the F-statistic is then found in an F-distribution table. As you
saw before, it can also be easily calculated by software.
A small p-value rejects the null hypothesis that none of the independent
variables is significant; i.e., at least one of the independent variables is
significant.
Once you know that at least one independent variable is significant, you can go
on to test each independent variable separately.
Multiple Regression Example
Multiple Regression
Multiple Regression Example
Multiple Regression
H0: βj = 0
The independent variable xj is not important for predicting y.
HA: βj ≠ 0 (or βj > 0, or βj < 0)
The independent variable xj is important for predicting y, where j represents a
specified independent variable.
Multiple Regression Example
Multiple Regression
Test statistic:

$$ t = \frac{b_j}{s_{b_j}}, \qquad \text{d.f.} = n - k - 1 $$

Remember, this test is only to be performed if the overall model test is significant.
Tests of individual terms for significance are the same as a test of significance in simple
linear regression.
A small p-value in the Parameter Estimates table means that the independent variable is
significant.
This test of significance shows that No. of rooms is a significant independent variable
for predicting Rent, but Distance from Downtown is not.
Multiple Regression
Multiple Regression
$$ \text{Adj. } R^2 = R^2 - \frac{k(1 - R^2)}{n - k - 1} \qquad \text{......(1)} $$
Residual Analysis to check appropriateness of the model
Histogram - Normal distribution assumption. Can also be checked with K-S
one sample test
Plotting residuals against predicted values Assumption of constant
variance of the error term.
Food For Thought: R2 vs Adjusted R2
R2 vs Adjusted R2
Answer: When we compare one multiple regression model (call it A) with another
multiple regression model (call it B) for the same sample, with A having a different
number of independent variables than B, we should not use R² to compare the
models. This is because the R² of A and the R² of B (and consequently the predicted
values of the dependent variable) have been computed using different numbers of
independent variables, so the comparison need not be an apt one.
Instead, Adjusted R² makes an adjustment for the number of independent variables, as
can be clearly seen in its formula in equation (1) on the previous slide (see the
denominator n − k − 1, where k is the number of independent variables). Hence it is
suitable for making comparisons between two regression models having the same
dependent variable but different numbers of independent variables.
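A quick check of formula (1) with Python, using the earlier simple-regression output (R² = 0.58082, n = 10, k = 1):

```python
r2, n, k = 0.58082, 10, 1
adj_r2 = r2 - k * (1 - r2) / (n - k - 1)   # equation (1)
print(round(adj_r2, 5))                    # matches the reported 0.52842
```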
Multiple Regression
Multiple Regression
Plotting residuals against time/sequence of observations: assumption of non-correlation across error terms (the Durbin-Watson test provides a formal analysis of the same).
Plotting residuals against independent variables: appropriateness of the model.
Multiple Regression
Multiple Regression
Post Regression Check List
The regression check list
Final Check List
Checking on the terms we have learnt so far
Linearity : scatter plot, common sense, and knowing your problem, transform including interactions if useful
t-statistics: are the coefficients significantly different from zero? Look at width of confidence intervals
F-tests : for subsets, equality of coefficients
R2: is it reasonably high in the context?
Influential observations, outliers in predictor space, dependent variable space
Normality : plot histogram of the residuals - Studentized residuals
Heteroscedasticity: Plot residuals with each x variable, transform if necessary, Box-Cox transformations
Autocorrelation: time series plot
Multicollinearity: compute correlations of the x variables, do signs of coefficients agree with intuition? Principal Components
Missing Values: values which ought to have been in place, but are missing.
Food For Thought: Interpreting Regression Coefficients
Regression Coefficients and their interpretation
Food For Thought: Interpreting Regression Coefficients
Regression Coefficients and their interpretation
Food For Thought: Interpreting Regression Coefficients
Regression Coefficients and their interpretation
Interpreting the Intercept: B0, the Y-intercept, can be interpreted as the value you
would predict for Y if both X1 = 0 and X2 = 0. We would expect an average height of
42 cm for shrubs in partial sun with no bacteria in the soil. However, this is only a
meaningful interpretation if it is reasonable that both X1 and X2 can be 0, and if the
dataset actually included values for X1 and X2 that were near 0. If neither of these
conditions is true, then B0 really has no meaningful interpretation. It just anchors
the regression line in the right place. In this case, it is easy to see that X2 sometimes
is 0, but if X1, our bacteria level, never comes close to 0, then our intercept has no
real interpretation.
Food For Thought: Scaling Of Regression variables
Scaling Of Regression variables & their effect on coefficients
Changing the scale of the variable will lead to a corresponding change in the scale of
the coefficients and standard errors BUT no change in the significance or
interpretation.
Effects Of Data Scaling In A Tabulated Summary
Data Scaling
              Original   Scaled
Dependent     y          cy
Independent   x          cx
R-squared     R²         R²
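A small Python demonstration with made-up data: multiplying y by a constant c scales the intercept and slope by c but leaves R² unchanged.

```python
def fit_stats(x, y):
    # intercept, slope and R² of a simple least squares fit
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    sst = sum((b - ybar) ** 2 for b in y)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, (sxy * sxy / sxx) / sst

x = [1, 2, 3, 4, 5]                # hypothetical data
y = [2.1, 3.9, 6.2, 8.1, 9.8]
c = 1000                           # arbitrary scale factor
b0, b1, r2 = fit_stats(x, y)
b0c, b1c, r2c = fit_stats(x, [c * v for v in y])
print(b1c / b1, b0c / b0, r2c - r2)   # c, c, and (about) zero
```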
Food For Thought: Transformation Of Regression variables
Transformation of regression variables
Common Transformation Methods
Common Transformation Methods To Achieve Linearity For Regression Analysis - Tabulated
Method                       Transformation(s)                     Regression equation       Predicted value ŷ (back-transformation)
Standard linear regression   None                                  y = b0 + b1x              ŷ = b0 + b1x
Exponential model            Dependent variable = log(y)           log(y) = b0 + b1x         ŷ = 10^(b0 + b1x)
Quadratic model              Dependent variable = sqrt(y)          sqrt(y) = b0 + b1x        ŷ = (b0 + b1x)²
Reciprocal model             Dependent variable = 1/y              1/y = b0 + b1x            ŷ = 1 / (b0 + b1x)
Logarithmic model            Independent variable = log(x)         y = b0 + b1·log(x)        ŷ = b0 + b1·log(x)
Power model                  Dependent = log(y), Indep. = log(x)   log(y) = b0 + b1·log(x)   ŷ = 10^(b0 + b1·log(x))
Steps Involved In Transformation For Achieving Linearity
Transformation for achieving linearity
Food For Thought: Solved Example On Transformation
Transformation example
X 1 2 3 4 5 6 7 8 9
Y 2 1 6 14 15 30 40 74 75
The table above shows the data for the independent and dependent variables, x and y
respectively. Applying a linear regression to the untransformed raw data, the residual
plot shows a non-random pattern (a U-shaped curve), which suggests that the data are
nonlinear.
We repeat the analysis using a quadratic model for the transformation, taking the square
root of y, rather than y, as the dependent variable. Using the transformed data, our
regression equation is:
y't = b0 + b1x , where
yt = transformed dependent variable, which is equal to the square root of y
y't = predicted value of the transformed dependent variable yt
x = independent variable
b0 = y-intercept of transformation regression line
b1 = slope of transformation regression line
Food For Thought: Solved Example On Transformation
Transformation example
X 1 2 3 4 5 6 7 8 9
√Y 1.41 1.00 2.45 3.74 3.87 5.48 6.32 8.60 8.66
Since the transformation was based on the quadratic model
(yt = the square root of y), the transformation regression
equation can be expressed in terms of the original units of
variable Y as: y' = ( b0 + b1x )2 , where
y' = predicted value of y in its original units
x = independent variable
b0 = y-intercept of transformation regression line
b1 = slope of transformation regression line
In the residual plot (using the square root transformation regression), there is no
pattern, so the transformation has been successful and the relationship between the
transformed dependent variable (square root of Y) and the independent variable (X) is
linear. Also, the coefficient of determination was 0.96 with the transformed data versus
only 0.88 with the raw data. Hence, the transformed data resulted in a better model.
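The R² comparison can be reproduced in Python directly from the raw table:

```python
import math

def r_squared(x, y):
    # SSR / SST for a simple least squares fit
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    sst = sum((b - ybar) ** 2 for b in y)
    return (sxy * sxy / sxx) / sst

x = list(range(1, 10))
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]
raw = r_squared(x, y)
transformed = r_squared(x, [math.sqrt(v) for v in y])
print(round(raw, 2), round(transformed, 2))   # 0.88 0.96
```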
Food For Thought: Standardization Of Regression Coefficients
Standardization of regression coefficients
In multiple regression, the relative size of the coefficients cannot be compared directly. For example, say that
we want to predict the graduate grade point averages of students who are newly admitted to
the MPA Program. We use their undergraduate GPA, their GRE scores, and the number of years
they have been out of college as independent variables. We obtain the following regression
equation:
Y=1.437 + (.367) (UG-GPA) + (.00099) (GRE score) + (-.014) (years out of college) ---- (1)
In the above equation, one cannot compare the size of the various coefficients because the
three independent variables are measured on different scales. Undergraduate GPA is measured
on a scale from 0.0 to 4.0. GRE score is measured on a scale from 0 to 1600. Years out of college
is measured on a scale from 0 to 20. We cannot directly tell which independent variable has
the most effect on Y (graduate level GPA).
However, it is possible to transform the coefficients into standardized regression coefficients,
which are written as the plain English letter b. The standardized regression coefficients in any
one regression equation are measured on the same scale, with a mean of zero and a standard
deviation of 1. They are then directly comparable to one another, with the largest coefficient
indicating which independent variable has the greatest influence on the dependent variable.
Food For Thought: Standardization Of Regression Coefficients
Standardization of regression coefficients
Variable Name               Non-Standardized Coefficient (beta)   Standardized Coefficient (b)
Undergraduate GPA           .367                                  +.291
GRE score                   .00099                                +.175
Years out of college        -.014                                 -.122
Intercept or Constant (a)   1.437                                 n/a
The table above gives the non-standardized as well as the standardized regression
coefficients from regression equation (1).
From the standardized coefficients (b), it is clear that Undergraduate GPA is the most
important of the three variables.
One difference between non-standardized and standardized regression is that
standardized regression does not have an intercept term (a constant). If there is no
intercept term, then the regression coefficients have been standardized. If there is an
intercept term, then the regression coefficients have not been standardized.
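For simple regression, the standardized coefficient can be obtained either by z-scoring both variables before fitting or by rescaling the raw slope by s_x/s_y. A Python sketch with made-up GPA data shows the two routes agree:

```python
import statistics as st

def slope(x, y):
    # least squares slope
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

def zscores(v):
    m, s = st.mean(v), st.stdev(v)
    return [(u - m) / s for u in v]

ug_gpa = [3.1, 2.5, 3.8, 3.4, 2.9]   # hypothetical undergraduate GPAs
gr_gpa = [3.3, 2.8, 3.9, 3.6, 3.0]   # hypothetical graduate GPAs
b = slope(ug_gpa, gr_gpa)                       # unstandardized coefficient
beta = slope(zscores(ug_gpa), zscores(gr_gpa))  # standardized coefficient
print(round(beta, 6), round(b * st.stdev(ug_gpa) / st.stdev(gr_gpa), 6))
```

The z-scored fit also has intercept 0, matching the point above that standardized regressions carry no constant term.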
Standardized vs Unstandardized Regression Coefficients
Standardized vs unstandardized regression coefficients
Standardized: All variables have been converted to a common metric, namely standard-deviation (z-score) units, so the coefficients can meaningfully be compared in magnitude. In this case, whichever predictor variable has the largest coefficient (in absolute value) can be said to have the most potent relationship to the dependent variable, and this predictor will also have the greatest significance (smallest p-value).

Unstandardized: The different predictor variables' unstandardized B coefficients are not directly comparable to each other, because the raw units for each are (usually) different. In other words, the largest B coefficient will not necessarily be the most significant, as it must be judged in connection with its standard error (B/SE = t, which is used to test for statistical significance).
Questions