Session 12:
Regression, Forecasting Techniques
January 2015 - April 2015
Building good regression models
A good regression model should include only significant independent variables.
However, it is not always clear what will happen when we add or remove
variables from a model: a variable that is significant in one model may not
be significant in another, and vice versa. Therefore, do not drop all
insignificant variables at once; take a more structured approach.
Adding an independent variable to a regression model never lowers R²: the new
R² is always >= the original R². This is true even when the new independent
variable has little true relationship with the dependent variable.
A better way of evaluating a model is adjusted R². Adjusted R² reflects both
the number of independent variables and the sample size, and it may either
increase or decrease when an independent variable is added or dropped.
An increase in adjusted R² indicates that the model has improved.
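The usual formula behind this behaviour is adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k is the number of independent variables. The short Python sketch below uses made-up R² values, chosen only for illustration, to show how a small gain in R² can still lower adjusted R². The function name `adjusted_r2` is not from the session material.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k independent variables fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical numbers: adding a third variable nudges R2 from 0.800 to 0.805.
before = adjusted_r2(0.800, n=20, k=2)
after = adjusted_r2(0.805, n=20, k=3)

# R2 went up, but adjusted R2 went down, so the extra variable did not help.
print(round(before, 4), round(after, 4))  # 0.7765 0.7684
```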
Building good regression models
1. Construct a model with all available independent variables. Check
   the significance of each independent variable by examining its p-value.
2. Identify the independent variable having the largest p-value that
   exceeds the chosen level of significance.
3. Remove the variable identified in step 2 from the model and
   evaluate adjusted R². Do not remove several variables together;
   remove one at a time.
4. Continue until all remaining variables are significant.
In essence, this approach seeks a model in which every variable is
significant and which has the highest adjusted R².
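The procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Excel workflow used in the session. The helper names (`invert`, `fit`, `backward_eliminate`), the synthetic data, and the 0.05 significance level are all hypothetical. To stay dependency-free, p-values come from a normal approximation to the t distribution, which is only reasonable for fairly large samples.

```python
import math
import random

def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[pivot] = m[pivot], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit(columns, y):
    """OLS with an intercept. Returns (p-values of the slopes, adjusted R2)."""
    n, p = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sse = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    s2 = sse / (n - p)  # squared standard error of the estimate
    t = [beta[a] / math.sqrt(s2 * inv[a][a]) for a in range(1, p)]
    # ASSUMPTION: two-sided p-values from a normal approximation to the t distribution.
    pvals = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    adj_r2 = 1 - s2 / (sst / (n - 1))
    return pvals, adj_r2

def backward_eliminate(data, y, alpha=0.05):
    """Drop the least significant variable, one at a time, until all pass alpha."""
    kept = list(data)
    while kept:
        pvals, adj_r2 = fit([data[name] for name in kept], y)
        worst = max(range(len(kept)), key=lambda i: pvals[i])
        if pvals[worst] <= alpha:
            break  # every remaining variable is significant
        print(f"drop {kept[worst]} (p={pvals[worst]:.3f}, adjusted R2 before drop={adj_r2:.3f})")
        kept.pop(worst)
    return kept

# Synthetic data: y depends on x1 only; x2 and x3 are pure noise.
rng = random.Random(0)
n = 60
data = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [3 + 2 * a + rng.gauss(0, 1) for a in data["x1"]]
kept = backward_eliminate(data, y)
print("kept:", kept)
```

In practice a statistics package would supply exact t-based p-values. The loop structure (refit, find the largest p-value, drop one variable, repeat) is the part that mirrors the steps above.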
Building good regression models
Another criterion for deciding whether a variable should be
removed is its t-statistic.
If |t| < 1, removing the variable will decrease the standard error
and increase adjusted R².
If |t| > 1, the opposite will occur.
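The |t| = 1 cutoff follows from the identity t² = (SSE_reduced - SSE_full) / (SSE_full / df_full) for dropping a single variable, where df_full = n - k - 1. The sketch below uses that identity to compare the standard error of the estimate before and after the drop. The function name and the numbers (SSE of 100, n = 30, k = 4) are hypothetical.

```python
import math

def standard_errors(sse_full, n, k, t):
    """Standard error of the estimate before and after dropping one variable.

    k is the number of independent variables in the full model and
    t is the t-statistic of the variable being dropped.
    """
    df_full = n - k - 1
    # Rearranged identity: SSE_reduced = SSE_full * (1 + t^2 / df_full)
    sse_reduced = sse_full * (1 + t * t / df_full)
    se_full = math.sqrt(sse_full / df_full)
    se_reduced = math.sqrt(sse_reduced / (df_full + 1))
    return se_full, se_reduced

for t in (0.5, 1.0, 2.0):
    full, reduced = standard_errors(sse_full=100.0, n=30, k=4, t=t)
    print(f"|t| = {t}: standard error {full:.4f} -> {reduced:.4f}")
```

Because adjusted R² equals 1 minus the squared standard error divided by the sample variance of the dependent variable, a lower standard error always means a higher adjusted R².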
Correlation and Multicollinearity
Correlation, a numerical value between -1 and +1, measures the
linear relationship between pairs of variables.
The higher the absolute value of the correlation, the greater the
strength of the relationship. The sign indicates whether the variables
tend to increase together (positive) or move in opposite directions
(negative).
Examining correlations between the dependent and independent
variables using the Excel Correlation tool can be useful in selecting
variables to include in a multiple regression model, because a
strong correlation indicates a strong linear relationship.
Strong correlations between independent variables can be
problematic
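Outside Excel, the same pairwise correlations can be computed in a few lines. The Python sketch below implements the usual Pearson formula. The data and the column names are made up, with x2 built to be nearly twice x1 so that the two independent variables are strongly correlated.

```python
import math
from itertools import combinations

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one dependent variable (y) and two independent variables.
columns = {
    "y": [5.0, 7.1, 8.9, 11.2, 12.8, 15.1],
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
}

for a, b in combinations(columns, 2):
    print(f"corr({a}, {b}) = {correlation(columns[a], columns[b]):.3f}")
```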
Correlation and Multicollinearity
Multicollinearity
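Occurs when two or more independent variables in the same regression
model contain high levels of the same information, that is, when they
are strongly correlated with one another
P-values of the affected variables get inflated, so relevant variables
can appear insignificant
Correlations between independent variables exceeding 0.7 in absolute
value indicate multicollinearity
Multicollinearity is best measured using a statistic called the Variance
Inflation Factor (VIF)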
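For reference, the VIF of independent variable j is defined as 1 / (1 - R²_j), where R²_j is the R² obtained by regressing variable j on the other independent variables. When there are only two independent variables, R²_j is simply their squared correlation, which is the simplification used in this Python sketch. The function name `vif` and the sample correlation values are illustrative.

```python
def vif(r2_j):
    """Variance Inflation Factor, given R2 from regressing variable j on the other independents."""
    return 1.0 / (1.0 - r2_j)

# With two independent variables, R2_j equals their squared correlation r^2.
for r in (0.3, 0.7, 0.9, 0.95):
    print(f"|r| = {r:.2f} -> VIF = {vif(r * r):.2f}")
```

A VIF of 1 means the variable shares no information with the others; the value grows without bound as the correlation approaches 1 in absolute value.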
Regression with Categorical Independent Variables
Some data of interest in a regression study may be categorical
(ordinal or nominal)
Because regression requires numerical data, we can include
categorical variables by coding them numerically
For example, a Yes / No variable can be coded as 1 / 0
Variables coded this way are called dummy variables
An interaction occurs when the effect of one variable X1 on the
dependent variable depends on the value of another variable X2. We can
test for an interaction by defining a new variable as the product of the
two, X3 = X1 * X2, and testing whether this variable is significant; if
it is, including X3 gives an alternative model
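A small Python sketch of the interaction term: it builds X3 = X1 * X2 and shows why the term matters when X2 is a 0/1 variable. The data and the coefficients are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def add_interaction(x1, x2):
    """Return the interaction column X3 = X1 * X2, row by row."""
    return [a * b for a, b in zip(x1, x2)]

def predict(b0, b1, b2, b3, x1, x2):
    """Model with interaction: Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Hypothetical data: X1 is numeric, X2 is a 0/1 coded variable.
x1 = [10, 12, 15, 11]
x2 = [0, 1, 1, 0]
x3 = add_interaction(x1, x2)
print(x3)  # [0, 12, 15, 0]

# Hypothetical coefficients: the slope of X1 is b1 when X2 = 0 and b1 + b3 when X2 = 1.
coef = dict(b0=5.0, b1=2.0, b2=1.0, b3=0.5)
slope_when_0 = predict(**coef, x1=1, x2=0) - predict(**coef, x1=0, x2=0)
slope_when_1 = predict(**coef, x1=1, x2=1) - predict(**coef, x1=0, x2=1)
print(slope_when_0, slope_when_1)  # 2.0 2.5
```

If b3 is not significantly different from zero, the two slopes are effectively the same and the interaction term can be left out.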
Categorical variables with more than two levels
When a categorical variable has only two levels, we coded the
levels as 0 and 1 and added a new variable to the model.
When a categorical variable has k > 2 levels, we need to add (k - 1) additional variables to the model
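The (k - 1) coding can be sketched in Python as below. One level is chosen as the baseline and is represented by all dummy variables being 0. The function name `dummy_code`, the `is_` column prefix, and the three-level example variable are hypothetical.

```python
def dummy_code(values, baseline=None):
    """Code a categorical column into (k - 1) columns of 0/1; the baseline level gets all zeros."""
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # default: first level in sorted order
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }

# Hypothetical three-level variable (k = 3), so two dummy columns are created.
tool = ["A", "B", "C", "A", "C"]
print(dummy_code(tool))
# {'is_B': [0, 1, 0, 0, 0], 'is_C': [0, 0, 1, 0, 1]}
```

Adding a k-th dummy for the baseline as well would make the dummies sum to 1 in every row, duplicating the intercept, which is why only (k - 1) are used.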
Try Surface Finish problem
Thank you
End of session 12
Comments / Doubts / Questions:
josephine.gemson@libaedu.in