
Session 12:

Regression, Forecasting Techniques

January 2015 - April 2015

Building good regression models

A good regression model should include only significant independent variables.
However, it is not always clear exactly what will happen when we add or
remove variables from a model; variables that are (or are not) significant in
one model may (or may not) be significant in another. Therefore, you should
not drop all insignificant variables at one time, but rather take a
more structured approach.
Adding an independent variable to a regression model will always result in an R²
at least as large as the original R². This is true even when the new independent
variable has little true relationship with the dependent variable.
A better way of evaluating a model is adjusted R². Adjusted R² reflects both the
number of independent variables and the sample size, and it may either increase
or decrease when an independent variable is added or dropped.
An increase in adjusted R² indicates that the model has improved.
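
As a small illustration, the standard formula is adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of independent variables. The sketch below applies it to made-up numbers (the R² values, n = 30, and the variable counts are assumptions for illustration only) to show adjusted R² falling even though R² rises.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical case: adding a fifth variable nudges R^2 from 0.750 up to 0.752.
before = adjusted_r2(0.750, n=30, k=4)
after = adjusted_r2(0.752, n=30, k=5)

print(round(before, 4))  # 0.71
print(round(after, 4))   # 0.7003
```

Here R² went up slightly, but adjusted R² went down, which suggests the extra variable did not improve the model.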

Building good regression models

1. Construct a model with all available independent variables. Check
for significance of all independent variables by examining p-values.
2. Identify the independent variable having the largest p-value that
exceeds the chosen level of significance.
3. Remove the variable identified in step 2 from the model and
evaluate adjusted R². Don't remove all variables together; remove them one at a
time.
4. Continue until all variables are significant.
In essence, this approach seeks to find a significant model that has
the highest adjusted R².
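
The procedure above can be sketched in Python roughly as follows. This is a minimal illustration, not the Excel workflow used in class: the data are simulated, the names `fit_ols`, `approx_p_value`, and `backward_eliminate` are hypothetical helpers, and the p-values use a normal approximation to the t distribution, which is an assumption that is reasonable only for large samples (a statistics package would report exact values).

```python
import math
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns slope t-statistics and adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = float(resid @ resid) / (n - k - 1)      # residual variance estimate
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    sst = float(((y - y.mean()) ** 2).sum())
    adj_r2 = 1 - mse / (sst / (n - 1))
    return (beta / se)[1:], adj_r2                # skip the intercept's t-statistic

def approx_p_value(t):
    """Two-sided p-value from the normal approximation (assumption: large n)."""
    return math.erfc(abs(t) / math.sqrt(2))

def backward_eliminate(X, y, alpha=0.05):
    kept = list(range(X.shape[1]))                # step 1: start with every variable
    history = []
    while kept:
        t_stats, adj_r2 = fit_ols(X[:, kept], y)
        p_values = [approx_p_value(t) for t in t_stats]
        history.append((list(kept), adj_r2))
        worst = max(range(len(kept)), key=lambda i: p_values[i])  # step 2
        if p_values[worst] <= alpha:              # step 4: all remaining are significant
            break
        del kept[worst]                           # step 3: drop one variable, then refit
    return kept, history

# Simulated data: only columns 0 and 1 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

kept, history = backward_eliminate(X, y)
for columns, adj_r2 in history:
    print(columns, round(adj_r2, 4))
```

Each printed line shows the variables in the model at that step and the adjusted R² to compare against the previous step.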

Building good regression models

Another criterion used to determine whether a variable should be
removed is the t-statistic.
If |t| < 1, the standard error will decrease and adjusted R² will
increase if the variable is removed.
If |t| > 1, the opposite will occur.
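
A quick numerical check of this rule, on simulated data: for each variable, the sketch compares its |t| in the full model with the change in adjusted R² when that variable alone is removed. The data and the helper name `ols_summary` are assumptions made for illustration.

```python
import numpy as np

def ols_summary(X, y):
    """Least-squares fit with intercept; returns slope t-statistics and adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = float(resid @ resid) / (n - k - 1)
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    sst = float(((y - y.mean()) ** 2).sum())
    return (beta / se)[1:], 1 - mse / (sst / (n - 1))

# Simulated data: x0 matters a lot, x1 barely, x2 and x3 not at all.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=60)

t_full, adj_full = ols_summary(X, y)
rule_holds = []
for j in range(X.shape[1]):
    _, adj_without = ols_summary(np.delete(X, j, axis=1), y)
    improves = adj_without > adj_full
    # The rule says: adjusted R^2 improves on removal exactly when |t| < 1.
    rule_holds.append(bool(improves == (abs(t_full[j]) < 1)))
    print(f"x{j}: |t| = {abs(t_full[j]):.2f}, adjusted R^2 without it = {adj_without:.4f}")

print("full model adjusted R^2:", round(adj_full, 4))
```

The agreement is not a coincidence of this data set: removing one variable lowers the residual mean square, and so raises adjusted R², exactly when its squared t-statistic is below 1.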

Correlation and Multicollinearity

Correlation, a numerical value between -1 and +1, measures the
linear relationship between pairs of variables.
The higher the absolute value of the correlation, the greater the
strength of the relationship. The sign simply indicates whether the variables
tend to increase together (positive) or move in opposite directions (negative).
Examining correlations between the dependent and independent
variables using the Excel Correlation tool can be useful in selecting
variables to include in a multiple regression model, because a
strong correlation indicates a strong linear relationship.
Strong correlations between independent variables can be problematic.
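
For readers working outside Excel, the correlation coefficient can be computed directly. The sketch below is a plain-Python stand-in for the Correlation tool; the function name `pearson_r` and the sample values are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]           # hypothetical values that tend to rise with x

print(round(pearson_r(x, y), 4))         # 0.7746
print(round(pearson_r(x, x[::-1]), 4))   # -1.0 (perfect negative relationship)
```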

Correlation and Multicollinearity

Multicollinearity occurs when two or more independent variables in the same regression
model contain high levels of the same information.
The independent variables are strongly correlated with one another.
P-values get inflated.
Correlations exceeding 0.7 in absolute value can indicate multicollinearity.
Multicollinearity is best measured using a statistic called the Variance
Inflation Factor (VIF).
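
The VIF for an independent variable is 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables. The sketch below computes it with NumPy on a small constructed data set; the function name `vif` and the data are assumptions for illustration, chosen so the result can be checked by hand.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        factors.append(1 / (1 - r2))
    return factors

# Three mutually orthogonal, zero-mean patterns used to build the predictors.
a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
c = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)

# x0 = a, x1 = b, x2 = a + 0.5*c, so x2 shares most of its information with x0.
X = np.column_stack([a, b, a + 0.5 * c])

print(np.round(vif(X), 2))                                    # [5. 1. 5.]
print(round(float(np.corrcoef(X[:, 0], X[:, 2])[0, 1]), 3))   # 0.894
```

The two overlapping variables have a correlation above 0.7 and an inflated VIF, while the unrelated variable has a VIF of 1.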

Regression with Categorical Independent Variables

Some data of interest in a regression study may be ordinal or nominal.
Because regression requires numerical data, we can include
categorical variables by coding them numerically.
A two-level variable such as Yes / No is coded as 1 / 0.
Variables coded this way are called dummy variables.
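
As a minimal sketch of this coding step (the column name `owns_home` and its answers are hypothetical):

```python
# Hypothetical survey answers; the values are illustrative only.
owns_home = ["Yes", "No", "No", "Yes", "Yes"]

# Code "Yes" as 1 and "No" as 0 so the column can enter a regression.
owns_home_dummy = [1 if answer == "Yes" else 0 for answer in owns_home]

print(owns_home_dummy)  # [1, 0, 0, 1, 1]
```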

A dependence between two variables X1 and X2 is called an
interaction. We can test for interactions by defining a new variable
as the product of the two variables, X3 = X1 * X2, and testing whether
this variable is significant, leading to an alternative model.
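
A model with an interaction term typically has the form Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 + error. The sketch below builds the product variable and fits that form with NumPy. The data are made up and noise-free (an assumption chosen so the fit recovers the coefficients exactly); with real data one would examine the p-value of the X3 coefficient to decide whether the interaction is significant.

```python
import numpy as np

# Hypothetical data: x1 is numerical, x2 is a 0/1 dummy variable.
x1 = np.array([0, 1, 0, 1, 2, 2], dtype=float)
x2 = np.array([0, 0, 1, 1, 0, 1], dtype=float)
x3 = x1 * x2                                   # interaction variable X3 = X1 * X2

# Noise-free response built from known coefficients b0=1, b1=2, b2=3, b3=4.
y = 1 + 2 * x1 + 3 * x2 + 4 * x3

A = np.column_stack([np.ones_like(x1), x1, x2, x3])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(coef, 6))  # [1. 2. 3. 4.]
```

A nonzero b3 means the slope of Y against X1 differs between the two groups defined by X2.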

Categorical variables with more than two levels

When a categorical variable has only two levels, we code the
levels as 0 and 1 and add one new variable to the model.
When a categorical variable has k > 2 levels, we need to add k - 1 additional variables to the model.
Try the Surface Finish problem.
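
The k - 1 coding can be sketched as below. The function name `dummy_code` and the four tool labels are hypothetical and are not taken from the Surface Finish data; one level is chosen as the baseline and gets all zeros.

```python
def dummy_code(values, baseline):
    """Return the non-baseline levels and one 0/1 row per observation (k - 1 columns)."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

tool = ["A", "B", "C", "D", "B", "A"]      # k = 4 levels, illustrative only
levels, rows = dummy_code(tool, baseline="A")

print(levels)                              # ['B', 'C', 'D']
for value, row in zip(tool, rows):
    print(value, row)
```

Using only k - 1 columns avoids a set of dummy variables that always sums to 1, which would duplicate the intercept.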

Thank you
End of session 12
Comments / Doubts / Questions: