
Multiple Regression

Topic 5

Agenda

Background
Example with Real Data
Some Considerations
Key Terms
Summary

Background

Multiple Linear Regression is widely used in academia and also in MR
We can consider it the start of Multivariate Analysis, for our course
Any idea what the following are:

Multivariate analysis
Multiple Linear Regression (MLR)

Background

Multivariate analysis is hard to define well

Some say that any time you have more than 2 variables, it is multivariate
Some say that you need to have many combinations of variables, i.e. variates
Some say that you need to have multiple dependent variables

For all practical purposes, the following can be considered multivariate

MLR
Factor analysis
Discriminant analysis
Cluster Analysis
Conjoint analysis
Canonical Correlation
Structural Equation Modeling

We shall consider just MLR, factor, discriminant and cluster analyses

Linear regression involves finding a linear relationship between an independent variable and a dependent variable

Background

Different levels of an independent variable are associated with corresponding changes in the dependent variable

What is an IV? What is a DV?

The IV (independent variable) is denoted by X, while the DV (dependent variable) is denoted by Y
We can loosely say X causes Y, though regression by itself only establishes association

Any idea how regression works? The principle behind it? On what scale are the IV and the DV measured?

Assume one X, one Y

Background

Normally, the IV and DV are continuous, not discrete

In regression, a line is repeatedly fitted to the scatter-plot of X and Y

Meaning?

The line of best fit is the regression line

Consider the following data

[Table: X and Y data]

Background

Let us plot the points

Drawing a line of best fit is child's play

In real life, we rarely find data that are so perfect

Here, the association is perfectly linear

We may instead find data such as the following

Thus, the line of best fit is the regression line

There is some error
But the idea is to minimise this error; how is this done?

Background

The least-squares criterion is followed

Different lines are fitted, the errors are squared, and the line with the smallest sum of squared errors is finally chosen

Sometimes, MLR is called OLS (Ordinary Least Squares) regression, after the method used to fit it
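
The least-squares idea can be sketched in a few lines of Python. This is only an illustration, not the procedure SPSS follows: the data points and the helper names `fit_line` and `sum_squared_errors` are made up for this example, and the best line is obtained from the standard closed-form formulas for one IV rather than by literally trying many lines.

```python
def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Least-squares intercept and slope for one IV (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: roughly linear, with some error
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, intercept, slope)

# Any other candidate line gives a larger sum of squared errors
other_lines = [(0.0, 2.1), (0.5, 1.8), (-0.5, 2.2)]
other_sses = [sum_squared_errors(xs, ys, a, b) for a, b in other_lines]

print(f"fitted line: Y = {intercept:.2f} + {slope:.2f} X, SSE = {best_sse:.3f}")
print("SSE of other lines:", [round(s, 3) for s in other_sses])
```

Whatever other intercept and slope are tried, the sum of squared errors comes out larger than for the fitted line.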

Why should one square the errors and then add? Why not just add them up?

Background

The idea is 2-fold

Squaring stops +ve and -ve errors from cancelling each other out
Squaring penalises large errors more heavily

This is a 1-IV case, similar with n IVs

Impossible to show on the board
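
Both reasons for squaring can be checked with a small numeric sketch. The data and the two candidate lines below are invented for illustration; the helper name `errors` is likewise just a label for this example.

```python
def errors(xs, ys, intercept, slope):
    """Signed errors (actual minus predicted) for a candidate line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical, perfectly linear data: Y = 2X
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

# A flat line at Y = 6 ignores X completely, yet its raw errors sum to zero
flat = errors(xs, ys, intercept=6, slope=0)
raw_sum_flat = sum(flat)                      # +ve and -ve errors cancel
squared_sum_flat = sum(e ** 2 for e in flat)  # squaring exposes the misfit

# The true line has zero squared error
true_line = errors(xs, ys, intercept=0, slope=2)
squared_sum_true = sum(e ** 2 for e in true_line)

# Penalising large errors: same total absolute error, different squared error
one_big = [4, 0, 0, 0]
four_small = [1, 1, 1, 1]
big_penalty = sum(e ** 2 for e in one_big)
small_penalty = sum(e ** 2 for e in four_small)

print(raw_sum_flat, squared_sum_flat, squared_sum_true)  # 0 40 0
print(big_penalty, small_penalty)                        # 16 4
```

Simply adding the errors would rate the useless flat line as perfect, while the squared sum separates it clearly from the true line.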

Now let us consider some real data and perform a regression

Some Considerations

MLR can also handle non-metric or categorical IVs, e.g. gender influences shopping time

This is done through dummy coding
Basically, dummy regression is the same as an ANOVA
Both are forms of the General Linear Model
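
A small sketch of dummy coding, using made-up shopping times and an assumed coding of 0 = male, 1 = female. With a single 0/1 dummy, the intercept equals the mean of the reference group and the slope equals the difference between the two group means, which is the same comparison a two-group ANOVA makes. The helper name `fit_line` is only illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one IV."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical shopping times in minutes
male_times = [30, 35, 40]
female_times = [50, 55, 60]

# Dummy coding (assumed): 0 = male (reference group), 1 = female
gender = [0] * len(male_times) + [1] * len(female_times)
times = male_times + female_times

intercept, slope = fit_line(gender, times)

mean_male = sum(male_times) / len(male_times)
mean_female = sum(female_times) / len(female_times)

print(f"intercept = {intercept:.1f} (mean for males = {mean_male:.1f})")
print(f"slope = {slope:.1f} (female mean minus male mean = {mean_female - mean_male:.1f})")
```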

While MLR is useful, it has certain prerequisites and limitations

Some Considerations

There should not be collinearity between the IVs

Collinearity makes the coefficient estimates unstable and inflates their standard errors
First step is therefore to get the correlation matrix in Excel/SPSS
How to remove this collinearity?
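
The correlation-matrix check can also be sketched outside Excel/SPSS. The three IVs, their values and the 0.8 cut-off below are all assumptions made for this illustration; the cut-off is a rule of thumb, not a fixed standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical IVs
ivs = {
    "ad_spend": [1, 2, 3, 4, 5],
    "sales_calls": [2, 4, 6, 8, 11],
    "price": [3, 1, 4, 2, 3],
}
names = list(ivs)
corr = {a: {b: pearson_r(ivs[a], ivs[b]) for b in names} for a in names}

THRESHOLD = 0.8  # assumed rule of thumb
flagged = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(corr[a][b]) > THRESHOLD
]

for a in names:
    print(a.ljust(12), "  ".join(f"{corr[a][b]:6.2f}" for b in names))
print("highly correlated pairs:", flagged)
```

Here only the pair `ad_spend` and `sales_calls` is flagged, so those two IVs would need a closer look before both go into the same regression.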

One should also go into MLR with sufficient research on likely relationships

Else, one may end up doing sample-specific data mining
No guarantee about robustness of results

Some Considerations

The shot-gun approach should be avoided

MR firms may not agree

There should not be heteroscedasticity in the DV, i.e. the error variance should stay constant across the range of the IVs

2 marks bonus for saying this orally in the final!
This can be got around by transforming the data using log, inverse or square root
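
A rough sketch of how a log transform can steady the error variance. Everything here is an assumption for illustration: the data are simulated with error that grows in proportion to the level of Y, and the "spread ratio" (residual spread in the upper half of X divided by that in the lower half) is a crude home-made check, not a formal test of heteroscedasticity.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one IV."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def spread(values):
    """Standard deviation of a list of residuals."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def spread_ratio(xs, ys):
    """Residual spread for the upper half of X over that for the lower half.

    Assumes xs is sorted in ascending order. A ratio near 1 suggests
    constant error variance; a ratio well above 1 suggests heteroscedasticity.
    """
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    half = len(residuals) // 2
    return spread(residuals[half:]) / spread(residuals[:half])


random.seed(0)  # fixed seed so the simulated data are reproducible
xs = [1 + 9 * i / 399 for i in range(400)]
# Simulated DV whose error grows in proportion to its level
ys = [3 * x * math.exp(random.gauss(0, 0.3)) for x in xs]

raw_ratio = spread_ratio(xs, ys)
log_ratio = spread_ratio([math.log(x) for x in xs], [math.log(y) for y in ys])

print(f"spread ratio on raw data:    {raw_ratio:.2f}")
print(f"spread ratio after log-log:  {log_ratio:.2f}")
```

On the raw scale the residuals fan out as X grows; after taking logs of both variables the spread is roughly even. Which transform helps (log, inverse or square root) depends on how the error actually behaves in the data at hand.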

MLR, as a straight-line model, cannot handle non-linear relationships

Consider the following data

[Table: X and Y data; surviving values 16, 25, 36, 49, 64]

Some Considerations

SPSS will give you a decent regression, but it misses the point

Have to use polynomial regression, which is beyond our scope
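
A sketch of what "decent but misses the point" can look like. The values 16, 25, 36, 49, 64 are the squares of 4 to 8, so the example assumes, purely for illustration, that X runs from 4 to 8 and Y = X squared. Instead of a full polynomial regression it uses the simplest special case, regressing Y on X squared as a single transformed IV. Helper names are invented.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one IV."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, intercept, slope):
    """Actual minus predicted values."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def r_squared(ys, resid):
    """Share of the variation in Y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sum(e ** 2 for e in resid) / sst


# Assumed data: Y = X squared
xs = [4, 5, 6, 7, 8]
ys = [16, 25, 36, 49, 64]

# Straight-line fit: high R-squared, but residuals follow a U-shaped pattern
a_lin, b_lin = fit_line(xs, ys)
resid_lin = residuals(xs, ys, a_lin, b_lin)
r2_lin = r_squared(ys, resid_lin)

# Fit on the squared IV: the curvature is captured and the residuals vanish
x_sq = [x ** 2 for x in xs]
a_sq, b_sq = fit_line(x_sq, ys)
resid_sq = residuals(x_sq, ys, a_sq, b_sq)

print(f"linear fit: Y = {a_lin:.0f} + {b_lin:.0f} X, R-squared = {r2_lin:.4f}")
print("linear residuals:", [round(e, 6) for e in resid_lin])
print("residuals using X squared:", [round(e, 6) for e in resid_sq])
```

The straight line reaches an R-squared of about 0.99, yet its residuals (2, -1, -2, -1, 2) are systematically positive at the ends and negative in the middle, which is the tell-tale sign of a relationship that is not linear.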

Must take great care to ensure all relevant IVs are put in, else one may reach utterly erroneous conclusions, e.g.

Regressing Sales on Ad, leaving out Price, SP
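
The danger of a left-out IV can be shown with a constructed example. All numbers are hypothetical: Sales is built from known weights (+2 per unit of Ad, -3 per unit of Price), and Price happens to fall as Ad rises. The two-IV weights come from the standard least-squares formulas for two predictors; the helper names are invented.

```python
def centred(values):
    """Values expressed as deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Sum of products of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical data in which Price falls as Ad rises
ad = [1, 2, 3, 4, 5, 6]
price = [10, 9, 9, 7, 7, 5]
# Sales built so that the true weights are known: +2 for Ad, -3 for Price
sales = [50 + 2 * a - 3 * p for a, p in zip(ad, price)]

ad_c, price_c, sales_c = centred(ad), centred(price), centred(sales)

# Regression of Sales on Ad alone (Price left out)
b_ad_alone = dot(ad_c, sales_c) / dot(ad_c, ad_c)

# Regression of Sales on Ad and Price together
s11, s22, s12 = dot(ad_c, ad_c), dot(price_c, price_c), dot(ad_c, price_c)
s1y, s2y = dot(ad_c, sales_c), dot(price_c, sales_c)
det = s11 * s22 - s12 ** 2
b_ad = (s22 * s1y - s12 * s2y) / det
b_price = (s11 * s2y - s12 * s1y) / det

print(f"Ad weight with Price left out: {b_ad_alone:.2f}")
print(f"Ad weight with Price included: {b_ad:.2f}, Price weight: {b_price:.2f}")
```

With Price left out, the Ad weight comes out at about 4.83 instead of 2, because Ad is wrongly credited with the effect of the falling price.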

Some Considerations

Ideally, have some likely results in mind before going in for data collection

MR firms screw up here
We academics score big here
Why is this important?

In case there is no working knowledge of the likely relationships, use stepwise regression

It will give you the order of importance of the IVs

Some Considerations

In exploratory research, it is OK to use stepwise regression

Not a big fan of stepwise
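
For the curious, a simplified sketch of the forward version of stepwise selection. It is not what SPSS does: SPSS enters and removes IVs on F-test criteria, whereas this sketch simply adds whichever IV raises R-squared the most and stops when the gain falls below an assumed `min_gain`. The data, the IV names `x1` to `x3` and all helper names are invented.

```python
def ols(columns, ys):
    """Coefficients [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    k = len(rows[0])
    # Augmented matrix (X'X | X'y)
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(columns, ys):
    """R-squared of the regression of ys on the given IV columns."""
    coefs = ols(columns, ys)
    preds = [coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns))
             for i in range(len(ys))]
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


def forward_select(candidates, ys, min_gain=0.01):
    """Add IVs one at a time, best R-squared gain first (simplified rule)."""
    chosen, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        trial = {name: r_squared([candidates[c] for c in chosen] + [col], ys)
                 for name, col in remaining.items()}
        name = max(trial, key=trial.get)
        if trial[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        best_r2 = trial[name]
        del remaining[name]
    return chosen, best_r2


# Hypothetical IVs: x1 matters a lot, x2 a little, x3 not at all
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, -1, 1, 1, -1, -1, 1],
    "x3": [1, 1, -1, -1, -1, -1, 1, 1],
}
noise = [1, -1, 1, -1, -1, 1, -1, 1]
ys = [10 + 5 * a + 2 * b + 0.5 * e
      for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]

chosen, final_r2 = forward_select(candidates, ys)
print("order of entry:", chosen, f"R-squared = {final_r2:.4f}")
```

The order of entry is what the slide calls the order of importance. It depends on this particular sample, which is one reason to treat stepwise results with caution.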

Key Terms: A Review

Coefficient of Determination, R², gives the extent of variation in Y explained by X (or X1, X2 and so on)

Also called variance explained
Better would be adjusted R²
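
Why adjusted R² is the better figure can be seen from its usual formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of IVs. Plain R² never falls when an IV is added, while the adjusted version charges a price for each extra IV. The R² values, n and k below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k IVs."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: three extra weak IVs nudge R-squared up from 0.50 to 0.52
small_model = adjusted_r_squared(r2=0.50, n=20, k=2)
large_model = adjusted_r_squared(r2=0.52, n=20, k=5)

print(f"2 IVs: R-squared 0.50, adjusted {small_model:.3f}")  # adjusted 0.441
print(f"5 IVs: R-squared 0.52, adjusted {large_model:.3f}")  # adjusted 0.349
```

The larger model looks slightly better on R² but clearly worse on adjusted R².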

b is the unstandardised weight and β is the standardised weight

β is needed since different IVs may be in different units
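
The difference between the two weights can be sketched with one IV recorded in two different units. The data are hypothetical, and the standardised weight is computed in the usual way as b multiplied by the standard deviation of X and divided by the standard deviation of Y. Helper names are invented.

```python
import math


def slope(xs, ys):
    """Unstandardised least-squares weight b for one IV."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sd(values):
    """Standard deviation of a list of values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def standardised_weight(xs, ys):
    """Standardised weight: b rescaled by sd(X) / sd(Y)."""
    return slope(xs, ys) * sd(xs) / sd(ys)


# Hypothetical data: ad spend in thousands, and the same spend in single units
ad_thousands = [1, 2, 3, 4, 5]
ad_units = [a * 1000 for a in ad_thousands]
sales = [21, 39, 62, 78, 101]

b_thousands = slope(ad_thousands, sales)
b_units = slope(ad_units, sales)
beta_thousands = standardised_weight(ad_thousands, sales)
beta_units = standardised_weight(ad_units, sales)

print(f"b:    {b_thousands:.4f} per thousand vs {b_units:.4f} per unit")
print(f"beta: {beta_thousands:.4f} vs {beta_units:.4f}")
```

Changing the unit of the IV changes b a thousand-fold but leaves the standardised weight untouched, which is why standardised weights are the ones to compare across IVs.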

Key Terms: A Review

The F-value and t-values must be looked at too
Any doubts?
Do you want to learn how regression can handle

Categorical data
Interaction effects? What problems will come here?
Need demo?

Summary

MLR is a very useful tool

It has wide applications

But one must be careful to avoid violating fundamental assumptions, mainly the absence of multicollinearity

Esp. in MR
