
Applied Marketing (Market Research Methods)

Topic 8:

Identifying relationships

Dr James Abdey


Overview

Relationship between two variables

Correlation

Regression

The simple linear regression model

Parameter estimation

Interpretation of correlation coefficient

Coefficient of determination, $R^2$

Prediction

Regression diagnostics

Worked example

Multiple linear regression


Overview

We consider regression analysis which is used for explaining variation in market share, sales, brand preference etc.

This may use explanatory variables such as advertising, price, distribution and product quality


Starting with correlation, we proceed to the simple linear model followed by multiple linear regression


Relationship between two variables

We now investigate the relationship between two variables

When we have data on two variables (X and Y ), we have bivariate data

We will consider how to:


measure the strength of the relationship

model the relationship

predict the value of one variable on the basis of the other


Relationship between two variables

First thing to do with data is to provide a graphical representation

For one variable this might be a histogram, stem-and-leaf diagram etc.

For two variables we produce a scatter diagram

This must include the following:

title

axis labels

units

and be accurate!


Relationship between two variables

Assume that we have some data in paired form:

$(x_i, y_i)$, $i = 1, 2, \ldots, n$

An example might be unemployment and crime figures for 12 areas of a city, of interest to insurers in setting policy premia for people insuring against theft


Unemp., x      2614   1160   1055   1199   2157   2305
Offences, y    6200   4610   5336   5411   5808   6004

Unemp., x      1687   1287   1869   2283   1162   1201
Offences, y    5420   5588   5719   6336   5103   5268


Relationship between two variables

We plot X on the horizontal axis, and Y on the vertical axis

This emphasises any relationship between the variables

[Figure: Scatter plot of Crime against Unemployment. Horizontal axis: Unemployment (ticks from 1000 to 2500); vertical axis: Number of offences]


Relationship between two variables

A positive, linear relationship is apparent

X and Y increase together, roughly linearly

However, the implied linear relationship is not exact

The points do not lie exactly on a straight line


Such an ‘upward shape’ is termed positive correlation

We will see later how to quantify correlation


Relationship between two variables


Other examples of scatter plots include:

LHS: Negative correlation (Y decreases as X increases)

RHS: Uncorrelated data (no obvious (linear) relationship between X and Y)

[Figure: two scatter plots of y against x, shown side by side]


Correlation

Correlation measures the strength of the linear relationship between two variables, each measured on an interval scale

Positive correlation — the two variables tend to vary in the same direction


Negative correlation — the two variables tend to vary in the opposite direction

Perfect correlation — the two variables have points which all lie exactly on a straight line


Correlation

If there exists a perfect linear relationship between X and Y, we can represent it using an equation of the form

$$Y = \alpha + \beta X$$

α represents the intercept of the line

β represents the slope or gradient of the line

Examples of anticipated correlation:


Variables                                   Correlation
Height & weight                             Positive
Rainfall & sunshine hours                   Negative
Ice cream sales & sun cream sales           Positive
Hours of study & exam mark                  Positive
Car’s petrol consumption & goals scored     Zero


Correlation

Positive correlation: large X with large Y; small X with small Y

Negative correlation: large X with small Y; small X with large Y

However, since X and Y may take widely different numerical values, we need to take this into account


We do this by considering how far away from the means the two scores are


Correlation

So, we are interested in the degree to which variations in variable values are related to each other

Our basis for the measurement of correlation is

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$


Unfortunately, this measure is extremely sensitive to the units in which the variables are measured

We would prefer a measure of correlation to remain the same regardless of the units of measurement (e.g. days, hours, minutes or seconds)
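The unit sensitivity described above can be checked numerically. The following is a minimal sketch, assuming the twelve (unemployment, offences) pairs from the example; the function names and the rescaling of unemployment into thousands are illustrative choices, not part of the original slides.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def cross_product_sum(xs, ys):
    """Sum of (x_i - x_bar) * (y_i - y_bar) over all pairs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))


def cross_product_shortcut(xs, ys):
    """Equivalent computational form: sum of x_i * y_i minus n * x_bar * y_bar."""
    n = len(xs)
    return sum(a * b for a, b in zip(xs, ys)) - n * (sum(xs) / n) * (sum(ys) / n)


s_xy = cross_product_sum(x, y)
s_xy_short = cross_product_shortcut(x, y)

# Hypothetical change of units: unemployment recorded in thousands of people.
x_thousands = [a / 1000 for a in x]
s_xy_rescaled = cross_product_sum(x_thousands, y)

print(round(s_xy, 2), round(s_xy_short, 2), round(s_xy_rescaled, 2))
```

The two forms agree, while the rescaled version shrinks by a factor of 1000 even though the underlying relationship is unchanged.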


Correlation

So, we use the following to measure the correlation for (sample) data

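$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right) \times \left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}}$$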


Correlation

Returning to the unemployment/crime dataset:

$$\sum x_i = 19979, \quad \sum x_i^2 = 36695129, \quad \sum y_i = 66803, \quad \sum y_i^2 = 374471231, \quad \sum x_i y_i = 113784494$$

Since $n = 12$, we have $\bar{x} = 19979/12 = 1664.92$ and $\bar{y} = 66803/12 = 5566.92$

Hence the (sample) correlation coefficient, $r$, is

$$r = 0.861$$
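As a check on the arithmetic, the correlation coefficient can be recomputed from the raw data. This is a sketch only, assuming the twelve data pairs from the example; the helper name `correlation` is an illustrative choice.

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def correlation(xs, ys):
    """Sample correlation coefficient using the computational formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(a * b for a, b in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in xs) - n * x_bar ** 2
    s_yy = sum(b * b for b in ys) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation(x, y)

# Changing the units of x (here, to thousands of people) leaves r unchanged.
r_rescaled = correlation([a / 1000 for a in x], y)

print(round(r, 3), round(r_rescaled, 3))
```

Unlike the raw cross-product sum, the value of r is the same whichever units are used.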


Of course, in practice we can use software like SPSS to calculate r for us!


Correlation

The (sample) correlation coefficient, r, takes values between −1 and 1, i.e.

$$-1 \le r \le 1$$

r > 0 indicates positive correlation, with r = 1 indicating perfect positive correlation


r < 0 indicates negative correlation, with r = −1 indicating perfect negative correlation

The closer |r| is to 1, the stronger the linear relationship is


Regression

Here we introduce simple linear regression

Only part of a very large topic in statistical analysis

In the simple model, we have two variables Y and X :


Y is the dependent (or response) variable — that which we are trying to explain using:

X , the independent (or explanatory) variable — the factor we think influences Y


The simple linear regression model

Assume a true (population) linear relationship between a response variable y and an explanatory variable x of the approximate form:

y = α + βx

α and β are fixed, but unknown, population parameters

α is the y -intercept

β is the slope of the line

We seek to estimate α and β using (paired) sample data $(x_i, y_i)$, $i = 1, \ldots, n$


The simple linear regression model

Particularly in business, we would not expect a perfect linear relationship between the two variables

Hence we modify this basic model to

$$y = \alpha + \beta x + \varepsilon$$

$\varepsilon$ is some random perturbation from the initial ‘approximate’ line

In other words, each y observation almost lies on the postulated line, but ‘jumps’ off the line according to the random variable $\varepsilon$

Often referred to as the error term


Parameter estimation — The least squares method

For given sample data we could produce a scatter plot

Any linear relationship would be visible

This would suggest performing a (simple) linear regression


We estimate the population regression line

This estimated line is often termed the line of best fit


Parameter estimation — The least squares method

How do we choose the line of best fit?

We require a formal criterion for determining the line of best fit

Estimation of α and β will be by least squares estimation


Specifically, we seek to minimise the sum of the squared residuals, where a residual is the difference between the true y value and its predicted (fitted) value


Parameter estimation — The least squares method

The least squares estimator for β is

$$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$

The least squares estimator for α is

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$


Hence the line of best fit has equation:

$$\hat{y} = \hat{\alpha} + \hat{\beta}x$$

Again, this is routinely calculated in SPSS
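To illustrate the least squares criterion, the sketch below compares the sum of squared residuals for the fitted line with that of a few nearby lines. It assumes the unemployment/crime data from the earlier example; the function name and the four competing lines are hypothetical choices made only for this comparison.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def residual_sum_of_squares(alpha, beta, xs, ys):
    """Sum of squared residuals for the line y = alpha + beta * x."""
    return sum((obs - (alpha + beta * a)) ** 2 for a, obs in zip(xs, ys))


n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least squares estimates of the slope and intercept.
beta_hat = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
    sum(a * a for a in x) - n * x_bar ** 2
)
alpha_hat = y_bar - beta_hat * x_bar

rss_best = residual_sum_of_squares(alpha_hat, beta_hat, x, y)

# Hypothetical competing lines: shift the intercept or the slope slightly.
competitors = [
    (alpha_hat + 100, beta_hat),
    (alpha_hat - 100, beta_hat),
    (alpha_hat, beta_hat + 0.05),
    (alpha_hat, beta_hat - 0.05),
]
rss_competitors = [residual_sum_of_squares(a, b, x, y) for a, b in competitors]

print(round(rss_best), [round(v) for v in rss_competitors])
```

Every competing line gives a larger sum of squared residuals, which is what 'best fit' means under this criterion.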


Example

Returning to the unemployment/crime dataset

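$$\sum x_i = 19979, \quad \sum x_i^2 = 36695129, \quad \sum y_i = 66803, \quad \sum y_i^2 = 374471231, \quad \sum x_i y_i = 113784494$$

Since $n = 12$, we have $\bar{x} = 19979/12 = 1664.92$ and $\bar{y} = 66803/12 = 5566.92$, hence

$$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$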


$$\hat{\beta} = \frac{113784494 - (12 \times 1664.92 \times 5566.92)}{36695129 - (12 \times 1664.92^2)} = 0.7468$$


Example

We estimate the intercept to be

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 5566.92 - 0.7468 \times 1664.92 = 4323.6$$

Hence the least squares regression line is

$$\hat{y} = 4323.6 + 0.7468x$$
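The same estimates can be reproduced from the summary statistics alone. This is a small sketch, assuming only the sums quoted in the example; the variable names are illustrative.

```python
# Summary statistics quoted in the example for the unemployment/crime data.
n = 12
sum_x = 19979
sum_x2 = 36695129
sum_y = 66803
sum_xy = 113784494

x_bar = sum_x / n
y_bar = sum_y / n

# Least squares slope and intercept from the computational formulas.
beta_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
alpha_hat = y_bar - beta_hat * x_bar

print(f"beta_hat  = {beta_hat:.4f}")
print(f"alpha_hat = {alpha_hat:.1f}")
```

Because the code keeps full precision for the means, its last digits may differ slightly from a hand calculation that uses the rounded values 1664.92 and 5566.92.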


Note the $\hat{y}$ notation, where the ‘hat’ denotes an estimated value


Interpretation of correlation coefficient

In the case of perfect correlation between X and Y, we can predict Y directly and exactly from X

In the case of zero correlation between X and Y, knowledge of X tells us nothing about Y


Here we consider measuring the extent to which the values of one variable can be used to predict the values of another where the correlation is neither −1, nor 0, nor 1


Interpretation of correlation coefficient

Our overall objective is to explain the response variable Y , which is a random variable

We try to explain the variation in Y

Using simple linear regression, we attempt this using a single explanatory variable, X

The total variation in the response variable sample data is simply

$$\sum_{i=1}^{n} (y_i - \bar{y})^2$$


We term this the total sum of squares (TSS)


Interpretation of correlation coefficient

We can decompose TSS into two components:

the amount we are able to explain using the model called the explained sum of squares (ESS);

and the remaining variation that we are unable to explain with the model, called the residual sum of squares (RSS)

Hence,

TSS = ESS + RSS
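The decomposition can be verified numerically for a fitted least squares line. The sketch below assumes the unemployment/crime data from the earlier example and takes ESS as the sum of squared deviations of the fitted values from the mean, and RSS as the sum of squared residuals; the variable names are illustrative.

```python
# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit.
beta_hat = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * a for a in x]

# Total, explained and residual sums of squares.
tss = sum((b - y_bar) ** 2 for b in y)
ess = sum((f - y_bar) ** 2 for f in fitted)
rss = sum((b - f) ** 2 for b, f in zip(y, fitted))

print(round(tss, 1), round(ess, 1), round(rss, 1), round(ess + rss, 1))
```

Note that the identity relies on the line being the least squares fit with an intercept; for an arbitrary line the two components need not add up to TSS.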


Coefficient of determination, $R^2$

We can assess the overall fit of a model using $R^2$

This measures the ‘proportion of the total variability in the response variable explained by the model’

This statistic is known as the coefficient of determination, denoted $R^2$, and is defined as

$$R^2 = \frac{\text{ESS}}{\text{TSS}}, \qquad 0 \le R^2 \le 1$$


The closer $R^2$ is to 1, the better the explanatory power of the model

Note that $R^2 = r^2$ for a simple linear model, so we can also compute it from r (correlation coefficient)


Coefficient of determination, $R^2$

Returning to the crime/unemployment dataset, let’s assign Y and X as follows

Y = number of offences

X = unemployment

The least squares regression line was

$$\hat{y} = 4323.6 + 0.7468x$$


The correlation coefficient was 0.861, therefore $R^2 = 0.861^2 = 0.7413$

This means we can explain 74.13% of the variation in number of offences using unemployment
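The equality between the coefficient of determination and the squared correlation coefficient can be checked directly. This is a sketch under the assumption that the twelve data pairs from the example are used; the helper name `sums_of_squares` is illustrative.

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]


def sums_of_squares(xs, ys):
    """Return (S_xx, S_yy, S_xy), the centred sums of squares and products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_yy = sum((b - y_bar) ** 2 for b in ys)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return s_xx, s_yy, s_xy


s_xx, s_yy, s_xy = sums_of_squares(x, y)

# Correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# R^2 as ESS / TSS for the least squares line.
beta_hat = s_xy / s_xx
y_bar = sum(y) / len(y)
alpha_hat = y_bar - beta_hat * (sum(x) / len(x))
ess = sum((alpha_hat + beta_hat * a - y_bar) ** 2 for a in x)
r_squared = ess / s_yy  # TSS equals S_yy

print(round(r_squared, 3), round(r ** 2, 3))
```

With unrounded arithmetic the result is about 0.741; the last digits differ slightly from squaring the rounded value 0.861.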


Prediction

One of the purposes in calculating the line of best fit is prediction

Specifically, for some value of x , we can provide a prediction for y

So, returning to the example, how many offences would you predict if there were 2000 unemployed people in a city area?

Answer: just substitute the desired value of x into the least squares regression line:

$$\hat{y} = 4323.6 + 0.7468 \times 2000 \approx 5817$$


Prediction

Provided we are predicting y for an x value that is within the available x data, then we can be fairly confident in the prediction

This is what we call interpolation

However, if we base our prediction on an x value outside the available x data, then we should view the prediction with caution


This would be an example of extrapolation which is risky since the relationship between x and y may change for such values of x
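The distinction between interpolation and extrapolation can be made explicit when predicting. The sketch below assumes the fitted line from the example and the smallest and largest unemployment figures in the data (1055 and 2614); the function name and the value 5000 are hypothetical.

```python
# Fitted least squares line from the example: y_hat = 4323.6 + 0.7468 * x.
ALPHA_HAT = 4323.6
BETA_HAT = 0.7468

# Range of unemployment values observed in the 12 city areas.
X_MIN, X_MAX = 1055, 2614


def predict_offences(unemployed):
    """Return (predicted offences, whether the prediction is an extrapolation)."""
    prediction = ALPHA_HAT + BETA_HAT * unemployed
    is_extrapolation = not (X_MIN <= unemployed <= X_MAX)
    return prediction, is_extrapolation


inside = predict_offences(2000)   # within the observed x range: interpolation
outside = predict_offences(5000)  # hypothetical value beyond the data: extrapolation

print(inside, outside)
```

Returning the flag alongside the prediction is one simple way to remind the user when a prediction should be viewed with caution.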


Regression diagnostics

The usefulness of a fitted regression model rests on a basic assumption:

E(y) = α + βx

Furthermore, inference such as hypothesis tests, confidence intervals and predictive intervals only makes sense if the error terms are (approximately) independent and normal with constant variance $\sigma^2$

Therefore it is important to check these conditions are met in practice; this task is called regression diagnostics

Basic idea: look into the residuals $\varepsilon_i$ or the normalised residuals $\varepsilon_i/\sigma$


Regression diagnostics

What to look for?

Do the residuals manifest IID normal behaviour?

Is the scatter plot of $\varepsilon_i$ versus $x_i$ patternless?

Is the scatter plot of $\varepsilon_i$ versus $y_i$ patternless?


Is the scatter plot of $\varepsilon_i$ versus $i$ patternless?

If you see trends, periodic patterns or increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is violated!
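Before plotting, the residuals themselves have to be computed. The following sketch does this for the unemployment/crime example. It estimates σ by the square root of RSS/(n − 2), which is a common convention but an assumption here, since the slides do not state how σ is obtained; all names are illustrative.

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas in the example.
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit.
beta_hat = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
alpha_hat = y_bar - beta_hat * x_bar

# Residuals: observed value minus fitted value.
residuals = [b - (alpha_hat + beta_hat * a) for a, b in zip(x, y)]

# Assumed estimate of sigma: sqrt(RSS / (n - 2)).
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
normalised = [e / sigma_hat for e in residuals]

# Pairs (x_i, normalised residual) that could be fed into a scatter plot.
for a, e in sorted(zip(x, normalised)):
    print(a, round(e, 2))
```

For any least squares fit with an intercept the residuals sum to zero and are uncorrelated with x, so those two facts are only a sanity check of the computation; the assumptions themselves are judged from the patterns in the plots.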


Regression diagnostics

Two other issues in regression diagnostics: outliers and influential observations

Outlier: An unusually small or unusually large $y_i$ which lies outside of the majority of observations

An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis


If an observation which looks like an outlier indeed belongs to the sample and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate this ‘outlier’. For example, stock returns often exhibit extreme values and they often cannot be modelled satisfactorily by a normal regression model


Regression diagnostics

Influential observation: An $x_i$ which is far away from the other $x_i$s

Such an observation may have a large influence on the fitted regression line
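The effect of an influential observation can be seen on a toy dataset. The numbers below are entirely hypothetical and chosen only to make the effect visible: five points with almost no trend, plus one extra point whose x value lies far from the rest.

```python
def least_squares_slope(xs, ys):
    """Slope of the least squares line through the points (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    return s_xy / s_xx


# Hypothetical data: y barely changes as x goes from 1 to 5.
x_base = [1, 2, 3, 4, 5]
y_base = [2.1, 1.9, 2.0, 2.2, 1.8]

# One additional hypothetical observation with x far from the others.
x_with = x_base + [20]
y_with = y_base + [8.0]

slope_without = least_squares_slope(x_base, y_base)
slope_with = least_squares_slope(x_with, y_with)

print(round(slope_without, 3), round(slope_with, 3))
```

A single distant point turns an essentially flat line into a clearly upward-sloping one, which is why such observations deserve a second look.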


Regression: Worked example

We apply the simple linear regression method to study the relationship between two series of financial returns: a regression of Cisco Systems stock returns, y , on S&P500 Index returns, x

This regression model is an example of the CAPM (Capital Asset Pricing Model)

Stock returns:

$$\text{Return} = \frac{\text{Current price} - \text{Previous price}}{\text{Previous price}} \approx \log\left(\frac{\text{Current price}}{\text{Previous price}}\right)$$


when the difference between the two prices is small
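The quality of the log approximation can be illustrated with two price moves. The prices below are hypothetical and the function names are illustrative; the sketch simply compares the simple return with the log return for a small and a large price change.

```python
import math


def simple_return(previous_price, current_price):
    """(current - previous) / previous."""
    return (current_price - previous_price) / previous_price


def log_return(previous_price, current_price):
    """Natural log of current / previous."""
    return math.log(current_price / previous_price)


# Hypothetical prices: a 1% move and a 50% move.
small_move = (100.0, 101.0)
large_move = (100.0, 150.0)

gap_small = abs(simple_return(*small_move) - log_return(*small_move))
gap_large = abs(simple_return(*large_move) - log_return(*large_move))

print(gap_small, gap_large)
```

For the small move the two definitions are nearly indistinguishable, whereas for the large move they differ noticeably.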


Remark: Daily prices are definitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables

For S&P500, the average daily return is -0.04%, the maximum daily return is 4.65%, the minimum daily return is -6.01%, and the standard deviation is 1.40%

For Cisco, the average daily return is -0.13%, the maximum daily return is 15.42%, the minimum daily return is -13.44%, and the standard deviation is 4.23%

Descriptive Statistics


                      N     Range   Minimum   Maximum   Mean     Std. Deviation   Variance
SP500                 252   10.66   -6.00     4.65      -.0424   1.40017          1.960
Cisco                 252   28.85   -13.44    15.42     -.1336   4.23419          17.928
Valid N (listwise)    252

Regression: Worked example

Remark: Cisco is much more volatile than the S&P500

There is clear synchronisation between the movements of the two series of returns


Regression: Worked example

We fit a regression model:

Cisco = α + β × S&P500 + ε

Rationale: Part of the fluctuation in Cisco returns was driven by the fluctuation of the S&P500 returns
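A minimal sketch of how such a model can be fitted by least squares is given below. The data are synthetic (they are not the Cisco and S&P500 returns), the true slope of 2 and the noise levels are arbitrary assumptions, and the function name is hypothetical. The estimates use the usual closed-form expressions: the slope is the ratio of the cross-product sum to the sum of squares of x, and the intercept is the mean of y minus the slope times the mean of x.

```python
import random


def fit_simple_regression(x, y):
    """Least squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Synthetic daily returns in percent; these are NOT the Cisco / S&P500 data.
random.seed(0)
market = [random.gauss(0.0, 1.4) for _ in range(252)]
stock = [2.0 * m + random.gauss(0.0, 3.0) for m in market]

alpha_hat, beta_hat = fit_simple_regression(market, stock)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```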


Regression: Worked example

Coefficients (a)

Model           Unstandardized B   Std. Error   Standardized Beta   t        Sig.   Lower Bound   Upper Bound
1 (Constant)    -.012              .064                             -.188    .851   -.139         .114
1 Cisco         .227               .015         .687                14.943   .000   .197          .257

Lower Bound and Upper Bound: 95.0% Confidence Interval for B

a. Dependent Variable: SP500

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .687 (a)   .472       .470                1.01964

a. Predictors: (Constant), Cisco

b. Dependent Variable: SP500


Regression: Worked example

When testing the statistical significance of regression coefficients, we just need to look at the p-value

The smaller the p-value, the stronger the evidence that the true parameter value is different from zero

In practice, we treat p-values smaller than 0.05 as being statistically significant (at the 5% significance level)
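As a rough illustration of this decision rule, the sketch below turns the two t statistics from the Coefficients table into two-sided p-values and compares them with 0.05. It assumes a standard normal approximation to the t distribution, which should be close with about 250 degrees of freedom; SPSS itself uses the t distribution, so the figures may differ slightly in later decimals. The function name is hypothetical.

```python
import math


def two_sided_p_value(t_stat):
    """Two-sided p-value using a standard normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t statistics taken from the Coefficients table.
t_statistics = {"(Constant)": -0.188, "slope": 14.943}

p_values = {name: two_sided_p_value(t) for name, t in t_statistics.items()}
for name, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.3g} ({verdict} at the 5% level)")
```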

Regression: Worked example

The estimated slope: $\hat{\beta} = 2.077$. The null hypothesis $H_0: \beta = 0$ is rejected with a p-value of 0.000 to three decimal places: extremely significant

Attempted interpretation: When the market index goes up by 1%, Cisco stock goes up by 2.077%, on average. However, the error term ε in the model is large, with an estimated standard deviation $\hat{\sigma} = 3.08\%$
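As a rough cross-check (a sketch, not part of the original SPSS output), the quoted slope and residual standard deviation can be recovered from the correlation and the standard deviations in the earlier tables, using the standard identity slope = r × s_y / s_x. The variable names are illustrative. The Coefficients table lists SP500 as the dependent variable, so its slope of .227 appears to belong to the reverse regression (S&P500 on Cisco), whereas 2.077 corresponds to Cisco on S&P500.

```python
import math

# Summary statistics copied from the Descriptive Statistics and Model Summary tables.
n = 252
r = 0.687
sd_sp500 = 1.40017
sd_cisco = 4.23419

# Slope of a simple regression of y on x equals r * s_y / s_x.
slope_cisco_on_sp500 = r * sd_cisco / sd_sp500
slope_sp500_on_cisco = r * sd_sp500 / sd_cisco

# In simple linear regression, R squared is the squared correlation.
r_squared = r ** 2

# Residual standard deviation for Cisco on S&P500:
# sqrt(SSE / (n - 2)) with SSE = (1 - r^2) * (n - 1) * s_y^2.
sigma_hat = sd_cisco * math.sqrt((1 - r_squared) * (n - 1) / (n - 2))

print(f"slope, Cisco on S&P500: {slope_cisco_on_sp500:.3f}")
print(f"slope, S&P500 on Cisco: {slope_sp500_on_cisco:.3f}")
print(f"R squared: {r_squared:.3f}")
print(f"sigma_hat: {sigma_hat:.2f}")
```

Because r is only given to three decimals, the recovered figures agree with the quoted ones only approximately.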

The p-value for testing $H_0: \alpha = 0$ is 0.815, so we cannot reject the hypothesis that $\alpha = 0$

Recall $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, and both $\bar{y}$ and $\bar{x}$ are very close to 0
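As an illustration, plugging in the sample means from the Descriptive Statistics table (in %) and the estimated slope gives a value close to zero. The figures are rounded, so this is a rough check rather than the exact SPSS estimate.

```latex
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
             = -0.1336 - 2.077 \times (-0.0424)
             \approx -0.1336 + 0.0881
             = -0.0455
```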


Regression: Worked example

$R^2 = 47.2\%$: 47.2% of the variation in Cisco stock returns may be explained by the variation in the S&P500 index returns. In other words, 47.2% of the risk in Cisco stock is market-related risk (see CAPM below)

CAPM: A simple asset pricing model in finance:

\[
y_i = \alpha + \beta x_i + \varepsilon_i
\]

where $y_i$ is a stock return and $x_i$ is a market return at time $i$


Regression: Worked example

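Total risk of the stock:

\[
\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
\]

Market-related (or systematic) risk:

\[
\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = \frac{1}{n}\hat{\beta}^2\sum_{i=1}^{n}(x_i - \bar{x})^2
\]

Firm-specific risk:

\[
\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
\]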
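The decomposition can be checked numerically with a minimal sketch. The returns below are synthetic (not the Cisco and S&P500 data) and the variable names are illustrative. After a least squares fit, total risk should equal market-related risk plus firm-specific risk, and the market-related share of total risk is the coefficient of determination.

```python
import random

# Synthetic daily returns in percent; the slope 2.0 and noise levels are assumptions.
random.seed(1)
x = [random.gauss(0.0, 1.4) for _ in range(252)]      # hypothetical market returns
y = [2.0 * xi + random.gauss(0.0, 3.0) for xi in x]   # hypothetical stock returns

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * xi for xi in x]

total_risk = sum((yi - y_bar) ** 2 for yi in y) / n
market_risk = sum((fi - y_bar) ** 2 for fi in fitted) / n
firm_risk = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
r_squared = market_risk / total_risk

print(f"total = {total_risk:.4f}")
print(f"market-related + firm-specific = {market_risk + firm_risk:.4f}")
print(f"R squared = {r_squared:.3f}")
```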


Regression: Worked example

β measures the market-related (or systematic) risk of the stock

Market-related risk is unavoidable, while firm-specific risk may be ‘diversified away’ through hedging

Variance is a simple measure (and one of the most frequently used) for risk in finance


Multiple linear regression

Previously we saw simple linear regression

That had one explanatory variable

Often one explanatory variable is not enough to explain variation in the response variable

So we add more linear explanatory variables


Multiple linear regression — examples

Absenteeism in the workforce could be due to:

hours worked, flexibility in work practice, salary paid

Salary for managers could be related to:

qualifications, experience, hours worked, performance


Multiple linear regression

Remember the aim of statistics is prediction and decision making

In order to make the best predictions and decisions we need to use the best models

This often means building more complex models that explain more of the variation in the response variable

But not too complex (Occam’s razor)


The multiple linear model

Suppose y is the manager’s salary

$x_1$ = qualifications, $x_2$ = experience, $x_3$ = hours, $x_4$ = performance

\[
y = \beta_0 + \beta_{\text{qual}} x_1 + \beta_{\text{exp}} x_2 + \beta_{\text{hrs}} x_3 + \beta_{\text{per}} x_4 + \varepsilon
\]

We can visualise the model in up to n = 3 dimensions (the response variable plus two explanatory variables)

The multiple linear model

Multiple linear regression uses least squares estimation like simple linear regression

That is, we minimise the sum of the squared residuals in all dimensions

Sounds tricky, but fortunately software (SPSS etc.) takes care of that for us
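To give a feel for what the software does, here is a minimal sketch of least squares with two explanatory variables, solved through the normal equations. The data are hypothetical and noise-free by construction (salary = 20 + 3 × qualifications + 1.5 × experience), the function names are illustrative, and this is not how SPSS is implemented; statistical packages typically use more numerically robust routines.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Least squares coefficients [b0, b1, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical managers: (qualifications score, years of experience).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (2, 7)]
salary = [20 + 3 * q + 1.5 * e for q, e in rows]  # noise-free by construction

coefficients = fit_multiple_regression(rows, salary)
print([round(c, 3) for c in coefficients])
```

Because the hypothetical salaries contain no noise, the fitted coefficients should reproduce the intercept and the two slopes used to generate them, up to rounding error.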
