
Applied Marketing (Market Research Methods)

Topic 8:

Identifying relationships

Dr James Abdey


Contents: Overview; Relationship between two variables; Correlation; Regression; The simple linear regression model; Parameter estimation; Interpretation of correlation coefficient; Coefficient of determination, R²; Prediction; Regression diagnostics; Worked example; Multiple linear regression

Overview

We consider regression analysis, which is used for explaining variation in market share, sales, brand preference etc.

This may use explanatory variables such as advertising, price, distribution and product quality


Starting with correlation, we proceed to the simple linear model followed by multiple linear regression

Relationship between two variables

We now investigate the relationship between two variables

When we have data on two variables (X and Y ), we have bivariate data

We will consider how to:


measure the strength of the relationship

model the relationship

predict the value of one variable on the basis of the other

Relationship between two variables

First thing to do with data is to provide a graphical representation

For one variable this might be a histogram, stem-and-leaf diagram etc.

For two variables we produce a scatter diagram

This must include the following:

a title
axis labels
units
and be accurate!


Relationship between two variables

Assume that we have some data in paired form:

(xᵢ, yᵢ),  i = 1, 2, …, n

An example might be unemployment and crime ﬁgures for 12 areas of a city, of interest to insurers in setting policy premia for people insuring against theft


Unemp., x    2614  1160  1055  1199  2157  2305
Offences, y  6200  4610  5336  5411  5808  6004
Unemp., x    1687  1287  1869  2283  1162  1201
Offences, y  5420  5588  5719  6336  5103  5268

Relationship between two variables

We plot X on the horizontal axis, and Y on the vertical axis
This emphasises any relationship between the variables

[Scatter plot of Crime (number of offences) against Unemployment]


Relationship between two variables

A positive, linear relationship is apparent

X and Y increase together, roughly linearly

However, the points do not lie exactly on a straight line, so the implied linear relationship is not exact


Such an ‘upward shape’ is termed positive correlation

We will see later how to quantify correlation


Relationship between two variables


Other examples of scatter plots include:

LHS: negative correlation (Y decreases as X increases)
RHS: uncorrelated data (no obvious (linear) relationship between X and Y)

[Two scatter plots: negative correlation on the left, uncorrelated data on the right]

Correlation

Correlation measures the strength of the linear relationship between two variables, each measured on an interval scale

Positive correlation — the two variables tend to vary in the same direction


Negative correlation — the two variables tend to vary in the opposite direction

Perfect correlation — the two variables have points which all lie exactly on a straight line

Correlation

If there exists a perfect linear relationship between X and Y , we can represent them using an equation of the form

Y = α + βX

α represents the intercept of the line

β represents the slope or gradient of the line

Examples of anticipated correlation:

Variables                                   Correlation
Height & weight                             Positive
Rainfall & sunshine hours                   Negative
Ice cream sales & sun cream sales           Positive
Hours of study & exam mark                  Positive
Car's petrol consumption & goals scored     Zero

Correlation

Positive correlation: large X with large Y ; small X with small Y

Negative correlation: large X with small Y ; small X with large Y

However, since the X and Y may have widely different numerical values we need to take this into account


We do this by considering how far away from the means the two scores are

Correlation

So, we are interested in the degree to which variations in variable values are related to each other

Our basis for the measurement of correlation is

Σ (xᵢ − x̄)(yᵢ − ȳ) = Σ xᵢyᵢ − n x̄ȳ,  summing over i = 1, …, n


Unfortunately, this measure is extremely sensitive to the units in which the variables are measured

We would prefer a measure of correlation to remain the same regardless of the units of measurement (e.g. days, hours, minutes or seconds)
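To illustrate the point numerically (a Python sketch with made-up study-hours and exam-mark data; dividing by the spread of each variable, as the correlation coefficient does, removes the sensitivity to units):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # e.g. hours of study (made-up data)
y = [52.0, 55.0, 61.0, 64.0, 70.0]   # e.g. exam marks (made-up data)

def cross_products(x, y):
    """Sum of (x_i - xbar)(y_i - ybar): the raw, unit-sensitive measure."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

def corr(x, y):
    """Sample correlation coefficient r: the unit-free measure."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return cross_products(x, y) / math.sqrt(sxx * syy)

x_minutes = [60 * xi for xi in x]     # the same data measured in minutes

print(cross_products(x, y), cross_products(x_minutes, y))  # differs by a factor of 60
print(corr(x, y), corr(x_minutes, y))                      # identical (up to rounding)
```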

Correlation

So, we use the following to measure the correlation for (sample) data

r = (Σ xᵢyᵢ − n x̄ȳ) / √[(Σ xᵢ² − n x̄²) × (Σ yᵢ² − n ȳ²)]
  = Σ (xᵢ − x̄)(yᵢ − ȳ) / √[Σ (xᵢ − x̄)² × Σ (yᵢ − ȳ)²]


Correlation

Returning to the unemployment/crime dataset:

Σ xᵢ = 19979,  Σ xᵢ² = 36695129,  Σ yᵢ = 66803,  Σ yᵢ² = 374471231,  Σ xᵢyᵢ = 113784494

Since n = 12, we have x̄ = 19979/12 = 1664.92 and ȳ = 66803/12 = 5566.92

Hence the (sample) correlation coefﬁcient, r , is

r = 0.861


Of course, in practice we can use software such as SPSS to calculate r for us!
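As a check on the arithmetic, the calculation can also be reproduced in a few lines of Python (a sketch; the twelve data pairs are those from the earlier table):

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# r = (sum x_i y_i - n*xbar*ybar) / sqrt((sum x_i^2 - n*xbar^2)(sum y_i^2 - n*ybar^2))
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
sxx = sum(xi ** 2 for xi in x) - n * xbar ** 2
syy = sum(yi ** 2 for yi in y) - n * ybar ** 2
r = sxy / math.sqrt(sxx * syy)

print(round(r, 3))  # 0.861
```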

Correlation

The (sample) correlation coefficient, r, takes values between −1 and 1, i.e.

−1 ≤ r ≤ 1

r > 0 indicates positive correlation, with r = 1 indicating perfect positive correlation


r < 0 indicates negative correlation, with r = −1 indicating perfect negative correlation

The closer |r | is to 1, the stronger the linear relationship is

Regression

Here we introduce simple linear regression

Only part of a very large topic in statistical analysis

In the simple model, we have two variables Y and X :


Y is the dependent (or response) variable — that which we are trying to explain using:

X , the independent (or explanatory) variable — the factor we think inﬂuences Y

The simple linear regression model

Assume a true (population) linear relationship between a response variable y and an explanatory variable x of the approximate form:

y = α + βx

α and β are ﬁxed, but unknown, population parameters

α is the y -intercept

β is the slope of the line

We seek to estimate α and β using (paired) sample data (xᵢ, yᵢ), i = 1, …, n


The simple linear regression model

Particularly in business, we would not expect a perfect linear relationship between the two variables

Hence we modify this basic model to

y = α + βx + ε

ε is some random perturbation from the initial ‘approximate’ line

In other words, each y observation almost lies on the postulated line, but ‘jumps’ off the line according to the random variable ε

ε is often referred to as the error term


Parameter estimation — The least squares method

For given sample data we could produce a scatter plot

Any linear relationship would be visible

This would suggest performing a (simple) linear regression


We estimate the population regression line

This estimated line is often termed the line of best ﬁt

Parameter estimation — The least squares method

How do we choose the line of best ﬁt?

We require a formal criterion for determining the line of best ﬁt

Estimation of α and β will be by least squares estimation


Speciﬁcally, we seek to minimise the sum of the squared residuals, where a residual is the difference between the true y value and its predicted (ﬁtted) value

Parameter estimation — The least squares method

The least squares estimator of β is

β̂ = (Σ xᵢyᵢ − n x̄ȳ) / (Σ xᵢ² − n x̄²)

The least squares estimator of α is

α̂ = ȳ − β̂x̄


Hence the line of best fit has equation:

ŷ = α̂ + β̂x

Again, this is routinely calculated in SPSS

Example

Returning to the unemployment/crime dataset

Σ xᵢ = 19979,  Σ xᵢ² = 36695129,  Σ yᵢ = 66803,  Σ yᵢ² = 374471231,  Σ xᵢyᵢ = 113784494

Since n = 12, we have x̄ = 19979/12 = 1664.92 and ȳ = 66803/12 = 5566.92, hence

β̂ = (Σ xᵢyᵢ − n x̄ȳ) / (Σ xᵢ² − n x̄²) = (113784494 − 12 × 1664.92 × 5566.92) / (36695129 − 12 × 1664.92²) = 0.7468

Example

We estimate the intercept to be

α̂ = ȳ − β̂x̄ = 5566.92 − 0.7468 × 1664.92 = 4323.6

Hence the least squares regression line is

ŷ = 4323.6 + 0.7468x


Note the ŷ notation, where the ‘hat’ denotes an estimated value
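The estimates can be reproduced numerically (a Python sketch on the same twelve data pairs; small differences in the final digits come from rounding in the hand calculation):

```python
# Least squares estimates for the unemployment/crime data
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# beta_hat = (sum x_i y_i - n*xbar*ybar) / (sum x_i^2 - n*xbar^2)
beta_hat = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / \
           (sum(xi ** 2 for xi in x) - n * xbar ** 2)

# alpha_hat = ybar - beta_hat * xbar
alpha_hat = ybar - beta_hat * xbar

print(round(beta_hat, 4), round(alpha_hat, 1))  # roughly 0.7469 and 4323.4
```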

Interpretation of correlation coefﬁcient

In the case of perfect correlation between X and Y , we can predict Y directly and exactly from X

In the case of zero correlation between X and Y , knowledge of X tells us nothing about Y


Here we consider measuring the extent to which the values of one variable can be used to predict the values of another where the correlation is neither −1, nor 0, nor +1

Interpretation of correlation coefﬁcient

Our overall objective is to explain the response variable Y , which is a random variable

We try to explain the variation in Y

Using simple linear regression, we attempt this using a single explanatory variable, X

The total variation in the response variable sample data is simply

Σ (yᵢ − ȳ)²,  summing over i = 1, …, n


We term this the total sum of squares (TSS)

Interpretation of correlation coefﬁcient

We can decompose TSS into two components:

the amount we are able to explain using the model called the explained sum of squares (ESS);

and the remaining variation that we are unable to explain with the model, called the residual sum of squares (RSS)

Hence,

TSS = ESS + RSS

Coefﬁcient of determination, R 2

We can assess the overall fit of a model using R²

This measures the ‘proportion of the total variability in the response variable explained by the model’

This statistic is known as the coefficient of determination, denoted R², and is defined as

R² = ESS / TSS

Note that 0 ≤ R² ≤ 1


The closer R 2 is to 1, the better the explanatory power of the model

Note that R² = r² for a simple linear model, so we can also compute it from the correlation coefficient, r
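Both facts — the decomposition TSS = ESS + RSS and the equality R² = r² for the simple linear model — can be checked numerically (a Python sketch on the unemployment/crime data; the value differs slightly from the slide's 0.7413 because the slide squares the rounded r = 0.861):

```python
import math

x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

beta_hat = sxy / sxx
alpha_hat = ybar - beta_hat * xbar
fitted = [alpha_hat + beta_hat * xi for xi in x]

tss = syy                                               # total sum of squares
ess = sum((fi - ybar) ** 2 for fi in fitted)            # explained sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares

r = sxy / math.sqrt(sxx * syy)
print(round(ess / tss, 4), round(r ** 2, 4))  # both about 0.7407
```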

Coefﬁcient of determination, R 2

Returning to the crime/unemployment dataset, let’s assign Y and X as follows

Y = number of offences
X = unemployment

The least squares regression line was

ŷ = 4323.6 + 0.7468x


The correlation coefficient was 0.861, therefore R² = 0.861² = 0.7413

This means we can explain 74.13% of the variation in number of offences using unemployment

Prediction

One of the purposes in calculating the line of best ﬁt is prediction

Speciﬁcally, for some value of x , we can provide a prediction for y

So, returning to the example, how many offences would you predict if there were 2000 unemployed people in a city area?

Answer: just substitute the desired value of x into the least squares regression line:

ŷ = 4323.6 + 0.7468 × 2000 ≈ 5817


Prediction

Provided we are predicting y for an x value that is within the available x data, then we can be fairly conﬁdent in the prediction

This is what we call interpolation

However, if we base our prediction on an x value outside the available x data, then we should view the prediction with caution


This would be an example of extrapolation which is risky since the relationship between x and y may change for such values of x
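The prediction rule and the extrapolation caution can be combined in a small helper (a Python sketch; the coefficients and the sample range 1055 to 2614 are taken from the worked example):

```python
def predict(x_new, alpha_hat=4323.6, beta_hat=0.7468, x_min=1055, x_max=2614):
    """Predict y from the fitted line, warning when x_new is an extrapolation.

    The default coefficients and the sample range [x_min, x_max] come from
    the unemployment/crime worked example.
    """
    if not x_min <= x_new <= x_max:
        print(f"Warning: x = {x_new} lies outside [{x_min}, {x_max}]: "
              "this is extrapolation, so treat the prediction with caution")
    return alpha_hat + beta_hat * x_new

print(round(predict(2000)))  # 5817 offences (interpolation, so fairly reliable)
predict(5000)                # triggers the extrapolation warning
```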

Regression diagnostics

The usefulness of a ﬁtted regression model rests on a basic assumption:

E(y) = α + βx

Furthermore, inference such as hypothesis tests, confidence intervals and prediction intervals only makes sense if the error terms are (approximately) independent and normal with constant variance σ²

Therefore it is important to check these conditions are met in practice — this task is called regression diagnostics

Basic idea: looking into the residuals ε̂ᵢ, or the normalised residuals


Regression diagnostics

What to look for?

Do the residuals manifest IID normal behaviour?

Is the scatter plot of ε̂ᵢ versus xᵢ patternless?

Is the scatter plot of ε̂ᵢ versus ŷᵢ patternless?


Is the scatter plot of ε̂ᵢ versus i patternless?

If you see trends, periodic patterns, increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is violated!
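The plots themselves are the real diagnostic tool, but a crude numerical screen can be sketched in Python (using the unemployment/crime data; the residual standard error and the ±2 cut-off for standardised residuals are conventional rules of thumb, not part of the slides):

```python
import math

# Unemployment (x) and offences (y) for the 12 city areas
x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)
alpha_hat = ybar - beta_hat * xbar

# Residuals: observed y minus fitted y
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]

# Normalise by the residual standard error (n - 2 degrees of freedom)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
standardised = [e / s for e in residuals]

# Crude screen: standardised residuals should mostly lie within (-2, 2)
flagged = [i for i, z in enumerate(standardised) if abs(z) > 2]
print(flagged)  # [1] - the area with x = 1160, y = 4610 stands out
```

An observation flagged this way is a candidate for the outlier discussion that follows, not automatically an error.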

Regression diagnostics

Two other issues in regression diagnostics: outliers and inﬂuential observations

Outlier: An unusually small or unusually large y i which lies outside of the majority of observations

An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis


If an observation which looks like an outlier indeed belongs to the sample and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate this ‘outlier’. For example, stock returns often exhibit extreme values and they often cannot be modelled satisfactorily by a normal regression model

Regression diagnostics

Inﬂuential observation: An x i which is far away from other x i s

Such an observation may have a large inﬂuence on the ﬁtted regression line


Regression: Worked example

We apply the simple linear regression method to study the relationship between two series of ﬁnancial returns: a regression of Cisco Systems stock returns, y , on S&P500 Index returns, x

This regression model is an example of the CAPM (Capital Asset Pricing Model)

Stock returns:

Return = (Current price − Previous price) / Previous price ≈ log(Current price / Previous price)

when the difference between the two prices is small
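The approximation can be checked directly (a Python sketch with made-up prices):

```python
import math

prev, curr = 100.0, 101.0  # a 1% move (made-up prices)

simple_return = (curr - prev) / prev
log_return = math.log(curr / prev)

print(round(simple_return, 6), round(log_return, 6))  # 0.01 vs about 0.00995

# For a large move the two diverge noticeably
prev, curr = 100.0, 150.0
print(round((curr - prev) / prev, 4), round(math.log(curr / prev), 4))  # 0.5 vs 0.4055
```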

Regression: Worked example

Remark: Daily prices are deﬁnitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables

For S&P500, the average daily return is -0.04%, the maximum daily return is 4.46%, the minimum daily return is -6.01%, and the standard deviation is 1.40%

For Cisco, the average daily return is −0.13%, the maximum daily return is 15.42%, the minimum daily return is −13.44%, and the standard deviation is 4.23%

Descriptive Statistics


                     N     Range   Minimum   Maximum   Mean      Std. Deviation   Variance
SP500                252   10.66   −6.01     4.65      −0.0424   1.40017          1.96
Cisco                252   28.85   −13.44    15.42     −0.1336   4.23419          17.928
Valid N (listwise)   252

Regression: Worked example

Remark: Cisco is much more volatile than the S&P500

There is clear synchronisation between the movements of the two series of returns


Regression: Worked example

We ﬁt a regression model:

Cisco = α + βS&P500 + ε

Rationale: Part of the ﬂuctuation in Cisco returns was driven by the ﬂuctuation of the S&P500 returns


Regression: Worked example

Coefficients(a)

Model 1, (Constant):  B = −0.012, Std. Error = 0.064, t = −0.188, Sig. = 0.851, 95% CI for B: (−0.139, 0.114)
Model 1, Cisco:       B = 0.227, Std. Error = 0.015, Beta = 0.687, t = 14.943, Sig. = 0.000, 95% CI for B: (0.197, 0.257)

a. Dependent Variable: SP500

Model Summary(b)

Model 1:  R = 0.687(a), R Square = 0.472, Adjusted R Square = 0.470, Std. Error of the Estimate = 1.01964

a. Predictors: (Constant), Cisco
b. Dependent Variable: SP500


Regression: Worked example

When testing the statistical signiﬁcance of regression coefﬁcients, we just need to look at the p-value

The smaller the p-value, the more significant the result, i.e. the stronger the evidence that the true parameter value is different from zero


In practice, we treat p-values smaller than 0.05 as being statistically signiﬁcant (at the 5% signiﬁcance level)

Regression: Worked example

The estimated slope is β̂ = 2.077. The null hypothesis H₀: β = 0 is rejected with p-value 0.000: extremely significant

Attempted interpretation: when the market index goes up by 1%, Cisco stock goes up by 2.077%, on average. However, the error term ε in the model is large, with an estimated σ̂ = 3.08%


The p-value for testing H₀: α = 0 is 0.815, so we cannot reject the hypothesis that α = 0

Recall α̂ = ȳ − β̂x̄, and both ȳ and x̄ are very close to 0

Regression: Worked example

R² = 0.472, i.e. 47.2% of the variation of Cisco stock may be explained by the variation of the S&P500 index, or in other words 47.2% of the risk in Cisco stock is the market-related risk (see CAPM below)

CAPM: A simple asset pricing model in ﬁnance:

yᵢ = α + βxᵢ + εᵢ


where yᵢ is a stock return and xᵢ is a market return at time i

Regression: Worked example

Total risk of the stock:

(1/n) Σ (yᵢ − ȳ)² = (1/n) Σ (ŷᵢ − ȳ)² + (1/n) Σ (yᵢ − ŷᵢ)²

Market-related (or systematic) risk:

(1/n) Σ (ŷᵢ − ȳ)² = (1/n) β̂² Σ (xᵢ − x̄)²

Firm-specific risk:

(1/n) Σ (yᵢ − ŷᵢ)²

(sums over i = 1, …, n)
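This decomposition is just TSS = ESS + RSS divided by n, and can be verified numerically (a Python sketch with made-up market and stock return series; in the worked example x and y would be the S&P500 and Cisco returns, and the slope of 2 below is purely an illustrative choice):

```python
import math
import random

random.seed(42)

# Made-up illustration: market returns x, stock returns y = 2x + noise
x = [random.gauss(0, 1.4) for _ in range(250)]
y = [2.0 * xi + random.gauss(0, 3.0) for xi in x]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)
alpha_hat = ybar - beta_hat * xbar
fitted = [alpha_hat + beta_hat * xi for xi in x]

total_risk = sum((yi - ybar) ** 2 for yi in y) / n
market_risk = beta_hat ** 2 * sum((xi - xbar) ** 2 for xi in x) / n
firm_risk = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n

print(round(total_risk, 4), round(market_risk + firm_risk, 4))  # equal
```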


Regression: Worked example

β measures the market-related (or systematic) risk of the stock

Market-related risk is unavoidable, while ﬁrm-speciﬁc risk may be ‘diversiﬁed away’ through hedging


Variance is a simple measure (and one of the most frequently used) for risk in ﬁnance

Multiple linear regression

Previously we saw simple linear regression

Often one explanatory variable is not enough to explain variation in the response variable


So we add more linear explanatory variables

Multiple linear regression — examples

Absenteeism in the workforce could be due to:

hours worked
flexibility in work practice
salary paid

Salary for managers could be related to:

qualifications
experience
hours worked
performance


Multiple linear regression

Remember the aim of statistics is prediction and decision making

In order to make the best predictions and decisions we need to use the best models

This often means making more complex models by adding more explanatory variables


But not too complex (Occam’s razor)

The multiple linear model

Suppose y is the manager’s salary

x₁ = qualifications, x₂ = experience, x₃ = hours, x₄ = performance


y = β₀ + β_qual x₁ + β_exp x₂ + β_hrs x₃ + β_per x₄ + ε

We can only visualise up to three dimensions


The multiple linear model

Multiple linear regression uses least squares estimation like simple linear regression

That is, we minimise the sum of the squared residuals in all dimensions


Sounds tricky, but fortunately software (SPSS etc.) takes care of that for us
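As a sketch of what such software does under the hood, numpy's least squares routine can fit the manager-salary model on made-up data (all variable names and numbers here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors: qualifications, experience, hours, performance
X = rng.normal(size=(n, 4))
true_beta = np.array([5.0, 2.0, 1.5, 0.5, 3.0])  # intercept first (made-up values)

# Salary = intercept + linear combination of predictors + noise
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=1.0, size=n)

# Add a column of ones for the intercept, then minimise the sum of
# squared residuals in all dimensions at once
design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(beta_hat, 2))  # close to the true coefficients
```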