Applied Marketing (Market Research Methods)
Topic 8: Identifying relationships
Dr James Abdey
Overview
Relationship between two variables
Correlation
Regression
The simple linear regression model
Parameter estimation
Interpretation of correlation coefficient
Coefficient of determination, R²
Prediction
Regression diagnostics
Worked example
Multiple linear regression
Overview
We consider regression analysis, which is used to explain variation in market share, sales, brand preference etc.
This may use explanatory variables such as advertising, price, distribution and product quality
Starting with correlation, we proceed to the simple linear model followed by multiple linear regression
Relationship between two variables
We now investigate the relationship between two variables
When we have data on two variables (X and Y ), we have bivariate data
We will consider how to:
measure the strength of the relationship
model the relationship
predict the value of one variable on the basis of the other
Relationship between two variables
First thing to do with data is to provide a graphical representation
For one variable this might be a histogram, stem-and-leaf diagram etc.
For two variables we produce a scatter diagram
This must include a title, axis labels and units, and it must be accurate!
Relationship between two variables
Assume that we have some data in paired form:

(x_i, y_i),  i = 1, 2, …, n
An example might be unemployment and crime ﬁgures for 12 areas of a city, of interest to insurers in setting policy premia for people insuring against theft
Unemp., x     2614   1160   1055   1199   2157   2305
Offences, y   6200   4610   5336   5411   5808   6004

Unemp., x     1687   1287   1869   2283   1162   1201
Offences, y   5420   5588   5719   6336   5103   5268
Relationship between two variables
We plot X on the horizontal axis and Y on the vertical axis. This emphasises any relationship between the variables
Scatter plot of Crime against Unemployment
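As an illustrative sketch (not part of the original slides), the scatter diagram for these data can be produced in Python with matplotlib:

```python
# Sketch: scatter diagram for the unemployment/crime data in the notes
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

x = [2614, 1160, 1055, 1199, 2157, 2305, 1687, 1287, 1869, 2283, 1162, 1201]  # unemployed
y = [6200, 4610, 5336, 5411, 5808, 6004, 5420, 5588, 5719, 6336, 5103, 5268]  # offences

plt.scatter(x, y)
plt.title("Scatter plot of Crime against Unemployment")  # title
plt.xlabel("Unemployment (persons)")                     # axis labels with units
plt.ylabel("Number of offences")
plt.savefig("crime_scatter.png")
```

The axis labels and units here are assumptions based on the table; the slide itself only gives the figure title.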
Relationship between two variables
A positive, linear relationship is apparent
X and Y increase together, roughly linearly
However, the implied linear relationship is not exact
The points do not lie exactly on a straight line
Such an ‘upward shape’ is termed positive correlation
We will see later how to quantify correlation
Relationship between two variables
Other examples of scatter plots include:
LHS: negative correlation (Y decreases as X increases)
RHS: uncorrelated data (no obvious (linear) relationship between X and Y)
Correlation
Correlation measures the strength of the linear relationship between two variables, each measured on an interval scale
Positive correlation — the two variables tend to vary in the same direction
Negative correlation — the two variables tend to vary in the opposite direction
Perfect correlation — the two variables have points which all lie exactly on a straight line
Correlation
If there exists a perfect linear relationship between X and Y , we can represent them using an equation of the form
Y = α + βX
α represents the intercept of the line
β represents the slope or gradient of the line
Examples of anticipated correlation:
Variables                                   Correlation
Height & weight                             Positive
Rainfall & sunshine hours                   Negative
Ice cream sales & sun cream sales           Positive
Hours of study & exam mark                  Positive
Car’s petrol consumption & goals scored     Zero
Correlation
Positive correlation: large X with large Y ; small X with small Y
Negative correlation: large X with small Y ; small X with large Y
However, since X and Y may take widely different numerical values, we need to take this into account
We do this by considering how far away from the means the two scores are
Correlation
So, we are interested in the degree to which variations in variable values are related to each other
Our basis for the measurement of correlation is

Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) = Σ_{i=1}^{n} x_i y_i − n x̄ ȳ
Unfortunately, this measure is extremely sensitive to the units in which the variables are measured
We would prefer a measure of correlation to remain the same regardless of the units of measurement (e.g. days, hours, minutes or seconds)
Correlation
So, we use the following to measure the correlation for (sample) data:

r = Σ (x_i − x̄)(y_i − ȳ) / √[ Σ (x_i − x̄)² × Σ (y_i − ȳ)² ]
  = (Σ x_i y_i − n x̄ ȳ) / √[ (Σ x_i² − n x̄²) × (Σ y_i² − n ȳ²) ]
Correlation
Returning to the unemployment/crime dataset:

Σ x_i = 19979,  Σ x_i² = 36695129,  Σ y_i = 66803,
Σ y_i² = 374471231,  Σ x_i y_i = 113784494

Since n = 12, we have x̄ = 19979/12 = 1664.92 and ȳ = 66803/12 = 5566.92
Hence the (sample) correlation coefficient, r, is

r = 0.861
Of course, in practice we can use software like SPSS to calculate r for us!
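As a rough check (an illustrative sketch, not part of the slides), r can also be computed directly from the summary sums given above:

```python
# Sketch: sample correlation coefficient from the summary statistics
import math

n = 12
sum_x, sum_x2 = 19979, 36695129      # Σx and Σx²
sum_y, sum_y2 = 66803, 374471231     # Σy and Σy²
sum_xy = 113784494                   # Σxy

xbar, ybar = sum_x / n, sum_y / n
r = (sum_xy - n * xbar * ybar) / math.sqrt(
    (sum_x2 - n * xbar ** 2) * (sum_y2 - n * ybar ** 2)
)
print(round(r, 3))  # → 0.861
```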
Correlation
The (sample) correlation coefﬁcient, r , takes values between −1 and 1, i.e.
−1 ≤ r ≤ 1
r > 0 indicates positive correlation, with r = 1 indicating perfect positive correlation
r < 0 indicates negative correlation, with r = −1 indicating perfect negative correlation
The closer |r| is to 1, the stronger the linear relationship is
Regression
Here we introduce simple linear regression
Only part of a very large topic in statistical analysis
In the simple model, we have two variables Y and X :
Y is the dependent (or response) variable — that which we are trying to explain using:
X , the independent (or explanatory) variable — the factor we think inﬂuences Y
The simple linear regression model
Assume a true (population) linear relationship between a response variable y and an explanatory variable x of the approximate form:
y = α + βx
α and β are fixed, but unknown, population parameters
α is the y-intercept
β is the slope of the line
We seek to estimate α and β using (paired) sample data (x_i, y_i), i = 1, …, n
The simple linear regression model
Particularly in business, we would not expect a perfect linear relationship between the two variables
Hence we modify this basic model to

y = α + βx + ε

where ε is some random perturbation from the initial ‘approximate’ line
In other words, each y observation almost lies on the postulated line, but ‘jumps’ off the line according to the random variable ε
ε is often referred to as the error term
Parameter estimation — The least squares method
For given sample data we could produce a scatter plot
Any linear relationship would be visible
This would suggest performing a (simple) linear regression
We estimate the population regression line
This estimated line is often termed the line of best ﬁt
Parameter estimation — The least squares method
How do we choose the line of best ﬁt?
We require a formal criterion for determining the line of best ﬁt
Estimation of α and β will be by least squares estimation
Speciﬁcally, we seek to minimise the sum of the squared residuals, where a residual is the difference between the true y value and its predicted (ﬁtted) value
Parameter estimation — The least squares method
The least squares estimator of β is

β̂ = (Σ x_i y_i − n x̄ ȳ) / (Σ x_i² − n x̄²)

The least squares estimator of α is

α̂ = ȳ − β̂ x̄
Hence the line of best fit has equation:

ŷ = α̂ + β̂x
Again, this is routinely calculated in SPSS
Example
Returning to the unemployment/crime dataset:

Σ x_i = 19979,  Σ x_i² = 36695129,  Σ y_i = 66803,
Σ y_i² = 374471231,  Σ x_i y_i = 113784494

Since n = 12, we have x̄ = 19979/12 = 1664.92 and ȳ = 66803/12 = 5566.92, hence

β̂ = (Σ x_i y_i − n x̄ ȳ) / (Σ x_i² − n x̄²)
  = (113784494 − (12 × 1664.92 × 5566.92)) / (36695129 − (12 × 1664.92²))
  = 0.7468
Example
We estimate the intercept to be

α̂ = ȳ − β̂x̄ = 5566.92 − 0.7468 × 1664.92 = 4323.6
Hence the least squares regression line is
yˆ = 4323.6 + 0.7468x
Note the yˆ notation, where the ‘hat’ denotes an estimated value
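The least squares computation above can be sketched in Python from the same summary sums (illustrative; small differences from the slides are due to rounding):

```python
# Sketch: least squares estimates for the unemployment/crime example
n = 12
sum_x, sum_x2 = 19979, 36695129   # Σx and Σx²
sum_y, sum_xy = 66803, 113784494  # Σy and Σxy

xbar, ybar = sum_x / n, sum_y / n
beta_hat = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)  # slope
alpha_hat = ybar - beta_hat * xbar                                # intercept

print(beta_hat, alpha_hat)  # ≈ 0.747 and ≈ 4323 (slides report 0.7468 and 4323.6)
```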
Interpretation of correlation coefﬁcient
In the case of perfect correlation between X and Y , we can predict Y directly and exactly from X
In the case of zero correlation between X and Y , knowledge of X tells us nothing about Y
Here we consider measuring the extent to which the values of one variable can be used to predict the values of another where the correlation is neither 1, nor 0, nor −1
Interpretation of correlation coefﬁcient
Our overall objective is to explain the response variable Y , which is a random variable
We try to explain the variation in Y
Using simple linear regression, we attempt this using a single explanatory variable, X
The total variation in the response variable sample data is simply

Σ_{i=1}^{n} (y_i − ȳ)²
We term this the total sum of squares (TSS)
Interpretation of correlation coefﬁcient
We can decompose TSS into two components:
the amount we are able to explain using the model called the explained sum of squares (ESS);
and the remaining variation that we are unable to explain with the model, called the residual sum of squares (RSS)
Hence, TSS = ESS + RSS
Coefficient of determination, R²
We can assess the overall fit of a model using R²
This measures the ‘proportion of the total variability in the response variable explained by the model’
This statistic is known as the coefficient of determination, denoted R², and is defined as

R² = ESS / TSS

Note that 0 ≤ R² ≤ 1
The closer R² is to 1, the better the explanatory power of the model
Note that R² = r² for a simple linear model, so we can also compute it from r (the correlation coefficient)
Coefficient of determination, R²
Returning to the crime/unemployment dataset, let’s assign Y and X as follows:
Y = number of offences, X = unemployment
The least squares regression line was
yˆ = 4323.6 + 0.7468x
The correlation coefficient was 0.861, therefore R² = 0.861² = 0.7413
This means we can explain 74.13% of the variation in number of offences using unemployment
Prediction
One of the purposes in calculating the line of best ﬁt is prediction
Speciﬁcally, for some value of x , we can provide a prediction for y
So, returning to the example, how many offences would you predict if there were 2000 unemployed people in a city area?
Answer: just substitute the desired value of x into the least squares regression line:
yˆ = 4323.6 + 0.7468 × 2000 = 5817
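In code, the substitution is a single line (a sketch using the estimates reported above):

```python
# Sketch: predicting offences from the fitted regression line
alpha_hat, beta_hat = 4323.6, 0.7468   # estimates from the notes
x_new = 2000                           # unemployed persons (within the observed x range)
y_pred = alpha_hat + beta_hat * x_new
print(round(y_pred))  # → 5817
```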
Prediction
Provided we are predicting y for an x value that is within the available x data, then we can be fairly conﬁdent in the prediction
This is what we call interpolation
However, if we base our prediction on an x value outside the available x data, then we should view the prediction with caution
This would be an example of extrapolation which is risky since the relationship between x and y may change for such values of x
Regression diagnostics
The usefulness of a ﬁtted regression model rests on a basic assumption:
E(y) = α + βx
Furthermore, inference such as hypothesis tests, confidence intervals and prediction intervals only makes sense if the error terms are (approximately) independent and normal with constant variance σ²
Therefore it is important to check these conditions are met in practice — this task is called regression diagnostics
Basic idea: Looking into the residuals ε _{i} or the normalised residuals ε _{i} /σ
Regression diagnostics
What to look for?
Do the residuals manifest IID normal behaviour?
Is the scatter plot of ε _{i} versus x _{i} patternless?
Is the scatter plot of ε _{i} versus y _{i} patternless?
Is the scatter plot of ε _{i} versus i patternless?
If you see trends, periodic patterns, increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is violated!
Regression diagnostics
Two other issues in regression diagnostics: outliers and inﬂuential observations
Outlier: An unusually small or unusually large y _{i} which lies outside of the majority of observations
An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis
If an observation which looks like an outlier indeed belongs to the sample and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate this ‘outlier’. For example, stock returns often exhibit extreme values and they often cannot be modelled satisfactorily by a normal regression model
Regression diagnostics
Influential observation: an x_i which is far away from the other x_i values
Such an observation may have a large inﬂuence on the ﬁtted regression line
Regression: Worked example
We apply the simple linear regression method to study the relationship between two series of ﬁnancial returns: a regression of Cisco Systems stock returns, y , on S&P500 Index returns, x
This regression model is an example of the CAPM (Capital Asset Pricing Model)
Stock returns:

Return = (Current price − Previous price) / Previous price ≈ log(Current price / Previous price)
when the difference between the two prices is small
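The approximation can be illustrated with a short hypothetical price series (a sketch; these are not the CAPM data from the slides):

```python
# Sketch: simple returns vs log returns for small price changes
import math

prices = [100.0, 101.0, 99.5, 100.2]   # hypothetical daily prices

simple = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]
log_ret = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

for s, l in zip(simple, log_ret):
    print(round(s, 5), round(l, 5))    # very close when the change is small
```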
Regression: Worked example
Remark: Daily prices are deﬁnitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables
For the S&P500, the average daily return is 0.04%, the maximum daily return is 4.46%, the minimum daily return is −6.01%, and the standard deviation is 1.40%
For Cisco, the average daily return is 0.13%, the maximum daily return is 15.42%, the minimum daily return is −13.44%, and the standard deviation is 4.23%
Descriptive Statistics
                     N     Range    Minimum   Maximum   Mean    Std. Deviation   Variance
SP500                252   10.66    −6.00     4.65      .0424   1.40017          1.960
Cisco                252   28.85    −13.44    15.42     .1336   4.23419          17.928
Valid N (listwise)   252
Regression: Worked example
Remark: Cisco is much more volatile than the S&P500
There is clear synchronisation between the movements of the two series of returns
Regression: Worked example
We ﬁt a regression model:
Cisco = α + βS&P500 + ε
Rationale: Part of the ﬂuctuation in Cisco returns was driven by the ﬂuctuation of the S&P500 returns
Regression: Worked example
Coefficients^a

Model 1        Unstandardized Coefficients    Standardized Coefficients   t        Sig.   95.0% Confidence Interval for B
               B        Std. Error            Beta                                        Lower Bound    Upper Bound
(Constant)     .012     .064                                              .188     .851   −.114          .139
Cisco          .227     .015                  .687                        14.943   .000   .197           .257
a. Dependent Variable: SP500
Model Summary^b

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .687^a   .472       .470                1.01964
a. Predictors: (Constant), Cisco
b. Dependent Variable: SP500
Regression: Worked example
When testing the statistical significance of regression coefficients, we just need to look at the p-value
The smaller the p-value, the more significant the result, i.e. the stronger the evidence that the true parameter value is different from zero
In practice, we treat p-values smaller than 0.05 as being statistically significant (at the 5% significance level)
Regression: Worked example
The estimated slope is β̂ = 2.077. The null hypothesis H_0 : β = 0 is rejected with p-value 0.000: extremely significant
Attempted interpretation: when the market index goes up by 1%, Cisco stock goes up by 2.077% on average. However, the error term ε in the model is large, with an estimated σ = 3.08%
The p-value for testing H_0 : α = 0 is 0.815, so we cannot reject the hypothesis that α = 0
Recall α̂ = ȳ − β̂x̄, and both ȳ and x̄ are very close to 0
Regression: Worked example
R² = 47.2%, i.e. 47.2% of the variation of Cisco stock may be explained by the variation of the S&P500 index; in other words, 47.2% of the risk in Cisco stock is market-related risk (see CAPM below)
CAPM: A simple asset pricing model in ﬁnance:
y _{i} = α + βx _{i} + ε _{i}
where y _{i} is a stock return and x _{i} is a market return at time i
Regression: Worked example
Total risk of the stock:

(1/n) Σ_{i=1}^{n} (y_i − ȳ)² = (1/n) Σ_{i=1}^{n} (ŷ_i − ȳ)² + (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

Market-related (or systematic) risk:

(1/n) Σ_{i=1}^{n} (ŷ_i − ȳ)² = (1/n) β̂² Σ_{i=1}^{n} (x_i − x̄)²

Firm-specific risk:

(1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
Regression: Worked example
β measures the market-related (or systematic) risk of the stock
Market-related risk is unavoidable, while firm-specific risk may be ‘diversified away’ through hedging
Variance is a simple measure (and one of the most frequently used) for risk in ﬁnance
Multiple linear regression
Previously we saw simple linear regression
That had one explanatory variable
Often one explanatory variable is not enough to explain variation in the response variable
So we add more explanatory variables to the linear model
Multiple linear regression — examples
Absenteeism in the workforce could be due to:
hours worked, flexibility in work practice, salary paid

Salary for managers could be related to:
qualifications, experience, hours worked, performance
Multiple linear regression
Remember the aim of statistics is prediction and decision making
In order to make the best predictions and decisions we need to use the best models
This often means making more complex models by adding more explanatory variables
But not too complex (Occam’s razor)
The multiple linear model
Suppose y is the manager’s salary
x _{1} = qualiﬁcations, x _{2} = experience, x _{3} = hours, x _{4} = performance
y = β_0 + β_qual x_1 + β_exp x_2 + β_hrs x_3 + β_per x_4 + ε
We can visualise up to n = 3
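The same least squares idea extends mechanically; here is a sketch with numpy and entirely made-up data (the variable names echo the salary example, but nothing here comes from the course data):

```python
# Sketch: fitting a multiple linear model y = b0 + b1*x1 + b2*x2 + error
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)                      # e.g. experience (hypothetical)
x2 = rng.normal(size=n)                      # e.g. hours worked (hypothetical)
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept column
coef, *_ = np.linalg.lstsq(X, y, rcond=None) # minimises the sum of squared residuals

print(coef)  # close to the true values 3.0, 2.0, -1.0
```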
The multiple linear model
Multiple linear regression uses least squares estimation like simple linear regression
That is, we minimise the sum of the squared residuals in all dimensions
Sounds tricky, but fortunately software (SPSS etc.) takes care of that for us