Linear Regression
Linear & Polynomial Regression
Learning Objectives
Scatter Diagram Revisited
Purpose
To show:
• How one variable changes in response to changes in
another.
• The nature of the relationship between two
variables.
• The strength of relationship between two variables.
Scatter Diagram
Suspected Cause       Suspected Effect
Amount Overweight     Health Index
50                    .53
81                    .32
117                   .10
100                   .13
68                    .59
77                    .40
112                   .28
49                    .45
70                    .50
89                    .25
70                    .34
115                   .18
52                    .60
90                    .42
70                    .43
121                   .15
80                    .49
40                    .65
75                    .22
35                    .58
100                   .35
Scatter Diagram
[Figure: scatter plot of Health Index (y-axis, .10 to .60) against Amount Overweight (x-axis, 45 to 125)]
Scatter Diagram
No Correlation
There is no correlation.
Scatter Diagram: Risks & Limitations
y = f(x)
y = β0 + β1x + error
β0 is the intercept
β1 is the slope
Linear Regression - Example
The following slide shows the data for the past 50 working
days.
Linear Regression - Example
Enter Errors in Y and Volume in X, then click OK
Scatter Diagram
[Figure: Scatterplot of Errors vs Volume]
A scatter diagram reveals that there may be a relationship between the number of
errors and the volume of invoices. A regression analysis will reveal the existence
and/or the strength of the relationship.
Linear Regression – Least Squares Method
[Figure: Scatterplot of Errors vs Volume]
We first need to establish the equation for the best fitting line: the line that minimises
the sum of the squared differences between the observed y values and the predicted y
values. In short, this is known as the “least squares” method.
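As a minimal sketch of the least squares idea, the slope and intercept of the best fitting line can be computed from the standard closed-form formulas. The function names and the five data points below are hypothetical illustrations, not the 50-day invoice data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of cross-products and of squares about the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
volume = [150, 180, 210, 240, 270]
errors = [1, 4, 9, 14, 17]

b0, b1 = fit_line(volume, errors)
best = sum_squared_residuals(volume, errors, b0, b1)
# Any other line, e.g. one with a shifted intercept, gives a larger sum
shifted = sum_squared_residuals(volume, errors, b0 + 0.5, b1)
```

Comparing `best` with `shifted` shows the defining property: moving the line away from the least squares solution can only increase the sum of squared residuals.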
Regression - Minitab
1. Enter Errors and Volume
2. Check Linear
Minitab – Regression Plot
[Figure: fitted line plot of Errors against Volume (150 to 300); R-Sq(adj) 78.9%]
We can use it to predict: e.g. if we have 200 invoices we would predict:
-21.74 + 0.1465 (200) = 7.6 errors
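The prediction above is a direct substitution into the fitted equation. A small sketch, using the coefficients from the slides (the function name `predict_errors` is a hypothetical helper):

```python
def predict_errors(volume, intercept=-21.74, slope=0.1465):
    """Predicted number of errors for a given invoice volume."""
    return intercept + slope * volume


predicted = predict_errors(200)
print(round(predicted, 1))  # 7.6
```

The plot's Volume axis runs from 150 to 300, so a prediction for a volume well outside that range would be an extrapolation and should be treated with caution.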
Minitab – Regression Plot
[Figure: regression plot of Errors against Volume]
S is the standard deviation of the residuals (the difference between actual and predicted values)
σₙ₋₁² = Total Sum of Squares / Total Degrees of Freedom = 2070.00 / 49 = 42.245
σₙ₋₁ = √42.245 = 6.5
Check this out by calculating the standard deviation of the 50 error results
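The arithmetic for the overall standard deviation can be checked in a few lines. This sketch uses only the totals quoted on the slide; the variable names are illustrative.

```python
import math

total_ss = 2070.00   # total sum of squares, from the slide
n = 50               # number of error results

variance = total_ss / (n - 1)   # divide by the total degrees of freedom
std_dev = math.sqrt(variance)
print(round(variance, 3), round(std_dev, 1))  # 42.245 6.5
```

With the 50 raw error counts to hand, `statistics.stdev` from the Python standard library should give the same value, since it also divides by n - 1.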
Is there a significant relationship between y and x?
Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square    F-Ratio
We can test the significance of the relationship between y and x by examining the
F-Ratio. The F-Ratio is named after Sir Ronald Fisher, who devised this test for
comparing variances.
Source        DF    SS         MS         F         P
Regression    1     1642.07    1642.07    184.19    0.000
Error         48    427.93     8.92
Total         49    2070.00
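The entries in the analysis of variance table are linked by simple arithmetic, which the sketch below reproduces from the sums of squares and degrees of freedom shown. It also recovers the S, R-Sq and R-Sq(adj) values reported on the fitted line plot. The P value is not computed here, because it requires the F distribution, which is not in the Python standard library.

```python
ss_regression, df_regression = 1642.07, 1
ss_error, df_error = 427.93, 48
ss_total, df_total = 2070.00, 49

# Mean Square = Sum of Squares / Degrees of Freedom
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F-Ratio compares the variance explained by the line with the residual variance
f_ratio = ms_regression / ms_error

# Summary statistics quoted by Minitab
r_sq = ss_regression / ss_total
r_sq_adj = 1 - ms_error / (ss_total / df_total)
s = ms_error ** 0.5

print(round(ms_error, 2), round(f_ratio, 2))          # 8.92 184.19
print(round(100 * r_sq, 1), round(100 * r_sq_adj, 1))  # 79.3 78.9
```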
[Figure: plot of Errors showing an observed value, its predicted value, and the residual between them]
x      Observed y    Predicted ŷ = -21.74 + 0.1465x    (ŷ − y)    (ŷ − y)²
155    2             0.9675                            -1.0325    1.066
165    5             2.4325                            -2.5675    6.592
170    3             2.485                             -0.515     0.265
*      *             *                                 *          *
*      *             *                                 *          *
*      *             *                                 *          *
*      *             *                                 *          *
*      *             *                                 *          *
*      *             *                                 *          *
199    5             6.6175                            1.6175     2.616
Sum of (ŷ − y)² = 427.93 = SSRESIDUAL
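To make the last column of the table concrete, this sketch squares the difference between the predicted and observed values for the four rows that are shown. The predicted values are taken as listed in the table rather than recomputed. The residual sum of squares of 427.93 is the total over all 50 rows, so it cannot be reproduced from these four alone.

```python
rows = [
    # (x, observed y, predicted y as listed in the table)
    (155, 2, 0.9675),
    (165, 5, 2.4325),
    (170, 3, 2.485),
    (199, 5, 6.6175),
]

# Squared difference between predicted and observed y for each row
squared = [round((pred - obs) ** 2, 3) for _, obs, pred in rows]
print(squared)  # [1.066, 6.592, 0.265, 2.616]
```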
Residuals vs Fits
[Figure: Residuals Versus the Fitted Values (response is Errors); y-axis: Residual, x-axis: Fitted Value]
[Figure: normality plot of the residuals (RESI1)]
• In this case a Normality Test of the Residuals shows that they are consistent with a
Normal distribution (p-value > 0.05)
How accurate is my prediction of y?
1. Enter Errors and Volume
2. Check Linear
3. Click on Options
How accurate is my prediction of y?
Tick both Display Options
How accurate is my prediction of y?
[Figure: Fitted Line Plot of Errors against Volume (150 to 300), showing the regression line with 95% CI and 95% PI bands]
Errors = - 21.74 + 0.1465 Volume
S = 2.98583, R-Sq = 79.3%, R-Sq(adj) = 78.9%
• 95% Confidence Intervals show the range of values we expect for the average number of
errors at any particular volume of invoices being processed
• 95% Prediction Intervals show the range within which we expect 95% of the individual
error values to fall at that volume when we use the regression equation to predict them
• Precise values can be obtained within the Stat > Regression > Regression menu
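The difference between the two intervals can be sketched with the standard textbook formulas for simple linear regression. This is an illustration under stated assumptions, not Minitab's implementation: the function name `intervals` and the five data points are hypothetical, and the t critical value is supplied by the caller (for example from a t table) because the Python standard library has no t distribution.

```python
import math


def intervals(xs, ys, x0, t_crit):
    """Return (fit, ci_half_width, pi_half_width) at x0 for a simple linear fit.

    t_crit is the two-sided t critical value for n - 2 degrees of freedom.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # s: standard deviation of the residuals, with n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    fit = intercept + slope * x0
    leverage = 1 / n + (x0 - x_mean) ** 2 / sxx
    ci = t_crit * s * math.sqrt(leverage)      # for the average response at x0
    pi = t_crit * s * math.sqrt(1 + leverage)  # for an individual value at x0
    return fit, ci, pi


# Hypothetical data; 3.182 is the 97.5th percentile of t with 3 degrees of freedom
volume = [150, 180, 210, 240, 270]
errors = [1, 4, 9, 14, 17]
fit, ci, pi = intervals(volume, errors, x0=200, t_crit=3.182)
```

The prediction interval is always wider than the confidence interval at the same x, because it adds the scatter of individual values to the uncertainty in the fitted line. Both widen as x0 moves away from the mean of the observed x values.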
Regression Exercises
Question 1:
A company developing healthcare software solutions is bidding for a new
contract and has historical data on similar previous contracts. It wants to
minimise the risk of failing to deliver the solution on time, so wants a good
estimate of the man-years of effort needed (the output measure, or y).
The variables previously recorded are the number of application sub-programs
written (x1), and the number of software configuration change proposals
implemented (x2).
Use regression to:
1. Investigate the relationship between x1 and the man-years required
2. Investigate the relationship between x2 and the man-years required
3. If the company estimates that 150 application sub-programs will be required,
and there are likely to be 100 software configuration change proposals
implemented, what would be your recommendation for the number of man-
years they should estimate?
Data is in Minitab Worksheet: Transactional Regression Exercises.mtw
Regression Exercises
Question 2:
The team investigating the Expense Claims process have
identified a potential input variable (x) that they believe
could affect the amount of time taken to pay the claims. The
potential variable is the amount of money claimed, and they
have gathered data on amounts claimed for the 100 payment
times they already had. Use Regression Analysis to
investigate the relationship, and be prepared to advise the
team on your conclusions.
Data is in Minitab Worksheet:
PAYMENT TIMES.mtw
Summary - Linear & Polynomial Regression