Introduction to Mathematical Modeling
Regression and Curve Fitting
INTRODUCTION
Regression investigates the dependence of one variable, conventionally called the dependent variable, on one or more other variables, called independent variables. The relation between the expected value of the dependent variable and the independent variables is called the regression relation.
SCATTER PLOT
Breeding Chows and Vizslas
[Figure: scatter plot of Appearance (x-axis, 0–10) versus Disposition (y-axis, 0–10).]
PROCEDURE
TABLE OF DATA
Breeding Chows and Vizslas

Dog      Appearance   Disposition
Sam      4            2
Jake     6            6
Gus      7            7
Max      3            3
Suzie    7            10
Rover    8            10
Zeek     2            5
Rex      4            6
Tiesha   3            3
BJ       4            5
Missy    7            7
Mean     5            5.82
SCATTER PLOT OF DATA
[Figure: scatter plot of the table data, Appearance (x-axis) versus Disposition (y-axis).]
Plotting the Means (x, y) = (5, 5.82)
[Figure: the same scatter plot with the mean point (5, 5.82) marked.]
ADDING A TRENDLINE
[Figure: the scatter plot with a fitted line; vertical distances e1, e2, e3 from selected points to the line mark the residuals.]
Minimize the error sum of squares

$$Q = \sum_i e_i^2$$

The smaller Q, the closer the coefficient of determination R² is to 1. A perfect fit (all points on the line) has R² = 1.
TRENDLINE AND EQUATION
Breeding Chows and Vizslas

D = 1.0476 A + 0.5801,  R² = 0.6619

[Figure: the scatter plot with the fitted trendline and its equation displayed.]
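The trendline above can be checked numerically. A minimal sketch using NumPy's `polyfit` on the values from the data table (the variable names are illustrative):

```python
import numpy as np

# Appearance (x) and Disposition (y) values from the Breeding Chows and Vizslas table
appearance = np.array([4, 6, 7, 3, 7, 8, 2, 4, 3, 4, 7], dtype=float)
disposition = np.array([2, 6, 7, 3, 10, 10, 5, 6, 3, 5, 7], dtype=float)

# Degree-1 polynomial fit returns (slope, intercept) of the least-squares line
slope, intercept = np.polyfit(appearance, disposition, 1)

# Coefficient of determination, from the correlation coefficient
r = np.corrcoef(appearance, disposition)[0, 1]
print(f"D = {slope:.4f} A + {intercept:.4f}, R^2 = {r**2:.4f}")
# → D = 1.0476 A + 0.5801, R^2 = 0.6619
```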
NO CORRELATION
This graph shows essentially no correlation between the variables X and Y: R² = 0.0003.
[Figure: scatter of X (0–10) versus Y (0–7) with no visible trend.]
HIGH CORRELATION
This graph shows a high degree of correlation between X and Y.

Y = 1.6927 X + 3.0273,  R² = 0.9996

[Figure: scatter of X (0–12) versus Y (0–24) with points lying almost exactly on the fitted line.]
OUTLIERS
Outliers affect the degree of correlation, and they affect the fitted curve.

Y = 1.5698 X + 3.9045,  R² = 0.7968

[Figure: scatter of X versus Y containing an outlier; both the fitted line and R² change noticeably.]
MATHEMATICAL BACKGROUND
Arithmetic mean: the sum of the individual data points ($y_i$) divided by the number of points ($n$):

$$\bar{y} = \frac{\sum y_i}{n}, \quad i = 1, \ldots, n$$

Standard deviation:

$$S_y = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum (y_i - \bar{y})^2$$
MATHEMATICAL BACKGROUND (CONT’D)
Variance:

$$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} \quad \text{or} \quad S_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$$
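The two forms of the variance give the same number, which is easy to confirm in code. A small sketch (the sample values here are borrowed from the worked example later in the deck; any data would do):

```python
import math

y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]   # sample data
n = len(y)

ybar = sum(y) / n                          # arithmetic mean
St = sum((yi - ybar) ** 2 for yi in y)     # total spread about the mean

var_def = St / (n - 1)                                             # definition form
var_comp = (sum(yi * yi for yi in y) - sum(y) ** 2 / n) / (n - 1)  # computational form
Sy = math.sqrt(var_def)                    # standard deviation

print(round(St, 4), round(var_def, 4), round(var_comp, 4))
# → 22.7143 3.7857 3.7857
```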
Linear Regression
Fitting a straight line to a set of paired observations:
(x1, y1), (x2, y2), …, (xn, yn)

$$y = a_0 + a_1 x + e$$

a1: slope
a0: intercept
e: error, or residual, between the model and the observations
LINEAR REGRESSION: RESIDUAL
[Figure: a data point and the fitted line, with the residual e shown as the vertical distance between them.]

LINEAR REGRESSION: QUESTION
[Figure: two residuals of equal magnitude and opposite sign, e1 = -e2; their plain sum is zero even though the fit is poor.]
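The issue the figure raises can be seen with two points. In this hypothetical sketch, a horizontal line misses both points, yet its raw residuals cancel exactly, so the plain sum of errors cannot distinguish this poor fit from a good one:

```python
# Two data points and a (poor) candidate line y = 2, i.e. a0 = 2, a1 = 0
xs = [1.0, 2.0]
ys = [1.0, 3.0]
a0, a1 = 2.0, 0.0

# Residuals e_i = y_i - (a0 + a1*x_i): equal magnitude, opposite sign
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
print(residuals, sum(residuals))   # → [-1.0, 1.0] 0.0
```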
LINEAR REGRESSION: CRITERIA FOR A “BEST” FIT
One candidate criterion is to minimize the sum of absolute residuals:

$$\min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$
LINEAR REGRESSION: CRITERIA FOR A “BEST” FIT
Another candidate is the minimax criterion, which minimizes the largest residual:

$$\min \max_{1 \le i \le n} |e_i|, \qquad e_i = y_i - a_0 - a_1 x_i$$
LINEAR REGRESSION: LEAST SQUARES FIT

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

$$\min S_r = \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

The minimum occurs where both partial derivatives vanish:

$$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0$$
LINEAR REGRESSION: DETERMINATION OF a0 AND a1

$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i)\, x_i = 0$$

Expanding, and using $\sum a_0 = n a_0$:

$$0 = \sum y_i - n a_0 - a_1 \sum x_i$$
$$0 = \sum x_i y_i - a_0 \sum x_i - a_1 \sum x_i^2$$

This gives the normal equations, 2 equations with 2 unknowns that can be solved simultaneously:

$$n a_0 + a_1 \sum x_i = \sum y_i$$
$$a_0 \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$
LINEAR REGRESSION: DETERMINATION OF a0 AND a1

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$
ERROR QUANTIFICATION OF LINEAR REGRESSION

$$r^2 = \frac{S_t - S_r}{S_t}$$

r²: coefficient of determination
r: correlation coefficient
ERROR QUANTIFICATION OF LINEAR REGRESSION

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7 \times 119.5 - 28 \times 24}{7 \times 140 - 28^2} = 0.8392857$$

$$a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857 \times 4 = 0.07142857$$

$$y = 0.07142857 + 0.8392857\, x$$
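The same coefficients can be reproduced directly from the formulas. A sketch, assuming x = 1..7 and the y values listed in the error-analysis table that follows:

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(x)

# Sums needed by the normal-equation solution
sx, sy = sum(x), sum(y)                      # 28, 24.0
sxy = sum(xi * yi for xi, yi in zip(x, y))   # 119.5
sxx = sum(xi * xi for xi in x)               # 140

a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
a0 = sy / n - a1 * sx / n                        # intercept, ybar - a1*xbar
print(round(a1, 7), round(a0, 8))   # → 0.8392857 0.07142857
```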
LEAST SQUARES FIT OF A STRAIGHT LINE: EXAMPLE (ERROR ANALYSIS)

xi   yi     (yi - ȳ)²   ei²
1    0.5    8.5765      0.1687
2    2.5    0.8622      0.5625
3    2.0    2.0408      0.3473
4    4.0    0.3265      0.3265
5    3.5    0.0051      0.5896
6    6.0    6.6122      0.7972
7    5.5    4.2908      0.1993
Σ    28     24.0        22.7143     2.9911

$$S_t = \sum (y_i - \bar{y})^2 = 22.7143, \qquad S_r = \sum e_i^2 = 2.9911$$

$$r^2 = \frac{S_t - S_r}{S_t} = 0.868, \qquad r = \sqrt{0.868} = 0.932$$
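The error analysis can be checked the same way. A sketch computing St, Sr, and r² for the fitted line y = 0.07142857 + 0.8392857x from the example:

```python
import math

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = 0.07142857, 0.8392857   # coefficients from the worked example

n = len(y)
ybar = sum(y) / n

St = sum((yi - ybar) ** 2 for yi in y)                        # total sum of squares
Sr = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares

r2 = (St - Sr) / St
print(round(St, 4), round(Sr, 4), round(r2, 3), round(math.sqrt(r2), 3))
# → 22.7143 2.9911 0.868 0.932
```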
LINEARIZATION OF NONLINEAR RELATIONSHIPS
1. THE EXPONENTIAL EQUATION

$$y = a_1 e^{b_1 x}$$
$$\ln y = \ln a_1 + b_1 x$$

so $y^* = a_0 + a_1 x$ with $y^* = \ln y$, intercept $a_0 = \ln a_1$, and slope equal to $b_1$.
LINEARIZATION OF NONLINEAR RELATIONSHIPS
2. THE POWER EQUATION

$$y = a_2 x^{b_2}$$
$$\log y = \log a_2 + b_2 \log x$$
3. THE SATURATION-GROWTH-RATE EQUATION

$$y = a_3 \frac{x}{b_3 + x}$$
$$\frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3} \cdot \frac{1}{x}$$

so $y^* = a_0 + a_1 x^*$ with $y^* = 1/y$, $x^* = 1/x$, $a_0 = 1/a_3$, $a_1 = b_3/a_3$.
EXAMPLE
Fit the equation $y = a_2 x^{b_2}$ to the data in the following table. Taking logarithms,

$$\log y = \log(a_2 x^{b_2}) = \log a_2 + b_2 \log x$$

Let $Y^* = \log y$, $X^* = \log x$, $a_0 = \log a_2$, $a_1 = b_2$; then $Y^* = a_0 + a_1 X^*$.

xi   yi    X* = log xi   Y* = log yi
1    0.5   0             -0.301
2    1.7   0.301         0.230
3    3.4   0.477         0.531
4    5.7   0.602         0.756
5    8.4   0.699         0.924
Σ    15    19.7          2.079          2.141
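The log–log fit above can be carried through numerically. A sketch fitting the straight line to (X*, Y*) and recovering a2 and b2 (base-10 logs, matching the table):

```python
import math

x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]

# Transform to log space: X* = log x, Y* = log y
X = [math.log10(v) for v in x]
Y = [math.log10(v) for v in y]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(a * b for a, b in zip(X, Y))
sxx = sum(a * a for a in X)

b2 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope a1 = b2
a2 = 10 ** (sy / n - b2 * sx / n)                # intercept a0 = log a2
print(round(b2, 2), round(a2, 2))   # → 1.75 0.5
```

Transforming back gives the power-law fit y ≈ 0.5 x^1.75 for this data.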