Numerical Methods For Engineers

Numerical Methods for Civil Engineers
Lecture 7
Curve Fitting
Linear Regression
Polynomial Regression
Multiple Linear Regression
Mongkol JIRAVACHARADET
Suranaree University of Technology
Institute of Engineering
School of Civil Engineering
LINEAR REGRESSION
[Figure: data points in the x-y plane with several candidate lines for the curve fit, each of the form $y = \alpha x + \beta$]
There is no exact solution, but there are many approximate solutions.
Error Between Model and Observation

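[Figure: a data point $(x_i, y_i)$ and its vertical distance $e_i$ from the model line, in the x-y plane]
Observation: $(x_i, y_i)$
Model: $y = \alpha x + \beta$
Error: $e_i = y_i - \alpha x_i - \beta$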
Criteria for a Best Fit
Find the BEST line, the one that minimizes the sum of the errors over all data points.
Least-Square Fit of a Straight Line

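Minimize the sum of the squares of the errors:
$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2$
Differentiate with respect to each coefficient:
$\frac{\partial S_r}{\partial \beta} = -2 \sum (y_i - \alpha x_i - \beta)$
$\frac{\partial S_r}{\partial \alpha} = -2 \sum [(y_i - \alpha x_i - \beta) x_i]$
Setting the derivatives to 0:
$0 = \sum y_i - \alpha \sum x_i - \sum \beta$
$0 = \sum y_i x_i - \beta \sum x_i - \alpha \sum x_i^2$
Since $\sum \beta = n\beta$, these become two equations in the two unknowns $(\alpha, \beta)$:
$n\beta + \alpha \sum x_i = \sum y_i$
$\beta \sum x_i + \alpha \sum x_i^2 = \sum y_i x_i$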
Solve equations simultaneously:
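$\alpha = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} (\sum x_i)^2} = \frac{S_{xy}}{S_{xx}}$
$\beta = \bar{y} - \alpha \bar{x}$
where $\bar{y}$ and $\bar{x}$ are the means of y and x.
Define:
$S_{xy} = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i$
$S_{xx} = \sum x_i^2 - \frac{1}{n} (\sum x_i)^2$
$S_{yy} = \sum y_i^2 - \frac{1}{n} (\sum y_i)^2$
The approximated y for any x is
$y = \alpha x + \beta$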
Example: Fit a straight line to x and y values

  xi      yi      xi^2    xi*yi    yi^2
   1      0.5       1      0.5      0.25
   2      2.5       4      5.0      6.25
   3      2.0       9      6.0      4
   4      4.0      16     16.0     16
   5      3.5      25     17.5     12.25
   6      6.0      36     36.0     36
   7      5.5      49     38.5     30.25
 Sum:
  28     24       140    119.5    105

$n = 7$, $\bar{x} = 28/7 = 4$, $\bar{y} = 24/7 = 3.4286$
$\alpha = \frac{119.5 - (28)(24)/7}{140 - (28)^2/7} = 0.8393$
$\beta = 3.4286 - 0.8393(4) = 0.0714$
Least-squares fit:
$y = 0.8393x + 0.0714$
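The hand calculation can be checked numerically. The following is a minimal MATLAB sketch, assuming the example data are stored in row vectors; the variable names (Sxy, Sxx, slope, intercept) are chosen here for illustration and are not part of the lecture.

```matlab
% Least-squares straight line y = slope*x + intercept for the example data
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = length(x);

Sxy = sum(x.*y) - sum(x)*sum(y)/n;     % 119.5 - (28)(24)/7
Sxx = sum(x.^2) - sum(x)^2/n;          % 140 - (28)^2/7

slope     = Sxy/Sxx;                   % alpha in the notes
intercept = mean(y) - slope*mean(x);   % beta in the notes

fprintf('y = %.4f x + %.4f\n', slope, intercept);
```

The names slope and intercept are used instead of alpha and beta because both of those are existing MATLAB function names.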
How good is our fit?

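Sum of the squares of the errors:
$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2 = (S_{xx} S_{yy} - S_{xy}^2) / S_{xx}$
Sum of the squares around the mean:
$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2 = S_{yy}$
Standard error of the estimate:
$s_{y/x} = \sqrt{\frac{S_r}{n - 2}}$
Standard deviation:
$s_y = \sqrt{\frac{S_t}{n - 1}}$
[Figure: spread of the data around the mean, $s_y$, compared with the spread around the regression line, $s_{y/x}$]
Linear regression has merit when $s_y > s_{y/x}$.
Coefficient of determination:
$r^2 = \frac{S_t - S_r}{S_t} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$
For a perfect fit, $S_r = 0$ and $r = r^2 = 1$.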
Example: error analysis of the linear fit

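  xi      yi      (yi - ybar)^2    (yi - beta - alpha*xi)^2
   1      0.5        8.5765              0.1687
   2      2.5        0.8622              0.5626
   3      2.0        2.0408              0.3473
   4      4.0        0.3265              0.3265
   5      3.5        0.0051              0.5896
   6      6.0        6.6122              0.7972
   7      5.5        4.2908              0.1993
 Sum:
  28     24         22.7143              2.9911

$\bar{y} = 3.4286$, $\alpha = 0.8393$, $\beta = 0.0714$
$S_t = 22.7143$, $S_r = 2.9911$
$s_y = \sqrt{\frac{S_t}{n - 1}} = \sqrt{\frac{22.7143}{7 - 1}} = 1.946$
$s_{y/x} = \sqrt{\frac{S_r}{n - 2}} = \sqrt{\frac{2.9911}{7 - 2}} = 0.773$
Since $s_{y/x} < s_y$, linear regression has merit.
$r^2 = \frac{22.7143 - 2.9911}{22.7143} = 0.868$, $r = \sqrt{0.868} = 0.932$
The linear model explains 86.8% of the original uncertainty.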
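The same error analysis can be reproduced in a few lines. This is a sketch only, assuming the straight line is obtained with polyfit; the variable names are illustrative.

```matlab
% Error analysis of the straight-line fit to the example data
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n = length(x);

c    = polyfit(x, y, 1);          % c(1) = slope, c(2) = intercept
yfit = polyval(c, x);

St  = sum((y - mean(y)).^2);      % sum of squares around the mean
Sr  = sum((y - yfit).^2);         % sum of squares around the line
sy  = std(y);                     % sample standard deviation (std normalizes by n-1)
syx = sqrt(Sr/(n - 2));           % standard error of the estimate
r2  = (St - Sr)/St;               % coefficient of determination

fprintf('St = %.4f, Sr = %.4f, sy = %.4f, sy/x = %.4f, r^2 = %.4f\n', ...
        St, Sr, sy, syx, r2);
```

A value of syx smaller than sy indicates that the line describes the data better than the mean alone.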
Alternative example: error analysis of the linear fit using $S_{xx}$, $S_{yy}$ and $S_{xy}$

  xi      yi      xi^2    xi*yi    yi^2
   1      0.5       1      0.5      0.25
   2      2.5       4      5.0      6.25
   3      2.0       9      6.0      4
   4      4.0      16     16.0     16
   5      3.5      25     17.5     12.25
   6      6.0      36     36.0     36
   7      5.5      49     38.5     30.25
 Sum:
  28     24       140    119.5    105

$S_{xx} = 140 - 28^2/7 = 28$
$S_{yy} = 105 - 24^2/7 = 22.7$
$S_{xy} = 119.5 - 28 \times 24/7 = 23.5$
$r^2 = \frac{23.5^2}{28 \times 22.7} = 0.869$
$S_r = (28 \times 22.7 - 23.5^2)/28 = 2.977$
$s_y = \sqrt{\frac{S_{yy}}{n - 1}} = \sqrt{\frac{22.7}{7 - 1}} = 1.945$
$s_{y/x} = \sqrt{\frac{S_r}{n - 2}} = \sqrt{\frac{2.977}{7 - 2}} = 0.772$
Since $s_{y/x} < s_y$, linear regression has merit.
The linear model explains 86.9% of the original uncertainty.
Confidence Interval (CI)

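A confidence interval is an interval within which a measurement or trial falls with a given probability.
[Figure: regression line with curved upper and lower confidence bands around the fitted values $\hat{y}_i$, plotted against x]
For a 95% CI, you can be 95% confident that the two curved confidence bands enclose the true best-fit linear regression line, leaving a 5% chance that the true line is outside those boundaries.
A $100(1 - \alpha)\%$ confidence interval for $y_i$ is given by
$\hat{y}_i \pm t_{\alpha/2} \, s_{y/x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}}$
Here $\alpha$ is the significance level, not the slope of the fitted line. For a 95% confidence interval, $\alpha = 0.05$.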
Example: to estimate y when x is 3.4 using 95% confidence interval:

  xi:   1     2     3     4     5     6     7
  yi:   0.5   2.5   2.0   4.0   3.5   6.0   5.5

$\hat{y} = 0.8393x + 0.0714 = 0.8393(3.4) + 0.0714 = 2.9250$
95% confidence: $\alpha = 0.05$, $t_{\alpha/2} = t_{0.025}(df = n - 2 = 5) = 2.571$
Interval: $2.9250 \pm (2.571)(0.772) \sqrt{\frac{1}{7} + \frac{(3.4 - 4)^2}{28}} = 2.9250 \pm 0.7832$
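The interval at x = 3.4 can also be evaluated numerically. This is a minimal sketch, assuming the critical value 2.571 is read from the t table for df = 5 rather than computed; x0, half and the other names are illustrative.

```matlab
% 95% confidence interval for the fitted value at x0 = 3.4
x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
n  = length(x);
x0 = 3.4;                                   % point at which y is estimated
t  = 2.571;                                 % t_{0.025}, df = n - 2 = 5 (from the table)

c   = polyfit(x, y, 1);                     % straight-line fit
Sxx = sum((x - mean(x)).^2);                % equals sum(x.^2) - sum(x)^2/n
syx = sqrt(sum((y - polyval(c, x)).^2)/(n - 2));

y0   = polyval(c, x0);                      % fitted value at x0
half = t*syx*sqrt(1/n + (x0 - mean(x))^2/Sxx);

fprintf('%.4f +/- %.4f\n', y0, half);
```

The half-width should come out close to the hand-calculated value; small differences arise because the hand calculation rounds $s_{y/x}$ to 0.772.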
T-Distribution
Probability density function of the t distribution:
$f(x) = \frac{(1 + x^2/\nu)^{-(\nu+1)/2}}{B(0.5, 0.5\nu) \sqrt{\nu}}$
where B is the beta function and $\nu$ is a positive integer shape parameter.
[Figure: t density with the central 95% region bounded by $\pm t_{0.025}$ and the central 99% region bounded by $\pm t_{0.005}$]
The formula for the beta function is
$B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1 - t)^{\beta-1} \, dt$
[Figure: plot of the t probability density function for 4 different values of the shape parameter $\nu$]
$\nu$ = df, the degrees of freedom.
In fact, the t distribution with $\nu$ equal to 1 is a Cauchy distribution.
The t distribution approaches a normal distribution as $\nu$ becomes large. The approximation is quite good for values of $\nu > 30$.
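The two limiting cases can be checked numerically from the density formula. The sketch below assumes MATLAB's built-in beta function; the handle name tpdf_beta is hypothetical and chosen to avoid clashing with toolbox functions.

```matlab
% Student t density written with the beta function; nu = degrees of freedom
tpdf_beta = @(x, nu) (1 + x.^2/nu).^(-(nu + 1)/2) ./ (beta(0.5, 0.5*nu)*sqrt(nu));

xx = -4:0.1:4;
cauchy = 1 ./ (pi*(1 + xx.^2));            % Cauchy density
normal = exp(-xx.^2/2) / sqrt(2*pi);       % standard normal density

err_cauchy = max(abs(tpdf_beta(xx, 1)  - cauchy));   % nu = 1 against Cauchy
err_normal = max(abs(tpdf_beta(xx, 30) - normal));   % nu = 30 against normal

fprintf('max difference, nu = 1 vs Cauchy:  %.2e\n', err_cauchy);
fprintf('max difference, nu = 30 vs normal: %.2e\n', err_normal);
```

The first difference should be at rounding level, while the second is small but not zero, which reflects that the normal curve is only an approximation for finite $\nu$.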
Critical Values of t

Confidence interval:   80%      90%      95%      98%      99%      99.8%
alpha/2:               0.10     0.05     0.025    0.01     0.005    0.001

 df      0.10     0.05     0.025    0.01     0.005    0.001
  1     3.078    6.314   12.706   31.821   63.657  318.313
  2     1.886    2.920    4.303    6.965    9.925   22.327
  3     1.638    2.353    3.182    4.541    5.841   10.215
  4     1.533    2.132    2.776    3.747    4.604    7.173
  5     1.476    2.015    2.571    3.365    4.032    5.893
  6     1.440    1.943    2.447    3.143    3.707    5.208
  7     1.415    1.895    2.365    2.998    3.499    4.782
  8     1.397    1.860    2.306    2.896    3.355    4.499
  9     1.383    1.833    2.262    2.821    3.250    4.296
 10     1.372    1.812    2.228    2.764    3.169    4.143
 11     1.363    1.796    2.201    2.718    3.106    4.024
 12     1.356    1.782    2.179    2.681    3.055    3.929
 13     1.350    1.771    2.160    2.650    3.012    3.852
 14     1.345    1.761    2.145    2.624    2.977    3.787
 15     1.341    1.753    2.131    2.602    2.947    3.733
 16     1.337    1.746    2.120    2.583    2.921    3.686
 17     1.333    1.740    2.110    2.567    2.898    3.646
 18     1.330    1.734    2.101    2.552    2.878    3.610
 19     1.328    1.729    2.093    2.539    2.861    3.579
 20     1.325    1.725    2.086    2.528    2.845    3.552
 21     1.323    1.721    2.080    2.518    2.831    3.527
 22     1.321    1.717    2.074    2.508    2.819    3.505
 23     1.319    1.714    2.069    2.500    2.807    3.485
 24     1.318    1.711    2.064    2.492    2.797    3.467
 25     1.316    1.708    2.060    2.485    2.787    3.450
 26     1.315    1.706    2.056    2.479    2.779    3.435
 27     1.314    1.703    2.052    2.473    2.771    3.421
 28     1.313    1.701    2.048    2.467    2.763    3.408
 29     1.311    1.699    2.045    2.462    2.756    3.396
 30     1.310    1.697    2.042    2.457    2.750    3.385
 31     1.309    1.696    2.040    2.453    2.744    3.375
 32     1.309    1.694    2.037    2.449    2.738    3.365
 33     1.308    1.692    2.035    2.445    2.733    3.356
 34     1.307    1.691    2.032    2.441    2.728    3.348
 35     1.306    1.690    2.030    2.438    2.724    3.340
 36     1.306    1.688    2.028    2.434    2.719    3.333
 37     1.305    1.687    2.026    2.431    2.715    3.326
 38     1.304    1.686    2.024    2.429    2.712    3.319
 39     1.304    1.685    2.023    2.426    2.708    3.313
 40     1.303    1.684    2.021    2.423    2.704    3.307
 41     1.303    1.683    2.020    2.421    2.701    3.301
 42     1.302    1.682    2.018    2.418    2.698    3.296
 43     1.302    1.681    2.017    2.416    2.695    3.291
 44     1.301    1.680    2.015    2.414    2.692    3.286
 45     1.301    1.679    2.014    2.412    2.690    3.281
 46     1.300    1.679    2.013    2.410    2.687    3.277
 47     1.300    1.678    2.012    2.408    2.685    3.273
 48     1.299    1.677    2.011    2.407    2.682    3.269
 49     1.299    1.677    2.010    2.405    2.680    3.265
 50     1.299    1.676    2.009    2.403    2.678    3.261
 51     1.298    1.675    2.008    2.402    2.676    3.258
 52     1.298    1.675    2.007    2.400    2.674    3.255
 53     1.298    1.674    2.006    2.399    2.672    3.251
 54     1.297    1.674    2.005    2.397    2.670    3.248
 55     1.297    1.673    2.004    2.396    2.668    3.245
 56     1.297    1.673    2.003    2.395    2.667    3.242
 57     1.297    1.672    2.002    2.394    2.665    3.239
 58     1.296    1.672    2.002    2.392    2.663    3.237
 59     1.296    1.671    2.001    2.391    2.662    3.234
 60     1.296    1.671    2.000    2.390    2.660    3.232
 61     1.296    1.670    2.000    2.389    2.659    3.229
 62     1.295    1.670    1.999    2.388    2.657    3.227
 63     1.295    1.669    1.998    2.387    2.656    3.225
 64     1.295    1.669    1.998    2.386    2.655    3.223
 65     1.295    1.669    1.997    2.385    2.654    3.220
 66     1.295    1.668    1.997    2.384    2.652    3.218
 67     1.294    1.668    1.996    2.383    2.651    3.216
 68     1.294    1.668    1.995    2.382    2.650    3.214
 69     1.294    1.667    1.995    2.382    2.649    3.213
 70     1.294    1.667    1.994    2.381    2.648    3.211
 71     1.294    1.667    1.994    2.380    2.647    3.209
 72     1.293    1.666    1.993    2.379    2.646    3.207
 73     1.293    1.666    1.993    2.379    2.645    3.206
 74     1.293    1.666    1.993    2.378    2.644    3.204
 75     1.293    1.665    1.992    2.377    2.643    3.202
 76     1.293    1.665    1.992    2.376    2.642    3.201
 77     1.293    1.665    1.991    2.376    2.641    3.199
 78     1.292    1.665    1.991    2.375    2.640    3.198
 79     1.292    1.664    1.990    2.374    2.640    3.197
 80     1.292    1.664    1.990    2.374    2.639    3.195
 81     1.292    1.664    1.990    2.373    2.638    3.194
 82     1.292    1.664    1.989    2.373    2.637    3.193
 83     1.292    1.663    1.989    2.372    2.636    3.191
 84     1.292    1.663    1.989    2.372    2.636    3.190
 85     1.292    1.663    1.988    2.371    2.635    3.189
 86     1.291    1.663    1.988    2.370    2.634    3.188
 87     1.291    1.663    1.988    2.370    2.634    3.187
 88     1.291    1.662    1.987    2.369    2.633    3.185
 89     1.291    1.662    1.987    2.369    2.632    3.184
 90     1.291    1.662    1.987    2.368    2.632    3.183
 91     1.291    1.662    1.986    2.368    2.631    3.182
 92     1.291    1.662    1.986    2.368    2.630    3.181
 93     1.291    1.661    1.986    2.367    2.630    3.180
 94     1.291    1.661    1.986    2.367    2.629    3.179
 95     1.291    1.661    1.985    2.366    2.629    3.178
 96     1.290    1.661    1.985    2.366    2.628    3.177
 97     1.290    1.661    1.985    2.365    2.627    3.176
 98     1.290    1.661    1.984    2.365    2.627    3.175
 99     1.290    1.660    1.984    2.365    2.626    3.175
100     1.290    1.660    1.984    2.364    2.626    3.174
inf     1.282    1.645    1.960    2.326    2.576    3.090
Polynomial Regression
Second-order polynomial:
$y = a_0 + a_1 x + a_2 x^2$
Sum of the squares of the residuals:
$S_r = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$
Take the derivative with respect to each coefficient:
$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)$
$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)$
$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)$
Normal equations:
$n a_0 + (\sum x_i) a_1 + (\sum x_i^2) a_2 = \sum y_i$
$(\sum x_i) a_0 + (\sum x_i^2) a_1 + (\sum x_i^3) a_2 = \sum x_i y_i$
$(\sum x_i^2) a_0 + (\sum x_i^3) a_1 + (\sum x_i^4) a_2 = \sum x_i^2 y_i$
For an mth-order polynomial: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$
We have to solve m+1 simultaneous linear equations.
MATLAB polyfit Function

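For a second-order polynomial, we can define
$A = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}$, $Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$, $C = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$
and show that
$C = (A'A)^{-1} A'Y$ (Fit norm: solution of the normal equations)
or
$C = A^{-1}Y$ (Fit QR). Because A is not square when m > 3, $A^{-1}Y$ here stands for the least-squares solution that MATLAB computes with A\Y.
>> C = polyfit(x, y, n)
>> [C, S] = polyfit(x, y, n)
x = independent variable
y = dependent variable
n = degree of polynomial
C = coefficients of the polynomial in descending powers
S = data structure for the polyval function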
Example: Fit a second-order polynomial to the data.

  xi      yi      (yi - ybar)^2    (yi - a0 - a1*xi - a2*xi^2)^2
   0      2.1        544.44               0.14332
   1      7.7        314.47               1.00286
   2     13.6        140.03               1.08158
   3     27.2          3.12               0.80491
   4     40.9        239.22               0.61951
   5     61.1       1272.11               0.09439
 Sum:
  15    152.6       2513.39               3.74657

From the given data:
$m = 2$, $n = 6$, $\bar{x} = 2.5$, $\bar{y} = 25.433$
$\sum x_i = 15$, $\sum x_i^2 = 55$, $\sum x_i^3 = 225$, $\sum x_i^4 = 979$
$\sum y_i = 152.6$, $\sum x_i y_i = 585.6$, $\sum x_i^2 y_i = 2488.8$
Simultaneous linear equations:
$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$
Solving these equations gives $a_0 = 2.47857$, $a_1 = 2.35929$, and $a_2 = 1.86071$.
Least-squares quadratic equation:
$y = 2.47857 + 2.35929x + 1.86071x^2$
Coefficient of determination:
$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851$, $r = \sqrt{0.99851} = 0.99925$
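The normal equations of this example can also be assembled and solved directly. The sketch below is one way to do it, assuming column vectors for the data; N, rhs and a are illustrative names.

```matlab
% Quadratic least-squares fit via the normal equations
x = [0 1 2 3 4 5]';
y = [2.1 7.7 13.6 27.2 40.9 61.1]';
n = length(x);

% Coefficient matrix and right-hand side for y = a0 + a1*x + a2*x^2
N = [n,         sum(x),    sum(x.^2);
     sum(x),    sum(x.^2), sum(x.^3);
     sum(x.^2), sum(x.^3), sum(x.^4)];
rhs = [sum(y); sum(x.*y); sum(x.^2.*y)];

a = N \ rhs;                       % a = [a0; a1; a2]

St = sum((y - mean(y)).^2);
Sr = sum((y - (a(1) + a(2)*x + a(3)*x.^2)).^2);
r2 = (St - Sr)/St;                 % coefficient of determination

disp(a');
```

Note that a is ordered by ascending power here, the reverse of the ordering returned by polyfit.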
Solving by MATLAB polyfit Function

>> x = [0 1 2 3 4 5];
>> y = [2.1 7.7 13.6 27.2 40.9 61.1];
>> c = polyfit(x, y, 2)
>> [c, s] = polyfit(x, y, 2)
>> st = sum((y - mean(y)).^2)
>> sr = sum((y - polyval(c, x)).^2)
>> r = sqrt((st - sr) / st)
MATLAB polyval Function

Evaluate a polynomial at the points defined by the input vector:
>> y = polyval(c, x)
where x = input vector
y = value of the polynomial evaluated at x
c = vector of coefficients in descending order of power
y = c(1)*x^n + c(2)*x^(n-1) + ... + c(n)*x + c(n+1)
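To make the coefficient ordering concrete, the sum above can be evaluated by hand with nested multiplication (Horner's rule) and compared with polyval. This is an illustrative sketch using the coefficients of the quadratic fitted earlier; x0 and p are hypothetical names, and no claim is made about how polyval is implemented internally.

```matlab
% Evaluate c(1)*x^2 + c(2)*x + c(3) at a single point by nested multiplication
c  = [1.86071 2.35929 2.47857];    % coefficients in descending powers
x0 = 2;                            % evaluation point

p = c(1);
for k = 2:length(c)
    p = p*x0 + c(k);               % Horner's rule: one multiply and one add per coefficient
end

fprintf('nested evaluation: %.5f, polyval: %.5f\n', p, polyval(c, x0));
```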
Example: y = 1.86071x^2 + 2.35929x + 2.47857
>> c = [1.86071 2.35929 2.47857]
>> y2 = polyval(c, x)
>> plot(x, y, 'o', x, y2)
[Figure: plot titled "Polynomial Interpolation" showing the data points (circles) and the fitted curve; the y axis runs from 0 to 70]
Error Bounds
Error bounds on the fitted values are obtained by passing the optional second output of polyfit as an input to polyval.
>> [c,s] = polyfit(x,y,2)

>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')
An interval of ±2Δ (y2 ± 2*delta) corresponds to approximately a 95% confidence interval.
Linear Regression Example:

  xi:   1     2     3     4     5     6     7
  yi:   0.5   2.5   2.0   4.0   3.5   6.0   5.5
>> [c,s] = polyfit(x,y,1)

>> [y2,delta] = polyval(c,x,s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')
Multiple Linear Regression

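$y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_p x_p$
Example case with two independent variables: $y = c_0 + c_1 x_1 + c_2 x_2$
Sum of the squares of the residuals: $S_r = \sum (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})^2$
Differentiate with respect to the unknowns:
$\frac{\partial S_r}{\partial c_0} = -2 \sum (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})$
$\frac{\partial S_r}{\partial c_1} = -2 \sum x_{1i} (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})$
$\frac{\partial S_r}{\partial c_2} = -2 \sum x_{2i} (y_i - c_0 - c_1 x_{1i} - c_2 x_{2i})$
Setting the partial derivatives to 0 and expressing the result in matrix form:
$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} c_0 \\ c_1 \\ c_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$
Example:

  x1      x2      y
  0       0       5
  2       1      10
  2.5     2       9
  1       3       0
  4       6       3
  7       2      27

$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} c_0 \\ c_1 \\ c_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}$
$c_0 = 5$, $c_1 = 4$, $c_2 = -3$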
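The matrix form above translates almost line by line into MATLAB. The following is a sketch under the assumption that the data are stored as column vectors; N and rhs are illustrative names.

```matlab
% Two-variable linear regression y = c0 + c1*x1 + c2*x2 via the normal equations
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';
n  = length(y);

N = [n,       sum(x1),     sum(x2);
     sum(x1), sum(x1.^2),  sum(x1.*x2);
     sum(x2), sum(x1.*x2), sum(x2.^2)];
rhs = [sum(y); sum(x1.*y); sum(x2.*y)];

c = N \ rhs;                       % c = [c0; c1; c2]
disp(c');
```

For this data set the fitted plane passes through every point, so the residuals are zero up to rounding.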
Multivariate Fit in MATLAB

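$c_0 + c_1 x_{11} + c_2 x_{12} + \dots + c_p x_{1p} = y_1$
$c_0 + c_1 x_{21} + c_2 x_{22} + \dots + c_p x_{2p} = y_2$
$\vdots$
$c_0 + c_1 x_{m1} + c_2 x_{m2} + \dots + c_p x_{mp} = y_m$
Overdetermined system of equations: A c = y
$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} & 1 \\ x_{21} & x_{22} & \cdots & x_{2p} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mp} & 1 \end{bmatrix}$, $c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \\ c_0 \end{bmatrix}$, and $y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$
With the column of ones placed last in A, the constant term $c_0$ is the last element of c.
Fit norm
>> c = (A'*A)\(A'*y)
Fit QR
>> c = A\y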
Example:
  x1      x2      y
  0       0       5
  2       1      10
  2.5     2       9
  1       3       0
  4       6       3
  7       2      27
>> x1=[0 2 2.5 1 4 7]';

>> x2=[0 1 2 3 6 2]';
>> y=[5 10 9 0 3 27]';
>> A=[x1 x2 ones(size(x1))];
>> c=A\y

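As a check, the two solution paths named above (Fit norm and Fit QR) can be run side by side on the same data. This is an illustrative sketch; c_qr and c_norm are hypothetical names.

```matlab
% Compare the backslash (QR) solution with the normal-equation solution
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';
A  = [x1 x2 ones(size(x1))];       % column of ones last, so c = [c1; c2; c0]

c_qr   = A \ y;                    % least-squares solution of the overdetermined system
c_norm = (A'*A) \ (A'*y);          % solution of the normal equations

disp([c_qr c_norm]);               % the two columns should agree
```

Both columns should be close to [4; -3; 5], that is $c_1 = 4$, $c_2 = -3$ and $c_0 = 5$. For well-conditioned problems the two agree; the backslash form is generally the safer choice when the columns of A are nearly dependent, since forming A'*A squares the condition number.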