
Regression methods

Linear regression
Y = m X + b
A linear relationship is assumed to exist
between two factors.
This was already discussed in an earlier unit.
Regression methods
Multiple linear regression
$Y = m_1 X_1 + m_2 X_2 + \dots + m_n X_n + e$
This is a linear regression fit that is extended to
several variables.
It is useful when several factors contribute to the
overall observed response.
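A minimal sketch of a multiple linear regression fit, assuming NumPy and synthetic, noise-free data (the data, the slope values and the added constant term are illustrative assumptions, not part of the original material):

```python
import numpy as np

# Synthetic, noise-free data (illustrative only) so the expected slopes are known.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))              # 20 observations, 2 X variables
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5   # m1 = 3.0, m2 = -1.5, constant = 0.5

# Append a column of ones so the last fitted coefficient is the constant term.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, b = coef
print(f"m1={m1:.3f}, m2={m2:.3f}, b={b:.3f}")
```

With real data the error term e is not zero, so the fitted slopes only approximate the underlying ones.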
Multivariate calibration
Typically, a multivariate method implies that
you have multiple X (independent) and
multiple Y (dependent) variables.
We will outline three multivariate
approaches to creating a calibration curve.
Ordinary Least Squares (OLS)
Principal component regression (PCR)
Partial least squares regression (PLS)
While each optimizes the fit of your data
differently, method evaluation, optimization
and the results are often the same.
OLS
With traditional linear and multiple linear regression,
we're limited to a single Y (dependent) variable.
OLS (also called a general linear model - GLM) can
be seen as an extension of this approach. You have
a Y matrix instead of a Y vector.
Mathematically, the matrix formulations for MLR and
OLS (GLM) are identical, except for allowing for
a Y matrix. It is basically a combination of MLR and
simultaneous equations.
XLStat will handle either approach, based on the
number of Y variables you give it.
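The point that a Y matrix only adds columns to the same least-squares problem can be sketched as follows, assuming NumPy and synthetic data (all names and values here are hypothetical and say nothing about how XLStat does it internally):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))                          # 30 observations, 3 X variables
B_true = np.array([[1.0, 0.0], [2.0, -1.0], [0.0, 0.5]])
Y = X @ B_true + 0.01 * rng.normal(size=(30, 2))      # 2 Y variables

A = np.column_stack([X, np.ones(len(X))])             # add the constant term

# One call with the whole Y matrix ...
B_all, *_ = np.linalg.lstsq(A, Y, rcond=None)

# ... gives the same coefficients as one regression per Y column.
B_cols = np.column_stack(
    [np.linalg.lstsq(A, Y[:, k], rcond=None)[0] for k in range(Y.shape[1])]
)
print(np.allclose(B_all, B_cols))
```

Each column of the coefficient matrix is the multiple linear regression solution for one Y variable.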
OLS
One limit with OLS is that you need more
observations than X variables and more X
variables than Y variables.
Results can be erratic if you have variables
that are:
Collinear (ones with a high degree of
linear correlation.)
Invariant (ones that don't vary much.)
You can try to remove all invariant variables and all but
one collinear variable (in a block) and hope for the
best (XLStat will do this.)
For these reasons, OLS is not as commonly used
for multi-Y type problems (compared to PCR
and PLS).
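One simple way to screen variables before an OLS fit is sketched below. It assumes NumPy; the function name `screen_columns` and both tolerances are hypothetical, and this is not a description of what XLStat actually does. It only checks pairwise correlation, so a variable that is a combination of several others would not be caught.

```python
import numpy as np

def screen_columns(X, var_tol=1e-8, corr_tol=0.999):
    """Return indices of columns kept after dropping near-constant columns
    and, among highly correlated columns, all but the first one seen."""
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if col.std() <= var_tol:
            continue                      # invariant: carries no information
        collinear = any(
            abs(np.corrcoef(col, X[:, k])[0, 1]) >= corr_tol for k in keep
        )
        if not collinear:
            keep.append(j)
    return keep

# Illustrative data: column 1 is a linear copy of column 0, column 2 is constant.
rng = np.random.default_rng(2)
x0 = rng.normal(size=50)
x3 = rng.normal(size=50)
X_demo = np.column_stack([x0, 2.0 * x0 + 1.0, np.full(50, 7.0), x3])
kept = screen_columns(X_demo)
print(kept)
```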
Principal component regression
This is a simple extension of OLS.
It is assumed that each member of your set can
be assigned a quantitative class value.
First, generate a PCA model for your data.
Using the PCA scores, conduct a multiple
linear regression where your Y values are the
quantitative class values.
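The two steps above (PCA, then regression on the scores) can be sketched as follows. This assumes NumPy, mean-centered data and synthetic input; `pcr_fit` and `pcr_predict` are hypothetical helper names, not functions from any package.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """PCA on centered X, then least squares of centered y on the scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T               # loadings (variables x components)
    T = Xc @ P                            # PCA scores
    g, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    coef = P @ g                          # slopes in the original X variables
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def pcr_predict(X, coef, intercept):
    return X @ coef + intercept

# Illustrative data: 10 correlated X variables driven by 2 hidden factors.
rng = np.random.default_rng(3)
L = rng.normal(size=(25, 2))
X = L @ rng.normal(size=(2, 10))
y = L @ np.array([1.0, -2.0]) + 5.0
coef, intercept = pcr_fit(X, y, n_components=2)
y_hat = pcr_predict(X, coef, intercept)
print(np.max(np.abs(y_hat - y)))
```

Because this toy data is noise-free and has exactly two underlying factors, two components reproduce y; real data will leave a residual.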
Principal component regression
[Figure: flow diagram. Raw or scaled data go through PCA to give PCA scores plus a residual (noise); the PCA scores go through multiple linear regression (OLS) to give slopes and intercepts.]
Principal component regression
Advantages of PCR over OLR
Noise remains in the residual.
Fewer variables to work with.
Obtain PCA information as well.
You can use just the components that relate
to the trend of interest.
Limits of PCR
It assumes that your data array is valid for
predicting Y values.
It must contain no errors beyond noise.
The first PC(s) may or may not actually be
related to any of the Y components.
Partial least squares regression
PLS modeling relies on a simultaneous fit
of both an independent and dependent
matrix.
The objective is to derive latent variables
that are similar to principal components.
The major difference is the attempt to
minimize the residual variance of both arrays.
Called PLS1 for a Y vector and PLS2
where there is a Y matrix.
Partial least squares regression
With PLS, the goal is to extract the latent variables by
using the X array to properly align the Y array (or vector)
and then reversing the process.
[Figure: the X and Y arrays, labeled with w, t, q and u.]
Partial least squares regression
Each factor to be determined should end up
with a different PC set.
It may require a different number of
components to adequately model each
quantitative variable.
The approach ensures that the best fit is obtained
for all variables - which is both good and bad.
Considered the best approach when the number of
variables is high and correlated variables are
likely (basically the opposite of OLS).
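For a single Y vector (PLS1), one common way to extract the latent variables is the NIPALS-style loop sketched below. This assumes NumPy and mean-centered data; `pls1_fit` is a hypothetical name, and other software may use a different but equivalent algorithm.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by successive deflation: each weight vector w is the direction
    in X most aligned with what is left of y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)            # X weights
        t = Xk @ w                        # X scores (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        qa = (yk @ t) / tt                # regression of y on the scores
        Xk = Xk - np.outer(t, p)          # deflate X
        yk = yk - qa * t                  # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Illustrative noise-free data with two hidden factors.
rng = np.random.default_rng(5)
L = rng.normal(size=(25, 2))
X = L @ rng.normal(size=(2, 10))
y = L @ np.array([1.0, -2.0]) + 5.0
coef, intercept = pls1_fit(X, y, n_components=2)
y_fit = X @ coef + intercept
print(np.max(np.abs(y_fit - y)))
```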
Validation
All methods require validation.
Again, cross validation is one of the best
approaches (leave-one-out method).
It permits you to determine a prediction
error sum of squares (PRESS) or the root-mean-square
value of prediction error (RMSPE).
Tracking the PRESS value will tell you the
optimum number of components to use.
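[Figure: PRESS (standard error) plotted against the number of components, with the optimum number of components marked.]
$\mathrm{PRESS} = \sum_i (y_i - \hat{y}_i)^2$
$\mathrm{RMSPE} = \left( \frac{\mathrm{PRESS}}{n_T} \right)^{1/2}$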
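Leave-one-out PRESS tracking can be sketched as below for a PCR model. This assumes NumPy and synthetic data built from three hidden factors; the helper names are hypothetical, and RMSPE is taken here as the square root of PRESS divided by the number of left-out predictions.

```python
import numpy as np

def pcr_predict_one(X_train, y_train, x_new, k):
    """Fit a k-component PCR on the training rows and predict one new row."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T
    g, *_ = np.linalg.lstsq(Xc @ P, y_train - y_mean, rcond=None)
    return y_mean + (x_new - x_mean) @ P @ g

def press_loo(X, y, k):
    """Leave each observation out in turn and sum the squared prediction errors."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        errors[i] = y[i] - pcr_predict_one(X[keep], y[keep], X[i], k)
    return float(errors @ errors)

# Illustrative data: 8 X variables, 3 real factors plus a little noise.
rng = np.random.default_rng(4)
n = 30
A = rng.normal(size=(n, 3))
A -= A.mean(axis=0)
Q, _ = np.linalg.qr(A)                        # three orthogonal hidden factors
V, _ = np.linalg.qr(rng.normal(size=(8, 8)))
X = (Q * np.array([10.0, 5.0, 2.0])) @ V[:3] + 0.01 * rng.normal(size=(n, 8))
y = Q.sum(axis=1) + 0.01 * rng.normal(size=n)

press_by_k = {k: press_loo(X, y, k) for k in range(1, 6)}
rmspe_by_k = {k: np.sqrt(p / n) for k, p in press_by_k.items()}
for k in press_by_k:
    print(k, press_by_k[k], rmspe_by_k[k])
```

In this toy case PRESS should drop sharply once the third component is included and change little after that, which is the pattern used to pick the number of components.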
Goodness of fit.
There are several measures of model
quality beyond the use of PRESS.
What's used is based on both the type of
model (OLS/PCR or PLS) and the software
used.
XLStat can produce a huge amount of
information to evaluate.
We'll look at only a few of the model
quality indexes.
OLS/PCR
These two methods have similar measures of model
quality since both rely on an OLS fit.
Coefficient of determination of the model ($R^2$)
Values between 0 and 1 (1 is best).
Interpreted as the proportion of variability of the
dependent variable(s) explained by the model.
$R^2 = 1 - \frac{\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2}$
where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} w_i y_i$
Adjusted $R^2$
Takes into account the number of variables used in
the model.
Can be a negative value if $R^2$ is small.
$\mathrm{adj}\,R^2 = 1 - (1 - R^2)\,\frac{W - 1}{W - p - 1}$
W = sum of the $w_i$ weights
p = number of x variables
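A small worked example of the two formulas, assuming NumPy, unit weights by default and made-up numbers (the function name `r2_and_adjusted` is hypothetical):

```python
import numpy as np

def r2_and_adjusted(y, y_hat, p, w=None):
    """Weighted R^2 and adjusted R^2 for a model with p x variables."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    W = w.sum()                                   # sum of the weights
    y_bar = np.sum(w * y) / len(y)                # mean as written in the formula above
    ss_res = np.sum(w * (y - y_hat) ** 2)
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (W - 1.0) / (W - p - 1.0)
    return r2, adj_r2

# Made-up observed and predicted values, one x variable.
r2, adj_r2 = r2_and_adjusted([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], p=1)
print(round(r2, 4), round(adj_r2, 4))
```

Here the residual sum of squares is 0.10 and the total sum of squares is 10, so $R^2$ is 0.99 and the adjusted value is slightly lower because of the (W - 1)/(W - p - 1) factor.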
Root Mean Square of Errors (RMSE)
$\mathrm{RMSE} = \sqrt{\frac{1}{W - p} \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2}$
PRESS RMSE
$\mathrm{PRESS\ RMSE} = \sqrt{\frac{\sum_{i=1}^{n} w_i (y_i - \hat{y}_{i(-1)})^2}{W - p}}$
$\hat{y}_{i(-1)}$ is the prediction of the ith observation when it's
not included when building the model. A large
difference between RMSE and PRESS RMSE
indicates that the model is sensitive to the presence
(or absence) of some observations - outliers.
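The comparison between RMSE and PRESS RMSE can be sketched for an OLS fit as below. This assumes NumPy, unit weights (so W equals the number of observations) and a made-up data set with one deliberately shifted point; the helper names are hypothetical.

```python
import numpy as np

def ols_predict(X_train, y_train, X_new):
    """Ordinary least squares with a constant term, then predict X_new."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef

def rmse_and_press_rmse(X, y):
    n, p = X.shape
    resid = y - ols_predict(X, y, X)              # ordinary residuals
    loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # leave observation i out
        loo[i] = y[i] - ols_predict(X[keep], y[keep], X[i:i + 1])[0]
    denom = n - p                                 # W - p with unit weights
    return np.sqrt(resid @ resid / denom), np.sqrt(loo @ loo / denom)

# Straight line with the last point shifted upward (an artificial outlier).
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x[:, 0] + 1.0
y[-1] += 10.0
rmse, press_rmse = rmse_and_press_rmse(x, y)
print(rmse, press_rmse)
```

The shifted point sits at the end of the x range, so leaving it out changes the fitted line noticeably and PRESS RMSE comes out larger than RMSE.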
PLS model quality
Besides the regression information, you have
additional measures that can be used.
$Q^2_{cum}(h)$ index.
Measures the global contribution of the first h
components to the predictive quality of the model.
It involves the cross-validation PRESS and the model
Sum of Squares of Errors with one less component.
The max $Q^2_{cum}$ index represents the most stable
model.
$Q^2_{cum}(h) = 1 - \prod_{j=1}^{h} \frac{\sum_{k=1}^{q} \mathrm{PRESS}_{kj}}{\sum_{k=1}^{q} \mathrm{SCE}_{k(j-1)}}$
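The index can be computed directly once the PRESS and error sums of squares are available. The sketch below assumes NumPy, a single Y variable and made-up numbers; the array layout and the name `q2_cum` are assumptions for illustration.

```python
import numpy as np

def q2_cum(press, sse_prev):
    """press[j, k]: cross-validation PRESS of Y variable k with j+1 components.
    sse_prev[j, k]: sum of squares of errors of Y variable k with j components."""
    ratios = press.sum(axis=1) / sse_prev.sum(axis=1)
    return 1.0 - np.cumprod(ratios)

# Made-up values for one Y variable and up to three components.
press = np.array([[4.0], [1.5], [1.2]])
sse_prev = np.array([[10.0], [3.0], [1.0]])
q2 = q2_cum(press, sse_prev)
best_h = int(np.argmax(q2)) + 1
print(q2, best_h)
```

With these numbers the ratios are 0.4, 0.5 and 1.2, so the index rises to 0.8 at two components and falls back to 0.76 at three; the maximum points to two components.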
PLS model quality
QR$^2$Ycum index.
Sum of the coefficients of determination ($R^2$)
between the dependent variables and the first h
components for the dependent variables.
QR$^2$Xcum index.
Sum of the coefficients of determination ($R^2$)
between the independent variables and the first h
components for the independent variables.
These are similar to the $Q^2_{cum}(h)$ index - but only
for one of the blocks of data.
Note: other programs will either a) call these
different things or b) use different measures.
Octane number
Rating Octane of Gasoline using Near IR.
ASTM method is complex and expensive.
A simple method would be more desirable.
Experimental
Unleaded gasoline samples were assayed
by the ASTM method.
NIR spectra (900-1600 nm) were obtained.
OLR, PCR and PLS models were studied.
X matrix - spectra at 20 nm intervals.
ASTM octane number by Research method
was used as the Y matrix (vector).
Octane Number
NIR spectra
[Figure: NIR spectrum with bands labeled A to G.]
A 915 nm, CH2 stretch
B 1021 nm, CH2/CH3 combination band
C 1151 nm, aromatic and CH3 stretch
D 1194 nm, CH3 stretch
E 1394 nm, CH2 combination bands
F 1412 nm, aromatic & CH2 combination bands
G 1435 nm, aromatic & CH2 combination bands
Octane Number - OLS
High $R^2$ value, and
RMSE and Press
RMSE are similar.
Octane Number
Only 9 variables ended up being used in
building the OLS model.
Octane Number, OLS
[Figure: Octane # plotted against Predicted Octane #; legend: Active, Validation.]
OLS residual.
[Figure: Octane # / Standardized residuals.]
Octane Number, PCR
PCR
OLS
Note that you get a small
improvement in the fit
with PCR. It might have a
problem with outliers.
Also, you have a larger
number of degrees of
freedom since all of the
original variables were
used. With OLS, most
were discarded.
Octane Number, PCR
[Figure: scree plot of Eigenvalue and Cumulative variability (%) by Component.]
Almost 90% of the
variance is captured in
the first component.
Over 99% in the first
three.
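The eigenvalues and cumulative variability shown in a scree plot can be obtained from the singular values of the centered data. This is a sketch assuming NumPy and random placeholder data, not the octane spectra; `explained_variance` is a hypothetical name.

```python
import numpy as np

def explained_variance(X):
    """Eigenvalues of the covariance matrix and cumulative variability (%)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)       # singular values, descending
    eig = s ** 2 / (len(X) - 1)
    cum = 100.0 * np.cumsum(eig) / eig.sum()
    return eig, cum

# Placeholder data: one dominant direction plus smaller independent variation.
rng = np.random.default_rng(6)
base = rng.normal(size=(40, 1))
X = base @ rng.normal(size=(1, 6)) * 3.0 + rng.normal(size=(40, 6))
eig, cum = explained_variance(X)
print(np.round(cum, 1))
```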
Scores plot - potential outliers
[Figure: scores plot with labeled points; legend: Active, Validation.]
Octane Number, PCR
[Figures: Octane # plotted against Predicted Octane # (legend: Active, Validation), and Octane # / Standardized residuals.]
Octane Number, PLS
[Figure: Model quality by number of components (Index against Components).]
Octane Number, PCR
PCR
OLS
PLS
Correlation plot
[Figure: correlation plot on two components, axes from -1 to 1, showing the X variables (900 to 1600 nm in 20 nm steps).]
Similar to a scores plot
Octane Number, PLS
[Figures: Octane # plotted against Predicted Octane # (legend: Active, Validation), and Octane # / Standardized residuals.]
Octane number
For this data set, there is no significant
difference for OLR, PCR and PLS calibration
models.
Each produces a comparable fit of the data
and has about the same level of residual
error.
Let's look at another example - this time with
multiple dependent (Y) variables.
X-Ray fluorescence example
A series of nickel alloys was assayed by X-ray
fluorescence. Four of the elements are
known to have specific spectral features that
allow prediction. There are a total of 15
samples.
Elements present
Si, Mn, Ni, Cr, Mo, Ti and Fe.
(bold = with spectral features)
OLS, PCR and PLS models were built and
compared.
Sample spectra
[Figure: sample spectra, Absorbance (scale x 10^4) against Variable (0 to 300).]
OLS
Due to the limited number of samples,
there are too many X points/sample.
By creating replicate copies of the samples
(in triplicate), an initial OLS was conducted
to determine which X points would have
been automatically eliminated.
This left 8 X points/sample.
OLS
Model Quality
Summary.
[Figures, one pair per element (Si, Mn, Ni, Cr, Mo, Ti, Fe): Pred(% element) / % element, and % element / Standardized residuals.]
PCR gives higher $R^2$ values but look at the Press RMSE
OLS results
[Figure: scree plot of Eigenvalue and Cumulative variability (%) by Component.]
[Figures, one pair per element (Si, Mn, Ni, Cr, Mo, Ti, Fe): predicted against measured content, and standardized residuals.]
PLS
[Figure: Model quality by number of components (Index against Components).]
The $Q^2$ pattern indicates that most of the information is brought out in the first
few components but then noise is brought out to finally find a way to fit the
non-spectral species.
PLS
PLS gives an even better fit - but not a
huge improvement compared to PCR.
[Figures, one pair per element (Si, Mn, Ni, Cr): predicted against measured content, and standardized residuals.]
Mo and Ti
[Figures: predicted against measured content, and standardized residuals, for Ti and Mo.]
Fantastic fits considering that there is NO data to support them.
Fe Results
[Figures: predicted against measured content, and standardized residuals, for Fe.]
Summary
Although we were able to develop models
that appeared to be able to predict the
amounts of all seven species, there is actually
only real information about four of them.
The PCR and PLS models will produce a fit
regardless of noise, lack of a positive
response, ...
Care must be taken to ensure that your data
set contains real information about all of the
components.
One last example

30 different hydrocarbon blends were
assayed by UV/Vis-NIR.
Each blend had known levels of isooctane,
toluene and decane but also contained
other hydrocarbons at unknown levels.
Each blend was measured using two
different UV/Vis-NIR instruments.
Because we have more X variables than
samples, we can't use OLS.
The data
[Figure: spectra of the blends, 470 to 1090 nm.]
[Figure: Scree plot of Eigenvalue and Cumulative variability (%) by axis.]
Almost all of the
variance was in the
first component.
PCR
Near perfect fits but the Press RMSE indicates
that the model is subject to outliers.
[Figure: plot of the observations on two components; point and axis labels not recoverable.]
The first PC contains mostly information about
instrument type. PCR discarded that PC when
building the models.
Model parameters show that the first component was
given very little weight.
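How a first PC can capture instrument type rather than composition can be illustrated with a toy data set. Everything below is an assumption made for illustration (NumPy, two alternating hypothetical instruments with a large offset, one hypothetical analyte level); it is not the hydrocarbon data.

```python
import numpy as np

n = 30
instrument = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)   # two instruments, alternating
conc = np.linspace(0.0, 1.0, n)                           # hypothetical analyte level (Y)

# The instrument offset acts on four X variables, the analyte on two others.
d1 = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0]) / 2.0
d2 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0]) / np.sqrt(2.0)
X = 5.0 * np.outer(instrument, d1) + np.outer(conc, d2)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]                                 # scores on the first two PCs

# How much variance each PC holds, and how strongly its scores track Y.
explained = 100.0 * s[:2] ** 2 / np.sum(s ** 2)
corr = [abs(np.corrcoef(scores[:, a], conc)[0, 1]) for a in range(2)]
print(np.round(explained, 1), np.round(corr, 3))
```

In this toy case the first PC holds nearly all of the variance yet is almost uncorrelated with Y, while the second PC tracks Y closely, which is the situation where a regression on the scores gives the first component very little weight.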
[Figures, one pair per compound (isooctane, toluene, decane): Pred(% compound) / % compound, and % compound / Standardized residuals.]
[Figure: Model quality by number of components (Index against Components).]
PLS has a harder time with this dataset, basically
because of the variations due to having two
instruments.
PLS insists on using that as a latent variable.
[Figure: plot of first and second latent variables.]
[Figures, one pair per compound (isooctane, toluene, decane): Pred(% compound) / % compound, and % compound / Standardized residuals.]
[Figure: Outliers analysis, by Observations.]
An outlier analysis indicates that over half of the
observations are considered outliers. Not a good
model. PCR offers a better choice this time.
Continuum Regression (CR)
A recent attempt to unify PCR, PLS and OLR into a
single technique.
It is a continuously adjustable technique that uses
PLS as its base and includes PCR and OLR at the
opposite ends of the continuum.
PCR
CR parameter =
PLS
CR parameter = 1
MLR
CR parameter = 0
Continuum regression
