
Partial Least Squares: a tutorial
Lutgarde Buydens

Overview:
- Multivariate regression
  - Multiple Linear Regression (MLR)
  - Principal Component Regression (PCR)
  - Partial Least Squares (PLS)
- Validation
- Preprocessing

Multivariate Regression

[Figure: raw data matrix X of NIR spectra; absorbance vs. wavenumber (cm-1), 2000-14000 cm-1]

- Rows: cases, observations
  - analytical observations of different samples
  - experimental runs
  - persons
  - ...
- Columns: variables, classes, tags
  - X: independent variables (will always be available), e.g. p spectral variables, analytical measurements
  - Y: dependent variables (to be predicted later from X), e.g. class information (k classes), concentration, ...

Y = f(X): predict Y from X
- MLR: Multiple Linear Regression
- PCR: Principal Component Regression
- PLS: Partial Least Squares

MLR: Multiple Linear Regression

From univariate to Multiple Linear Regression (MLR):
- Univariate: $y = b_0 + b_1 x_1 + e$, with intercept $b_0$ and slope $b_1$; least-squares regression finds the line that maximizes $r(y, \hat{y})$.
- Multiple: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e$, a (hyper)plane in the space of $x_1, \dots, x_p$; in matrix form $Y = \hat{Y} + E$.

In matrix notation (with a column of ones in X carrying the intercept, so b has $p+1$ elements):

$Y_{n \times k} = X_{n \times p} B_{p \times k} + E_{n \times k}$, and for a single response $b = (X^T X)^{-1} X^T y$

Disadvantages: the inverse $(X^T X)^{-1}$ exists and is stable only if
- $n \geq p + 1$ (at least as many samples as coefficients), and
- the X-variables are uncorrelated; with $r(x_1, x_2) \approx 1$ the solution becomes unstable.
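As an illustration (not from the slides), a minimal numpy sketch of the least-squares solution $b = (X^T X)^{-1} X^T y$ on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 50 samples, p = 3 uncorrelated x-variables
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.5, -0.8, 0.3]) + 0.1 * rng.normal(size=n)

# Append a column of ones so that b[0] is the intercept b0
X1 = np.column_stack([np.ones(n), X])

# Least-squares solution b = (X^T X)^{-1} X^T y
# (np.linalg.solve is used instead of an explicit inverse for stability)
b = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(b)  # close to [2.0, 1.5, -0.8, 0.3]
```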

MLR: Multiple Linear Regression

Disadvantages: $(X^T X)^{-1}$ requires uncorrelated X-variables. With $r(x_1, x_2) \approx 1$, MLR fits a plane through a line!

Example: two data sets for the model $y_{n \times 1} = X_{n \times p} b_{p \times 1} + e_{n \times 1}$, here $y = b_1 x_1 + b_2 x_2 + e$. Set B differs from Set A in a single value of $x_2$ (0.21 vs. 0.23):

        Set A            Set B
x1      x2       x1      x2       y
-1.01   -0.99    -1.01   -0.99    -1.89
3.23    3.25     3.23    3.25     10.33
5.49    5.55     5.49    5.55     19.09
0.23    0.21     0.23    0.23     2.19
-2.87   -2.91    -2.87   -2.91    -8.09
3.67    3.76     3.67    3.76     11.29

MLR results:

       Set A            Set B
b1     10.3             2.96
b2     -6.92            0.28
R2     0.98             0.98

Both fits reach $R^2 = 0.98$, yet a change of 0.02 in one x-value completely changes the regression coefficients: with nearly collinear x-variables the MLR solution is unstable.
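The instability is easy to reproduce; a sketch with the Set A / Set B data from the table:

```python
import numpy as np

# The two data sets from the slides: Set B differs from Set A only in the
# fourth value of x2 (0.21 -> 0.23), yet the coefficients change completely.
x1   = np.array([-1.01, 3.23, 5.49, 0.23, -2.87, 3.67])
x2_A = np.array([-0.99, 3.25, 5.55, 0.21, -2.91, 3.76])
x2_B = np.array([-0.99, 3.25, 5.55, 0.23, -2.91, 3.76])
y    = np.array([-1.89, 10.33, 19.09, 2.19, -8.09, 11.29])

for label, x2 in [("Set A", x2_A), ("Set B", x2_B)]:
    X = np.column_stack([x1, x2])              # model y = b1*x1 + b2*x2 (no intercept)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(label, np.round(b, 2))               # Set A: ~[10.3, -6.92]; Set B: ~[2.96, 0.28]
```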

PCR: Principal Component Regression

Two routes to dimension reduction: variable selection, or latent variables (PCR, PLS). PCR uses the scores (projections) on latent variables that explain maximal variance in X.

- Step 0: mean-center the data.
- Step 1: perform PCA on the original X: $X = T P^T$; keeping $a$ components gives the reconstruction $X^* = (T P^T)^*$.
- Step 2: use the orthogonal PC scores $T$ ($n \times a$) as independent variables in an MLR model: $Y = T A$, with $A = (T^T T)^{-1} T^T Y$.
- Step 3: calculate the b-coefficients from the a-coefficients. MLR on the reconstructed $X^*$ means $Y = X^* B = (T P^T) B$, so $A = P^T B$, and because the loadings $P$ are orthonormal, $B = P A$. Finally compute the intercepts $b_0$ from the means of y and X.
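A minimal numpy sketch of Steps 0-3 (an illustration of the algorithm above, using SVD for the PCA step):

```python
import numpy as np

def pcr(X, y, n_components):
    """Principal Component Regression: PCA on X, then MLR on the scores."""
    # Step 0: mean-center
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Step 1: PCA via SVD, X = U S Vt; scores T = U S, loadings P = Vt.T
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = (U * S)[:, :n_components]          # n x a score matrix
    P = Vt[:n_components].T                # p x a loading matrix

    # Step 2: MLR on the orthogonal scores: a = (T^T T)^{-1} T^T y
    a = np.linalg.solve(T.T @ T, T.T @ yc)

    # Step 3: back to the original variables: b = P a, b0 from the means
    b = P @ a
    b0 = y_mean - x_mean @ b
    return b0, b
```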

PCR: Principal Component Regression

Optimal number of PCs: calculate the cross-validation RMSE for different numbers of PCs and pick the minimum:

$RMSECV = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$

PLS: Partial Least Squares Regression

Overview (three phases):
- Phase 1: PLS computes the score matrix $T$ ($n \times a$) from $X$ ($n \times p$) and $Y$ ($n \times k$) together.
- Phase 2: MLR of Y on the scores gives the coefficients $a_1, \dots, a_a$.
- Phase 3: back-calculation gives $b_0, b_1, \dots, b_p$ in terms of the original variables.

PLS: Partial Least Squares Regression (Projection to Latent Structure)

Phase 1: calculate new independent variables (T)
- Sequential algorithm: latent variables and their scores are calculated one at a time.
- Step 0: mean-center X.
- Step 1: calculate the weight vector $w_1$ (= LV1) that maximizes the covariance between X and Y, via an SVD of $X^T Y$:
  $(X^T Y)_{p \times k} = W_{p \times a} D_{a \times a} Z^T_{a \times k}$, with $w_1$ the 1st column of $W$.

PCR vs. PLS:
- PCR uses PCs: PC1 maximizes the variance in X.
- PLS uses LVs: LV1 ($w_1$) maximizes the covariance between X and y. Since $\mathrm{cov}(t, y) = \sqrt{\mathrm{var}(t)}\,\sqrt{\mathrm{var}(y)}\,\mathrm{cor}(t, y)$ for the scores $t = Xw$, this is a compromise between explaining variance in X and correlating with y.

Phase 1, continued:
- Step 2: calculate $t_1$, the scores (projections) of X on $w_1$: $t_{n \times 1} = X_{n \times p} w_{p \times 1}$.
- Repeat sequentially (after removing from X the part explained by $t_1$) to obtain $w_2, t_2$, and so on, until $a$ latent variables have been extracted.
- Phases 2 and 3 then proceed as in PCR: MLR of y on the scores $t_1, \dots, t_a$ gives $a_1, \dots, a_a$, and back-calculation gives $b_0, b_1, \dots, b_p$.
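A compact numpy sketch of a common sequential PLS1 variant (deflation of X after each component; for a single y, the SVD of $X^T y$ reduces to normalizing $X^T y$). An illustration under those assumptions, not the slides' own code:

```python
import numpy as np

def pls(X, y, n_components):
    """PLS1 regression: sequential extraction of latent variables."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # Step 0: mean-center
    W, T, P = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc                          # Step 1: direction maximizing cov(X, y)
        w /= np.linalg.norm(w)
        t = Xc @ w                             # Step 2: scores = projection of X on w
        p = Xc.T @ t / (t @ t)                 # loading used for deflation
        Xc = Xc - np.outer(t, p)               # deflate X, continue with the residual
        W.append(w); T.append(t); P.append(p)
    T = np.column_stack(T)
    a = np.linalg.solve(T.T @ T, T.T @ yc)     # Phase 2: MLR on the scores
    W, P = np.column_stack(W), np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, a)        # Phase 3: b in the original variables
    b0 = y_mean - x_mean @ b
    return b0, b
```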

PLS: Partial Least Squares Regression

Optimal number of LVs: calculate the cross-validation RMSE for different numbers of LVs and pick the minimum:

$RMSECV = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$

MLR, PCR, PLS compared on the Set A / Set B example above (model $y = b_1 x_1 + b_2 x_2 + e$; the two sets differ in a single x-value):

       Set A            Set B
       b1      b2       b1      b2
MLR    10.3    -6.92    2.96    0.28
PCR    1.60    1.62     1.60    1.62
PLS    1.60    1.62     1.60    1.62

Unlike MLR, PCR and PLS give essentially the same coefficients for both sets: the latent-variable methods are stable under near-collinearity.

VALIDATION

Estimating prediction error: the RMSE on new data is the common measure for prediction error.

Basic principle: test how well your model works with new data it has not seen yet!

A Biased Approach

The prediction error computed on the samples the model was built on is biased:
- these samples were also used to build the model;
- the model is therefore biased towards accurate prediction of these specific samples.

Validation: Basic Principle

Test how well your model works with new data it has not seen yet: split the data into a training set and a test set. Several ways:
- one large test set
- leave one out and repeat (LOO)
- leave n objects out and repeat (LnO)
- ...
Apply the entire modelling procedure to the test set as well.

Training and test sets
- Build the model ($b_0, \dots, b_p$) on the training set; compute the RMSEP on the test set.
- The test set should be representative of the training set; a random choice is often the best, but check for extremely unlucky divisions.
- Apply the whole procedure to the test and validation sets.
- Remark: for the final model, use the whole data set.

Cross-validation
- Simplest case: Leave-One-Out (LOO; segment = 1 sample); normally 10-20% is left out at a time (LnO).
- Remark: for the final model, use the whole data set.
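As a sketch of the train/test procedure (hypothetical data and split; `pls` is the sketch function defined earlier, not a library routine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 60 samples, 200 spectral variables
X = rng.normal(size=(60, 200))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=60)

# Random split: ~75% training, ~25% test (check it is not an unlucky division)
idx = rng.permutation(len(y))
train, test = idx[:45], idx[45:]

b0, b = pls(X[train], y[train], n_components=3)    # build model on training set only
y_pred = b0 + X[test] @ b

rmsep = np.sqrt(np.mean((y[test] - y_pred) ** 2))  # prediction error on unseen data
print(f"RMSEP = {rmsep:.3f}")
```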

Cross-validation: an example
- Split the data into a training set and a validation set.
- Build a model on the training set and predict the left-out samples.
- Split the data again into a (new) training set and validation set; repeat until all samples have been in the validation set once.
- Common: Leave-One-Out (LOO).
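A minimal LOO cross-validation loop around the earlier `pls` sketch (again an illustration, not the slides' own code):

```python
import numpy as np

def rmsecv_loo(X, y, n_components):
    """Leave-One-Out cross-validation: each sample is predicted once
    by a model built on all the other samples."""
    residuals = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i             # leave sample i out
        b0, b = pls(X[mask], y[mask], n_components)
        residuals.append(y[i] - (b0 + X[i] @ b))  # predict the left-out sample
    return np.sqrt(np.mean(np.square(residuals)))
```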

Cross-validation: a warning

Data: 13 x 5 = 65 NIR spectra (1102 wavelengths)
- 13 samples: different compositions of NaOH, NaOCl and Na2CO3 (wt%)
- 5 temperatures: each sample measured at 15, 21, 27, 34 and 40 °C

[Table: the 13 compositions in wt% NaOH / NaOCl / Na2CO3, e.g. 9.15 / 9.99 / 0.15 and 16.02 / 2.01 / 1.00, each measured at all five temperatures]

X: 65 x 1102 (spectra); y: 65 x 1 (concentration).

Leave SAMPLE out: all 5 spectra of a sample (one per temperature) must be left out together; otherwise near-replicates of each validation spectrum remain in the training set and the cross-validation error is over-optimistic.
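A sketch of this leave-sample-out scheme, where a `groups` vector (an assumption of this illustration) marks which spectra are replicates of the same sample:

```python
import numpy as np

def rmsecv_leave_sample_out(X, y, groups, n_components):
    """Cross-validation with whole samples (all replicate spectra,
    e.g. the 5 temperatures) left out together."""
    residuals = []
    for g in np.unique(groups):
        test = groups == g                        # all spectra of this sample
        b0, b = pls(X[~test], y[~test], n_components)
        residuals.extend(y[test] - (b0 + X[test] @ b))
    return np.sqrt(np.mean(np.square(residuals)))

# For the 13 x 5 design above: groups = np.repeat(np.arange(13), 5)
```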

Validation: selection of the number of LVs

Through validation: choose the number of LVs that gives the model with the lowest prediction error. The test set used to assess the final model cannot be used for this!

1) Determine the number of LVs on the training set: divide the training set further, or use cross-validation on it.
2) Build the model ($b_0, \dots, b_p$) and assess it on the held-out test set: RMSEP.

Remark: for the final model, use the whole data set.
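A sketch of step 1, reusing the `rmsecv_loo` function from the cross-validation sketch above:

```python
import numpy as np

# Choose the number of LVs by cross-validation on the training set only;
# the test set stays untouched until the final assessment.
def select_n_lv(X_train, y_train, max_lv=10):
    rmsecv = [rmsecv_loo(X_train, y_train, a) for a in range(1, max_lv + 1)]
    return int(np.argmin(rmsecv)) + 1             # number of LVs with lowest RMSECV
```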

Double cross-validation

1) Determine the number of LVs in an inner cross-validation loop (CV 2).
2) Build and assess the model in an outer cross-validation loop (CV 1): RMSEP.

Procedure:
- Split the data into a training set and a validation set. The validation set is used later to assess model performance!
- Apply cross-validation on the rest: split the training set into a (new) training set and a test set, fit models with 1 LV, 2 LVs, 3 LVs, ..., and keep the number of LVs with the lowest RMSECV.
- Repeat the procedure until all samples have been in the validation set once.

In this way:
- the number of LVs is determined using samples not used to build the model;
- the prediction error is also determined using samples the model has not seen before.

Remark: for the final model, use the whole data set.
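A sketch of the full double cross-validation with LOO in both loops, reusing the `pls` and `select_n_lv` sketches from above (computationally naive, but it shows the nesting):

```python
import numpy as np

def double_cv(X, y, max_lv=10):
    """Double cross-validation: the inner loop picks the number of LVs,
    the outer loop estimates the prediction error on unseen samples."""
    residuals = []
    for i in range(len(y)):                        # outer loop: LOO over samples
        mask = np.arange(len(y)) != i
        a = select_n_lv(X[mask], y[mask], max_lv)  # inner loop: CV on the rest only
        b0, b = pls(X[mask], y[mask], a)
        residuals.append(y[i] - (b0 + X[i] @ b))
    return np.sqrt(np.mean(np.square(residuals)))  # RMSEP-like error estimate
```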

PLS: an example

Raw + mean-centered data
[Figure: raw and mean-centered NIR spectra; absorbance (a.u.) vs. wavenumber (cm-1), 2000-14000 cm-1]

RMSECV vs. number of LVs
[Figure: RMSECV values for the prediction of NaOH vs. the number of LVs (1-10)]

Regression coefficients
[Figure: raw spectra and the PLS regression coefficients vs. wavenumber (cm-1), 3000-13000 cm-1]

True vs. predicted
[Figure: predicted vs. true NaOH (wt%), values roughly 10-18, points close to the diagonal]

Why Pre-Processing?

Data artefacts:
- offset
- slope
- scatter
- other: missing values, outliers

Remedies:
- baseline correction
- alignment
- scatter correction
- noise removal
- scaling, normalisation
- transformation
- ...

[Figure: a spectrum with simulated artefacts (original, offset, offset + slope, multiplicative, offset + slope + multiplicative); intensity (a.u.) vs. wavelength (a.u.), 200-1600]

Pre-Processing Methods

All reasonable combinations of four sequential pre-processing steps were compared (4914 combinations):

STEP 1: BASELINE (7x)
- no baseline correction
- (3x) detrending, polynomial order 2-3-4
- (2x) derivatisation (1st, 2nd)
- AsLS

STEP 2: SCATTER (10x)
- no scatter correction
- SNV
- (3x) RNV (15, 25, 35)%
- MSC
- (4x) scaling to: mean, median, max, L2 norm

STEP 3: NOISE (10x)
- no noise removal
- (9x) Savitzky-Golay smoothing (window: 5, 9, 11 points; polynomial order: 2, 3, 4)

STEP 4: SCALING & TRANSFORMATION (7x)
- mean-centering
- autoscaling
- range scaling
- Pareto scaling
- Poisson scaling
- level scaling
- log transformation

Supervised pre-processing methods: OSC, DOSC.

Pre-Processing Results
Evaluated on the complexity of the model (number of LVs) and the classification accuracy (%).
[Figure: classification accuracy (%) vs. model complexity (number of LVs) for the raw data and all pre-processing combinations]

J. Engel et al., TrAC 2013
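As an illustration of a few of these steps, a small sketch combining SNV scatter correction, Savitzky-Golay smoothing (scipy) and mean-centering; the chosen window and order are arbitrary examples:

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(X):
    """Standard Normal Variate: per spectrum (row), subtract the mean and
    divide by the standard deviation (scatter correction)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def preprocess(X):
    # STEP 2: scatter correction (SNV)
    X = snv(X)
    # STEP 3: noise removal (Savitzky-Golay, window 9 points, order 3)
    X = savgol_filter(X, window_length=9, polyorder=3, axis=1)
    # STEP 4: mean-centering (column-wise, over the samples)
    return X - X.mean(axis=0)
```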


SOFTWARE

- PLS Toolbox (Eigenvector Inc.), www.eigenvector.com, for use in MATLAB (or standalone!)
- XLSTAT-PLS (XLSTAT), www.xlstat.com, for use in Microsoft Excel
- Package pls for R, free software, http://cran.r-project.org
