Factor analysis dates back to the 1930s. It was originally used in psychology to study intelligence, in attempts to relate test results to other factors. The premise was that X = S L + E, where:
X = test performance
L = intrinsic intelligence factors
S = individual score
E = residual error
Covariance matrix = $\frac{1}{N_V - 1} X^T X$
It is then diagonalized (an eigenvector rotation). Typically, the eigenvectors with the largest eigenvalues are the most important.
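The covariance and diagonalization steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: rows of `X` are samples, the data are mean-centered first, and the divisor is taken to be the number of samples minus one. The function name `pca_eig` and the random data are made up for the example.

```python
import numpy as np

def pca_eig(X):
    """Diagonalize the covariance matrix of X (rows = samples, columns = variables)."""
    Xc = X - X.mean(axis=0)                  # mean-center each variable
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # covariance matrix (assumed divisor: samples - 1)
    evals, evecs = np.linalg.eigh(cov)       # eigh is meant for symmetric matrices
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    pct = 100 * evals / evals.sum()          # % of total variance per component
    return evals, evecs, pct

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * [5.0, 2.0, 0.5]   # made-up data with unequal spreads
evals, evecs, pct = pca_eig(X)
```

The columns of `evecs` are the eigenvectors, ordered so that the first column belongs to the largest eigenvalue, and `pct` gives each component's share of the total variance.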
$r = \frac{\mathrm{COVAR}}{S_x \, S_y}$
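As a quick check of this ratio, the sketch below computes r as the covariance divided by the product of the two standard deviations. The function name `correlation` and the five data points are invented for illustration; sample (n - 1) statistics are assumed throughout.

```python
import numpy as np

def correlation(x, y):
    """Correlation coefficient: covariance divided by the two standard deviations."""
    covar = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    return covar / (x.std(ddof=1) * y.std(ddof=1))

# made-up, nearly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
r = correlation(x, y)
```

The result should agree with `np.corrcoef(x, y)[0, 1]`, which is a convenient way to confirm the hand-written version.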
Principal component (PC). A linear combination of related variables. It represents an intrinsic factor of your data.
Scores. The projection of your data into PC space.
Loadings. Show the relative significance of the original variables.
Residual. The data that could not be correlated, typically random noise.
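To make these four terms concrete, here is a minimal sketch that splits a centered data matrix into scores, loadings and a residual, in the spirit of X = S L + E. It uses an SVD rather than an explicit covariance matrix, which is an implementation choice for the example and not necessarily what any particular package does. The data and the choice of `k = 2` retained components are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))            # made-up data: 20 samples, 4 variables
Xc = X - X.mean(axis=0)                 # mean-center each variable

# SVD of the centered data; the rows of Vt are the principal component directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                   # number of PCs kept (arbitrary for the example)
loadings = Vt[:k].T                     # variables x k: weight of each original variable
scores = Xc @ loadings                  # samples x k: the data projected into PC space
residual = Xc - scores @ loadings.T     # what the first k PCs do not explain
```

Here `scores @ loadings.T + residual` reproduces the centered data exactly, so the residual is simply whatever the retained components leave behind.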
Varimax rotation
A secondary tweaking of the PCs to help better observe relationships. It is essentially a secondary rotation of your data in an attempt to lump all of the variance from an individual variable into a single component. It can often help you to better understand the effects of your original variables.
[Figure: eigenvectors EV1 and EV2 drawn against the original variable axes]
Assume we have 5 original variables and are only interested in the first 3 PCs.
[Figure: % variance from original variables 1-5 in PC1, PC2 and PC3]
After varimax rotation, it might look like:
[Figure: % variance from original variables 1-5 in PC1, PC2 and PC3 after rotation]
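One way to see what the rotation does is to code it. The sketch below uses Kaiser's pairwise planar rotations on a loadings matrix, without the row normalization that many packages apply first; it is an illustrative version under those assumptions, not the routine used by any specific program. The function name `varimax` and the small test matrix are made up: a clean "one variable, one component" pattern is deliberately mixed by a 30 degree rotation, and the rotation is then asked to undo the mixing.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax by pairwise planar rotations (no Kaiser normalization)."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u, v = x**2 - y**2, 2 * x * y
                # Kaiser's closed-form angle for this pair of components
                num = 2 * np.sum(u * v) - 2 * u.sum() * v.sum() / p
                den = np.sum(u**2 - v**2) - (u.sum()**2 - v.sum()**2) / p
                theta = 0.25 * np.arctan2(num, den)
                c, s = np.cos(theta), np.sin(theta)
                L[:, i] = c * x + s * y
                L[:, j] = -s * x + c * y
                largest = max(largest, abs(theta))
        if largest < tol:            # no pair needed a meaningful rotation
            break
    return L

# made-up example: each variable loads on exactly one component ...
simple = 0.9 * np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
phi = np.deg2rad(30)
Q = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
mixed = simple @ Q                   # ... then the pattern is smeared across both
rotated = varimax(mixed)             # varimax should pull it apart again
```

Because the rotation is orthogonal, the total variance explained for each variable (the row sums of squared loadings) is unchanged; only its split between components moves.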
Artifacts were obtained from 5 sites in northern California. Obsidian samples from 4 quarry sites were obtained in the same area. X-ray fluorescence analysis for ten elements (Fe, Ti, Ba, Ca, K, Mn, Rb, Sr, Y and Zr) was conducted on all 75 samples.
Questions posed.
[Figure: scree plot, factors F1 to F10]
We can now produce displays of our components. A plot of the scores for PC1 vs. PC2 will result in about 98% of the original information being displayed. A loadings plot of L1 vs. L2 will show the importance of the original variables in the construction of PC1 and PC2.
[Figure: scores plot; horizontal axis F1 (81.68 %)]
The loadings show that Ca and Fe both have an effect on PC1. The other variables have a smaller effect.
XLStat will also produce a plot that indicates the correlation between the original variables and the factors. Here, it indicates that Y has little effect and that the others have an impact on both PCs.
[Figures: loadings plot (L1 vs. L2) and correlation plot; both axes run from -1 to 1]
Once the data are scaled so that each variable is given equal weight, the variance is no longer all concentrated in the 1st component.
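The effect of giving each variable equal weight is easy to reproduce on synthetic data. In this sketch, autoscaling is taken to mean centering each variable and dividing by its standard deviation; the helper names and the random data (one variable on a far larger scale than the rest) are assumptions made for illustration only.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def percent_variance(X):
    """% of total variance carried by each PC, largest first."""
    evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return 100 * evals / evals.sum()

rng = np.random.default_rng(2)
# made-up data: the first variable is on a much larger scale than the others
X = rng.normal(size=(75, 4)) * [1000.0, 10.0, 5.0, 1.0]

raw_pct = percent_variance(X)                # dominated by the large-scale variable
scaled_pct = percent_variance(autoscale(X))  # spread much more evenly
```

On the raw data nearly all of the variance lands in the first component simply because one variable has the biggest numbers; after autoscaling, no single variable can dominate for that reason alone.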
[Figure: scree plot, factors F1 to F10]
So what does it all mean? In this case, both scaled and unscaled results indicate a grouping of related samples, so XRF results can be used to classify related samples. Samples from the different quarries (1-4) can easily be distinguished, although samples from quarry site 2 are fairly scattered.
[Figure: plot of sample scores (labeled by site number, 1-7) and variable loadings, F1 (52.52 %) vs. F2 (20.78 %)]
No artifacts appear to have come from quarry site 4. It's well resolved from the other samples.
Artifacts from site 6 appear to come from quarry 2, although the results are scattered. Artifacts from site 7 are from quarry 3. Site 5 artifacts appear to come from all over the place. It's possible that this was a nomadic tribe.
In addition, Y appears to have little effect. We can reprocess our data after eliminating Y and some of our correlated variables.
[Figure: correlation plot marked with the variables to use and the variables to remove]
Modified Study
We now have only 4 variables.
Is it enough to still give the same characterization as with the original 10? If this works, we'll be able to save quite a bit of time and money on subsequent assays. Also, does it improve our results?
[Figure: scores and loadings plot for the reduced variable set, F1 (60.47 %) vs. F2 (27.66 %)]
Classification of whiskey
Another example comes from work conducted in our laboratory. One study involved the characterization of whiskey based on GC/MS traces. This example shows what you might need to do in order to make your data suitable for PCA evaluation.
Representative whiskeys
Methylene chloride extracts for a series of whiskeys were assayed using a GC/MS. Variables needed to be constructed from these traces.
Data preprocessing
Each chromatogram consisted of approximately 1800 points. This would be too much for many systems to handle.
Variables were constructed by summing the response at 1 min intervals, resulting in 30 variables.
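The reduction from about 1800 points to 30 variables can be sketched as a simple binning step. The sketch assumes the points are evenly spaced in time, so that 1800 points over 30 one-minute windows means 60 points per window; the function name `bin_trace` and the flat test trace are invented for the example.

```python
import numpy as np

def bin_trace(trace, n_bins=30):
    """Sum a 1-D detector trace into n_bins equal-width time windows."""
    trace = np.asarray(trace, dtype=float)
    usable = len(trace) - len(trace) % n_bins    # drop any ragged tail
    return trace[:usable].reshape(n_bins, -1).sum(axis=1)

trace = np.ones(1800)            # made-up flat trace, one unit of response per point
variables = bin_trace(trace)     # 30 values, one per 1 min window
```

Each row of the PCA data matrix would then hold the 30 summed responses for one whiskey extract.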
To improve variable stability, the smallest response value was treated as a baseline for background correction.
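A minimal sketch of this background correction follows. The slides do not say whether the smallest value was taken per sample or over the whole data set; the version below assumes it is taken per sample (per row), and the function name and numbers are made up.

```python
import numpy as np

def subtract_baseline(samples):
    """Subtract each sample's smallest response from all of its variables."""
    samples = np.asarray(samples, dtype=float)
    return samples - samples.min(axis=1, keepdims=True)   # assumed: one baseline per row

# made-up binned responses for two samples
samples = np.array([[5.0, 7.0, 9.0],
                    [2.0, 2.5, 6.0]])
corrected = subtract_baseline(samples)
```

After correction, the lowest variable in every sample sits at zero, so a constant background offset no longer distinguishes one sample from another.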
All data were autoscaled prior to PCA.
Questions asked
Could whiskies be classified?
Could the approach be used to detect sample dilution? Blending? Contamination?
[Figure: scores plot, PC1 vs. PC2, with samples labeled C, L, S, T and B]
[Figure: scores plot with samples labeled X, 25Y and 50Y]
[Figure: Contamination. Scores plot, PC1 vs. PC2, with contaminant levels in ppm]
[Figure: Dilution. Scores plot, PC1 vs. PC2, with samples labeled by % whiskey by volume (0, 20, 40, 60, 80)]