

Statistics for HCI Research: Principal Component Analysis (PCA)




Principal Component Analysis (PCA)


Table Of Contents
1. Introduction
2. R code example
3. Interpretation of the results of PCA
4. PCA and Logistic regression
5. Difference between PCA and Factor Analysis

Introduction
Principal Component Analysis (PCA) is a powerful tool when you have many variables and you want to look into what these variables can explain. As the name suggests, PCA finds combinations of your variables which explain the phenomena you observed. In this sense, PCA is useful when you want to reduce the number of variables. One common scenario is that you have n variables and you want to combine them into 3 or 4 variables without losing much of the information the original data have. More mathematically, PCA tries to find linear projections of your data which preserve the information your data have. PCA is one of the methods you may want to try if you have lots of Likert data and want to understand what these data tell you.

Let's say we asked participants four 7-point Likert questions about what they care about when choosing a new computer, and got results like this.

Participant  Price  Software  Aesthetics  Brand
P1           6      5         3           4
P2           7      3         2           2
P3           6      4         4           5
P4           5      7         1           3
P5           7      7         5           5
P6           6      4         2           3
P7           5      7         2           1
P8           6      5         4           4
P9           3      5         6           7
P10          1      3         7           5
P11          2      6         6           7
P12          5      7         7           6
P13          2      4         5           6
P14          3      5         6           5
P15          1      6         5           5
P16          2      3         7           7

Price: A new computer is cheap to you (1: strongly disagree -- 7: strongly agree)
Software: The OS on a new computer allows you to use software you want to use (1: strongly disagree -- 7: strongly agree)
Aesthetics: The appearance of a new computer is appealing to you (1: strongly disagree -- 7: strongly agree)
Brand: The brand of the OS on a new computer is appealing to you (1: strongly disagree -- 7: strongly agree)

Now what you want to do is to find out what combination of these four variables can explain the phenomena you observed. I will explain this with example R code.

R code example
Let's prepare the same data shown in the table above.

> Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
> Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
> Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
> Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
> data <- data.frame(Price, Software, Aesthetics, Brand)

At this point, data looks pretty much the same as the table above. Now we do PCA. In R, there are two functions for PCA: prcomp() and princomp(). prcomp() computes the principal components through a singular value decomposition of the data, whereas princomp() works from the variance-covariance matrix (or, with cor=T, the correlation matrix). The results are usually very similar in practice (which I haven't formally tested, so be careful), and the results gained from princomp() have some nice features, so here I use princomp().

> pca <- princomp(data, cor=T)
> summary(pca, loadings=T)
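For comparison, a roughly equivalent analysis with prcomp() would look like the sketch below; scale. = TRUE standardizes the variables, playing a role similar to cor=T above, and the signs of some loadings may come out flipped, which does not change the interpretation.

> pca2 <- prcomp(data, scale. = TRUE)   # comparison only; not used in the rest of this page
> summary(pca2)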

And here is the result of the PCA.

Importance of components:
                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5589391 0.9804092 0.6816673 0.37925777
Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
Cumulative Proportion  0.6075727 0.8478733 0.9640409 1.00000000

Loadings:
           Comp.1 Comp.2 Comp.3 Comp.4
Price      -0.523         0.848
Software   -0.177  0.977 -0.120
Aesthetics  0.597  0.134  0.295 -0.734
Brand       0.583  0.167  0.423  0.674

I will explain how to interpret this result in the next section.

Interpretation of the results of PCA


Let's take a look at the table of loadings, which shows the coefficients for the "new" variables.


           Comp.1 Comp.2 Comp.3 Comp.4
Price      -0.523         0.848
Software   -0.177  0.977 -0.120
Aesthetics  0.597  0.134  0.295 -0.734
Brand       0.583  0.167  0.423  0.674

From the second table (loadings), we can see that PCA found four new variables, Comp.1 to Comp.4, which together can explain the same information as the original four variables (Price, Software, Aesthetics, and Brand). Comp.1 is calculated as follows:

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand
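Strictly speaking, these coefficients are applied to the standardized variables because we used cor=T. If you want to convince yourself of this, here is a minimal sketch (assuming the data and pca objects defined above) that rebuilds Comp.1 by hand; note that princomp() standardizes with the population standard deviation (divisor n).

> n <- nrow(data)
> sd.n <- apply(data, 2, sd) * sqrt((n - 1) / n)   # divisor-n standard deviations
> standardized <- scale(data, center = TRUE, scale = sd.n)
> comp1.manual <- standardized %*% pca$loadings[, 1]
> head(cbind(comp1.manual, pca$scores[, 1]))        # the two columns should match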

Thus, PCA successfully found a new combination of the variables, which is good. The next thing we want to know is how much power each of the new variables has to explain the information that the original data have. For this, you need to look at Standard deviation and Cumulative Proportion (of Variance) in the result.

                       Comp.1  Comp.2  Comp.3  Comp.4
Standard deviation       1.56    0.98    0.68    0.38
Cumulative Proportion    0.61    0.85    0.96    1.00

Standard deviation means the standard deviation of the new variables. PCA calculates the combination of the variables such that the new variables have a large standard deviation, so, generally, a larger standard deviation means a more useful variable. A common heuristic is to take all the new variables whose standard deviations are roughly over 1.0 (so here we take Comp.1 and Comp.2). Another way to decide how many new variables to take is to look at the cumulative proportion of variance, which tells you how much of the information in the original data can be described by a combination of the new variables. For instance, with only Comp.1 we can describe 61% of the information the original data have; if we use Comp.1 and Comp.2, we can describe 85% of it. A cumulative proportion of around 80% is generally considered to describe the data well. So, in this example, we can take Comp.1 and Comp.2, and ignore Comp.3 and Comp.4. In this manner, we can decrease the number of variables (in this example, from 4 variables to 2 variables). Your next task is to understand what the new variables mean in the context of your data. As we have seen, the first new variable can be calculated as follows:

Comp.1 = -0.523 * Price - 0.177 * Software + 0.597 * Aesthetics + 0.583 * Brand
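As a side note, a quick way to visualize how many components are worth keeping is a scree plot of the component variances; something like this should work with the pca object above.

> screeplot(pca, type = "lines")   # variance of each component; look for the "elbow"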

It is a very good idea to plot the data to see what this new variable means. You can use the scores (pca$scores) to get the values of each new variable modeled by PCA.

> plot(pca$scores[,1])
> barplot(pca$scores[,1])

With the graphs (sorry I was kinda lazy to upload the graph, but you can quickly generate it by yourself), you can see that Participants 1-8 get negative values, and the other participants get positive values. It seems that this new variable indicates whether a user cares about Price and Software or about Aesthetics and Brand for her computer. So we can probably name this variable a "Feature/Fashion index" or something. There is no definitive answer for this part of PCA. You need to go through your data and make sense of what the new variables mean by yourself.
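If it helps, here is a small sketch of a labeled version of that bar plot (the participant labels are only for illustration):

> barplot(pca$scores[, 1], names.arg = paste0("P", 1:16), las = 2, ylab = "Comp.1 score")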

PCA and Logistic regression


Once you have done the analysis with PCA, you may want to look into whether the new variables can predict some phenomena well. This is kinda like machine learning: whether the features can classify the data well. Let's say you asked the participants one more thing in your survey, which OS they are using (Windows or Mac), and the results are like this.

Participant  Price  Software  Aesthetics  Brand  OS
P1           6      5         3           4      0
P2           7      3         2           2      0
P3           6      4         4           5      0
P4           5      7         1           3      0
P5           7      7         5           5      1
P6           6      4         2           3      0
P7           5      7         2           1      0
P8           6      5         4           4      0
P9           3      5         6           7      1
P10          1      3         7           5      1
P11          2      6         6           7      0
P12          5      7         7           6      1
P13          2      4         5           6      1
P14          3      5         6           5      1
P15          1      6         5           5      1
P16          2      3         7           7      1

Here what we are going to do is see whether the new variables given by PCA can predict the OS people are using. OS is 0 or 1 in our case, which means the dependent variable is binomial, so we are going to do logistic regression. I will skip the details of logistic regression here; if you are interested, they are available on a separate page. First, we prepare the data about OS.

> OS <- c(0,0,0,0,1,0,0,0,1,1,0,1,1,1,1,1)

Then, fit the first variable we found through PCA (i.e., Comp.1) to a logistic function.

> model <- glm(OS ~ pca$scores[,1], family=binomial)
> summary(model)

Now you get the logistic regression model.

Call:
glm(formula = OS ~ pca$scores[, 1], family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.19746  -0.44586   0.01932   0.60018   1.65268

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.08371    0.74216   0.113   0.9102
pca$scores[, 1]  1.42973    0.62129   2.301   0.0214 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 22.181  on 15  degrees of freedom
Residual deviance: 12.033  on 14  degrees of freedom
AIC: 16.033

Number of Fisher Scoring iterations: 5

Let's see how well this model predicts the kind of OS. You can use the fitted() function to see the prediction.

> fitted(model)

         1          2          3          4          5          6          7
0.15173723 0.04159449 0.34968733 0.04406133 0.25520745 0.07808633 0.02649166
         8          9         10         11         12         13         14
0.21744454 0.89433079 0.93612411 0.91057994 0.73428648 0.85190931 0.76285170
        15         16
0.78149889 0.96410841

These values represent the probabilities of being 1. For example, based on the variable derived by PCA, we expect a 15% chance that Participant 1 is using OS 1. Thus, Participant 1 is more likely to be using OS 0, which agrees with the survey response. In this way, PCA can be used with regression models for calculating the probability of a phenomenon or for making a prediction.
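As a rough check of how well the model classifies the participants, you can threshold the fitted probabilities at 0.5 and cross-tabulate the predictions against the actual OS values. This is only a sketch; evaluating on the same data the model was fitted on is optimistic.

> predicted <- ifelse(fitted(model) > 0.5, 1, 0)   # classify at a 0.5 cutoff
> table(OS, predicted)                             # rows: actual OS, columns: predicted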

Difference between PCA and Factor Analysis


A concept with a similar name to PCA that you may have heard of is Factor Analysis. I explain the difference between PCA and factor analysis on the factor analysis page.

Comments

Very nice and simple. It helps me to understand what PCA does in 5 minutes. Thanks a lot.
Anonymous (2012-06-19 00:47:52)

Can you also explain how one can use it for classification?? I am struggling to use it for predicting classes and I am not understanding how to use the components. It would be really of great help to me.
Anonymous (2012-11-26 15:54:07)


PCA itself does not do any classification, but you can use it to reduce the number of features you use. For example, if you have hundreds or thousands of features, some machine learning techniques may take a lot of time to run for prediction. In this case, you can perform PCA to generate a smaller set of the most informative features (fewer than the original feature set). PCA is just a linear transformation, and should be super fast to run.
KojiYatani (2012-12-12 00:33:25)

nice tutorial, really helpful


Anonymous (2013-03-29 00:52:47)

Owner: KojiYatani


