Determining variables that contribute to principal components

Hi,
I am trying to run PCA on a 24x3333 matrix, where 24 is the number of observations and 3333 is the number of variables. I am using:
[coeff,score,eigval] = princomp(zscore(aggregate));
23 PCs are needed to explain 95% of the variance in the data. My question is: how do I know which variables are contributing to each component? I believe I need to make a variable spreadsheet naming all 3333 variables. However, it is not clear how I would identify the variables contributing to each component.
I am also creating a variable, percent variation explained (PVE): the variation in an original variable explained by a principal component. Ultimately I want to quantify how much a variable contributes to its respective principal component:
for i = 1:3333
pve(:,i) = 100*coeff(i,i)*sqrt(var(score(:,i)))/(var(aggregate(:,i)));
end
Any insight would be a big help. I've been trying to figure this out for weeks with no luck.
Thanks,
Eric

 Accepted Answer

The first paragraph in the doc description for princomp says "COEFF is a p-by-p matrix, each column containing coefficients for one principal component." For example, to project your data onto the 1st principal axis, do zscore(aggregate)*coeff(:,1). Why not measure the contribution of a variable to a component by the size of the respective coefficient? Especially since you have standardized your data by zscore.
Since you have 23 components, the columns in score past 23 are filled with zeros. If you need to get the principal component variance, take the 3rd output from princomp.
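To make the coefficient-size idea concrete, here is a minimal sketch. It assumes random placeholder data in place of the real matrix and uses pca rather than princomp (an assumption about the release in use; the outputs play the same roles). Row i of coeff belongs to column i of the data matrix, so the row index is what identifies the original variable. The names X, Z, topVars and share are made up for illustration.

```matlab
% Placeholder data standing in for the real 24-by-3333 matrix (hypothetical values).
rng(0);
X = randn(24, 50);
Z = zscore(X);                          % standardize each variable

% pca is used here in place of princomp; each column of coeff is one principal axis.
[coeff, score, latent] = pca(Z);

k = 1;                                  % component of interest
proj = Z * coeff(:, k);                 % projection onto the k-th axis; matches score(:, k)

% Rank variables by the absolute size of their coefficient on component k.
[~, order] = sort(abs(coeff(:, k)), 'descend');
topVars = order(1:5);                   % column indices of the 5 largest coefficients

% Each column of coeff has unit length, so the squared coefficients sum to 1.
share = coeff(:, k).^2;

% Columns: variable index, coefficient, squared coefficient in percent.
disp([topVars, coeff(topVars, k), 100 * share(topVars)]);
```

The squared coefficients describe how the unit-length axis is spread over the variables. Because the variables are correlated, this is not a decomposition of the component's variance into per-variable parts.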

5 Comments

Ilya,
Thank you for responding. From score, I am projecting each participant's data along the respective PC.
Whether I project data along a principal axis or measure the contribution of a variable to a component by the size of the respective coefficient, I still have the same problem. That is, I still don't know the id of the original variables that are loading a principal component.
Eric
I am sorry, I don't understand what you mean by "the id of the original variables that are loading a principal component." You seem to believe that there is a one-to-one correspondence between a variable and a principal component. There is no such thing. PCA defines an orthogonal rotation in the multivariate space. Generally, all variables contribute to all components. For example, all variables with non-zero coefficients in coeff(:,1) contribute to the 1st component. You can judge which one contributes more by the size of the coefficient.
I see what you mean: every variable has a coefficient on every PC, so with 3333 variables I have 3333 coefficients for each PC. However, in the gait studies that have used PCA, authors are able to report the percent of the variance explained by particular variables (i.e. joint angle and moment waveforms) on a PC. My coefficients are small (<1 to <<1). There are 33 angle and moment measures in each PC, and for each joint rotation there are 101 data points, which is why the row dimension of each PC is 3333 (33 x 101). The PCs are ordered from largest to smallest, but the coefficients are not. Is the only way to get at the overall contribution of the 101 points for each joint rotation to average the coefficients in 101-point increments down each PC I retain?
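One way to summarize coefficients in 101-point blocks is sketched below. It is only an illustration under assumptions: the data are random placeholders, pca stands in for princomp, and the columns are assumed to be ordered so that columns 1:101 hold the first waveform, 102:202 the second, and so on. It sums squared coefficients instead of averaging raw ones, because signed coefficients can cancel within a block. The names nPoints, nMeasures and blockShare are made up here.

```matlab
nPoints = 101;                           % samples per waveform, as described in the question
nMeasures = 33;                          % number of angle and moment measures
rng(0);
X = randn(24, nPoints * nMeasures);      % placeholder data (hypothetical values)

% pca is used in place of princomp; coeff is 3333-by-23 for 24 observations.
coeff = pca(zscore(X));

k = 1;                                   % component of interest

% Assumes each consecutive run of 101 rows in coeff belongs to one waveform.
blocks = reshape(coeff(:, k).^2, nPoints, nMeasures);

% Share of the unit-length axis that falls on each waveform (sums to 1).
blockShare = sum(blocks, 1);

% Waveforms ordered from largest to smallest share on component k.
[~, ranking] = sort(blockShare, 'descend');
disp(ranking(1:5));
```

As with single coefficients, these shares describe the direction of the axis, not a split of the component's variance among waveforms.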
I could understand what you mean if you wanted to go in the opposite direction, that is, explain variance in an original variable by a specific principal component. Since the covariance matrix is diagonal in the PCA space, we can separate contributions of the principal components to the variance of a variable. I do not see how to separate variable contributions to the variance of a principal component since the variables are not independent (and if they were, you would not need PCA in the first place). Here is what you could do:
% Load data and perform PCA
load hald
[coeff,~,latent] = princomp(ingredients);
cov(ingredients)
% Variance in variable I explained by principal component J
i = 2;
j = 1;
varI = coeff(i,:)*(latent.*coeff(i,:)')
varIfromJ = coeff(i,j)*latent(j)*coeff(i,j)
percVarIfromJ = varIfromJ/varI
Alternatively, you could, of course, ask the authors of those gait studies what exactly they did.
Thank you. Your way to find percVarIfromJ was what I was attempting to do with my original for loop.
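For reference, the single-entry computation of percVarIfromJ can be done for every variable and component at once. The following is a sketch, not the method of any gait study: it uses pca in place of princomp (an assumption about the release) and bsxfun so that it does not rely on implicit expansion. The names contrib, varEach and fracVar are made up for illustration.

```matlab
load hald                                   % provides the 13-by-4 matrix "ingredients"
[coeff, ~, latent] = pca(ingredients);      % pca used in place of princomp

% contrib(i,j): variance of variable i carried by component j,
% i.e. coeff(i,j) * latent(j) * coeff(i,j) for every pair (i,j).
contrib = bsxfun(@times, coeff.^2, latent');

% Row sums recover the variance of each variable when every component
% with nonzero variance is kept.
varEach = sum(contrib, 2);

% Fraction of each variable's variance explained by each component.
fracVar = bsxfun(@rdivide, contrib, varEach);

% Rows are variables, columns are components, values are percentages.
disp(100 * fracVar);
```

Each row of fracVar sums to 1, so entry (i,j) can be read as the share of variable i's variance that lies along component j.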


Asked:

on 26 Sep 2012
