Reducing a dataset's dimensions with PCA and projecting it onto a graph.

Hi all,
Like the title says, I'd like to take a large m-by-n dataset X, reduce its dimensions to m-by-2, and then project it onto a 2D plane whose axes are the first two principal components of X. Is the code below enough, or have I missed a step?
[a,b] = pca(X);
b = b';
plot(b(1,:),b(2,:),'.');
From what I understand, taking the first two columns of b gives me X's coordinates along the two principal components that tell us the most about its structure, with its dimensions reduced. I'm basing this on the code I found here, but I just wanted to check that I have it correct.

Answers (1)

Hi Alasdair,
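Your approach is correct: the second output of pca is the score matrix, so after the transpose, b(1,:) and b(2,:) are the first two columns of score, which is exactly the 2D projection you want. The version below does the same thing with descriptive variable names and labeled axes, which makes the typical PCA workflow in MATLAB easier to follow.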
% Assume X is your m-by-n dataset
% Perform PCA
[coeff, score, ~] = pca(X);
% coeff contains the principal component coefficients
% score contains the transformed data in the principal component space
% Reduce the data to 2 dimensions by selecting the first two columns of the score
X_reduced = score(:, 1:2);
% Plot the reduced data
plot(X_reduced(:, 1), X_reduced(:, 2), '.');
xlabel('Principal Component 1');
ylabel('Principal Component 2');
title('2D Projection of Dataset using PCA');
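To confirm that the original snippet and the version above plot the same points, the two indexings can be compared directly. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (which provides pca) is installed; the random matrix is a synthetic stand-in for the real X.

```matlab
rng(0);                          % reproducible synthetic data
X = randn(50, 6);                % stand-in for the real m-by-n dataset

[~, b] = pca(X);                 % asker's version: b is the score matrix
b = b';                          % n-by-m after transposing
[coeff, score] = pca(X);         % answer's version

% Rows 1 and 2 of the transposed b are columns 1 and 2 of score
sameCols = isequal(b(1:2, :)', score(:, 1:2));
disp(sameCols)
```

Because pca is deterministic for the same input, both calls return identical matrices, so the comparison can use isequal rather than a tolerance.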
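  • pca Function: The first three outputs of pca are coeff, score, and latent (it can also return tsquared, explained, and mu).
  • coeff contains the principal component coefficients (eigenvectors of the covariance matrix of X), one column per component.
  • score is the representation of your data in the principal component space. Because pca centers the data by default, score equals (X - mean(X)) * coeff.
  • latent contains the eigenvalues of the covariance matrix, i.e. the variance of the data along each principal component.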
  • Transformation: The score matrix is the transformed dataset where each column corresponds to a principal component. By selecting the first two columns of score, you are projecting your dataset onto the plane defined by the first two principal components.
  • Plotting: Plotting X_reduced gives you the 2D visualization of your dataset in the reduced space.
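The relationships described in the list above can be checked numerically. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed; it also requests pca's fifth and sixth outputs, explained (percentage of total variance per component) and mu (column means), and uses synthetic data in place of the real X.

```matlab
rng(1);
X = randn(100, 4) * randn(4, 4) + 3;   % synthetic data with a non-zero mean

[coeff, score, latent, ~, explained, mu] = pca(X);

% score is the mean-centered data expressed in the principal component basis
recon = (X - mu) * coeff;
maxErr = max(abs(recon(:) - score(:)));

% latent holds the variance of each column of score
latentErr = max(abs(var(score)' - latent));

% Share of the total variance that the 2D plot captures
pct2D = sum(explained(1:2));
fprintf('First two components explain %.1f%% of the variance\n', pct2D);
```

Printing sum(explained(1:2)) for the real dataset is a quick way to judge how faithfully the 2D plot represents it.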
This reduces your dataset to m-by-2 and plots it along the two principal components that capture the most variance in X.
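If pca is not available (it belongs to the Statistics and Machine Learning Toolbox), the same 2D projection can be computed from a singular value decomposition of the mean-centered data. This is a minimal sketch, assuming X is numeric with no missing values; the variable names are illustrative. The sign of each principal direction is arbitrary, so a column of the result may be flipped relative to score(:, 1:2).

```matlab
rng(0);                              % reproducible synthetic data
X = randn(200, 5) * randn(5, 5);     % 200 observations, 5 correlated variables

% Center each column, as pca does by default
Xc = X - mean(X, 1);

% Economy-size SVD: the columns of V are the principal directions
[~, S, V] = svd(Xc, 'econ');

% Project onto the first two principal directions (m-by-2)
X_reduced_svd = Xc * V(:, 1:2);

% Variance along each direction (corresponds to pca's latent output)
latent_svd = diag(S).^2 / (size(X, 1) - 1);

plot(X_reduced_svd(:, 1), X_reduced_svd(:, 2), '.');
xlabel('Principal Component 1');
ylabel('Principal Component 2');
```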

Asked: 27 Feb 2018
Answered: 31 Jan 2025