
Principal Component Analysis: Reconstructing Centred Data

BOB on 19 Apr 2019
Answered: Aditya on 27 Jun 2024 at 5:38
Hi, I have the following dataset which I have performed PCA on:
DATASET =
10.0000 6.0000
11.0000 4.0000
8.0000 5.0000
3.0000 3.0000
2.0000 2.8000
1.0000 1.0000
As I understand it, multiplying the "score" output by the "coeff" output reconstructs the centred data. I assume "centred data" means the data shifted so that it is centred on the origin, as described in this tutorial video: "https://www.youtube.com/watch?v=FgakZw6K1QQ". If so, why does my data, when centred manually, not equal the output of score*coeff? The product score*coeff gives:
>> score*coeff
ans =
4.7231 -0.8092
4.2295 -2.9901
2.5422 -0.3156
-2.5933 1.3053
-3.4936 1.7842
-5.4078 1.0253
However, when I centre each variable manually, by subtracting every value in a column from that column's mean (first column for the first variable, second column for the second variable), I get different values, even though this is presumably how you centre data around the origin:
>> CentredVariable1 = mean(DATASET(:,1))-DATASET(:,1)
CentredVariable1 =
-4.1667
-5.1667
-2.1667
2.8333
3.8333
4.8333
>> CentredVariable2 = mean(DATASET(:,2))-DATASET(:,2)
CentredVariable2 =
-2.3667
-0.3667
-1.3667
0.6333
0.8333
2.6333

Answers (1)

Aditya on 27 Jun 2024 at 5:38
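The discrepancy comes from two separate issues, one in the centering and one in the reconstruction.
First, PCA centers the data by subtracting the mean of each column from every element in that column, i.e. DATASET - mean(DATASET). Your code computes mean(DATASET(:,k)) - DATASET(:,k) instead, which gives the same magnitudes with every sign flipped.
Second, pca returns score = centered_data * coeff, and coeff is orthogonal (coeff' * coeff is the identity matrix), so the centered data is recovered with the transpose, score * coeff'. The product score * coeff, without the transpose, is a different matrix, which is why your output matches neither version of the centered data.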
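In matrix notation, the same two points can be written as follows. The symbols are introduced here for illustration only: X is the 6-by-2 DATASET, \bar{x} is the 1-by-2 row of column means, \mathbf{1} is a 6-by-1 column of ones, C = coeff and S = score. It assumes all components are retained, so C is a square orthogonal matrix.

```latex
\begin{aligned}
X_c &= X - \mathbf{1}\bar{x}, \qquad S = X_c C, \qquad C^\top C = C C^\top = I_2 \\
S C^\top &= X_c C C^\top = X_c \\
S C &= X_c C^2 \neq X_c \quad \text{(in general)} \\
\mathbf{1}\bar{x} - X &= -X_c
\end{aligned}
```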
The following example centers the data, runs PCA, and checks that score * coeff' matches the centered data:
% Original dataset
DATASET = [
10.0000 6.0000;
11.0000 4.0000;
8.0000 5.0000;
3.0000 3.0000;
2.0000 2.8000;
1.0000 1.0000
];
% Step 1: Calculate the mean of each column
mean_data = mean(DATASET);
% Step 2: Center the data by subtracting the mean
centered_data = DATASET - mean_data;
% Step 3: Perform PCA
[coeff, score, ~] = pca(DATASET);
% Step 4: Reconstruct the centered data
reconstructed_centered_data = score * coeff';
% Display results
disp('Original Centered Data:');
disp(centered_data);
disp('Reconstructed Centered Data:');
disp(reconstructed_centered_data);
% Verify that the centered data matches the reconstructed centered data
assert(max(abs(centered_data(:) - reconstructed_centered_data(:))) < 1e-10, 'Centered data does not match reconstructed data.'); % tolerance check; rounding then isequal can fail near a rounding boundary
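To see both problems with the numbers from the question, a sketch like the following could be run. It assumes the Statistics and Machine Learning Toolbox (for pca) and MATLAB R2016b or later (for implicit expansion in DATASET - mean(DATASET)); the variable names other than DATASET, coeff and score are made up for illustration.

```matlab
% Sketch: reproduce the two mistakes from the question with the same data.
% Assumes Statistics and Machine Learning Toolbox and R2016b+ (implicit expansion).
DATASET = [10 6; 11 4; 8 5; 3 3; 2 2.8; 1 1];
[coeff, score] = pca(DATASET);

correctCentered  = DATASET - mean(DATASET);  % X - mean: what pca uses
questionCentered = mean(DATASET) - DATASET;  % mean - X: as in the question

% Mistake 1: mean - X is exactly the negative of X - mean
signFlipped = isequal(questionCentered, -correctCentered);

% Mistake 2: score*coeff (no transpose) does not undo score = centered*coeff
wrongRecon = score * coeff;   % what the question computed
rightRecon = score * coeff';  % transpose of an orthogonal matrix is its inverse

% Largest absolute difference from the correctly centered data
errWrong = max(abs(wrongRecon(:) - correctCentered(:)));
errRight = max(abs(rightRecon(:) - correctCentered(:)));
fprintf('Max error without transpose: %.4f\n', errWrong);
fprintf('Max error with transpose:    %.2e\n', errRight);
```

With the question's data, errWrong should be large (the question's first row of score*coeff is 4.7231, -0.8092, while the first row of the centered data is 4.1667, 2.3667), and errRight should be at the level of floating-point rounding.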
The correct way to center the data is:
centered_data = DATASET - mean(DATASET);
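This gives every column a mean of zero. pca performs the same centering internally by default (its 'Centered' option is true), which is why score * coeff' reproduces this matrix rather than mean(DATASET) - DATASET.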
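The zero-mean property, the orthogonality of coeff that makes the transpose work, and the way back to the original, uncentered data can all be checked directly. This sketch assumes the Statistics and Machine Learning Toolbox and R2016b or later; it uses pca's sixth output, mu, which holds the column means pca subtracted when centering (the default). The names colMeans, orthErr, recovered and recErr are illustrative only.

```matlab
% Sketch: check zero column means, orthogonality of coeff, and recover the
% original data by adding pca's estimated means (mu) back.
DATASET = [10 6; 11 4; 8 5; 3 3; 2 2.8; 1 1];
[coeff, score, ~, ~, ~, mu] = pca(DATASET);

centered = DATASET - mu;   % mu is the row of column means pca subtracted
colMeans = mean(centered); % each entry is zero up to rounding

% coeff' * coeff should be the identity, so coeff' undoes the projection
orthErr = max(max(abs(coeff' * coeff - eye(size(coeff, 2)))));

% Undo the projection, then undo the centering
recovered = score * coeff' + mu;
recErr = max(abs(recovered(:) - DATASET(:)));
fprintf('Max |column mean|: %.2e, orthogonality error: %.2e, recovery error: %.2e\n', ...
    max(abs(colMeans)), orthErr, recErr);
```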
