PCA (Principal Component Analysis):
Say you have n samples, each with d features, and you want a smaller set of k features (k < d) that preserves as much of the total variance as possible, without using class labels.
Algorithm:
- Subtract the average of each feature so all features start from a common origin.
- Build a “covariance matrix” that tells you, for every pair of features, whether they rise and fall together or move in opposite ways.
- Run an eigen‑decomposition on that matrix. It returns a ranked list of directions (called principal components) ordered by how much overall variation they capture.
- Keep only the top k directions, i.e. the eigenvectors with the largest eigenvalues.
- Multiply the centered data by those top directions (project onto them). The result is a compact version of your data with far fewer columns that still holds most of the original variance (sketched below).
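Here is a minimal NumPy sketch of those steps. The function name `pca` and the random toy data are just illustrative, not part of any particular library:

```python
import numpy as np

def pca(X, k):
    """Toy PCA: X is an (n, d) data matrix, k is the number of components to keep."""
    # 1. Center: subtract each feature's mean.
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features, shape (d, d).
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigen-decomposition; eigh handles symmetric matrices and
    #    returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]      # shape (d, k)
    # 5. Project the centered data onto those directions.
    return X_centered @ components      # shape (n, k)

# Example: compress 5-dimensional points down to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced = pca(X, k=2)
print(X_reduced.shape)  # (100, 2)
```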
Bottom-line: PCA is an unsupervised compression tool. It ignores any class labels you might have and focuses solely on where the data varies the most.
Fisher Linear Discriminant / Linear Discriminant Analysis (LDA):
LDA is supervised. It purposely uses the labels to find axes that best separate the classes, not the axes that merely carry the most variance.
Algorithm:
- Group your data by class and compute a mean for each group.
- For every class, look at how spread‑out its points are around its own mean. Combine these into an overall within‑class scatter measure.
- Compare each class mean to the grand mean of all the data, weighting by class size. Combine these into a between‑class scatter measure.
- Solve a generalised eigenproblem that balances the two scatter measures. The best directions are the ones where the ratio “between‑class / within‑class” is maximised.
- Choose up to C − 1 of those directions (where C is the number of classes); the between‑class scatter matrix has rank at most C − 1, so that is all the useful directions there are.
- In that new space, points from different classes are as far apart as possible relative to their own cluster spread, which is ideal for a simple linear classifier or for visual inspection (a sketch follows this list).
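Here is a matching NumPy/SciPy sketch. Again, the `lda` function and the toy data are only illustrative; SciPy's `eigh` is used to solve the generalised eigenproblem:

```python
import numpy as np
from scipy.linalg import eigh   # solves the generalised eigenproblem S_b w = lambda S_w w

def lda(X, y, k):
    """Toy Fisher LDA: X is (n, d) data, y is (n,) integer class labels,
    k is the number of discriminant directions to keep (at most C - 1)."""
    classes = np.unique(y)
    d = X.shape[1]
    grand_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))   # within-class scatter
    S_b = np.zeros((d, d))   # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        # Spread of this class around its own mean.
        S_w += (X_c - mean_c).T @ (X_c - mean_c)
        # Distance of this class mean from the grand mean, weighted by class size.
        diff = (mean_c - grand_mean).reshape(-1, 1)
        S_b += len(X_c) * (diff @ diff.T)
    # Generalised eigenproblem; eigenvalues come back in ascending order.
    eigvals, eigvecs = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]    # shape (d, k)
    return X @ W             # projected data, shape (n, k)

# Example: 3 classes in 4 dimensions -> at most 2 discriminant axes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 4)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)
X_proj = lda(X, y, k=2)
print(X_proj.shape)  # (150, 2)
```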
Hope this helps!