Principal component analysis

Principal component analysis (PCA) is the main linear method used for dimensionality reduction. It consists of constructing the covariance matrix of the data and computing the eigenvectors of that matrix.^[1] The eigenvectors with the largest eigenvalues give the principal components.
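The two steps just described (build the covariance matrix, take its eigenvectors) can be sketched in NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca_eig`, the synthetic data, and the choice of `k` are all assumptions made for the example.

```python
import numpy as np

def pca_eig(X, k):
    """Sketch of PCA via eigendecomposition of the sample covariance matrix.

    X is an (n_samples, n_features) array; k is the number of components kept.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # (d, d) sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]             # columns are the top-k principal directions
    return Xc @ components, components, eigvals

# Hypothetical correlated data: 200 samples, 3 features.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

Z, W, lam = pca_eig(X, k=2)   # Z: projected data, W: directions, lam: all eigenvalues
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because the covariance matrix is symmetric, so the eigenvalues come back real and the eigenvectors orthonormal.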

Questions/things to explain

Analogously to the covariance matrix one can define a correlation matrix. What happens if you run SVD on the correlation matrix?
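One way to start on this question is to note that the correlation matrix is the covariance matrix of the standardized data, so decomposing it amounts to doing PCA after rescaling every feature to unit variance. The sketch below checks this numerically on made-up data with deliberately mismatched feature scales (the scales are an assumption for illustration). For a symmetric positive semidefinite matrix the singular values coincide with the eigenvalues, so `eigvalsh` stands in for SVD here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data whose features live on very different scales.
X = rng.normal(size=(500, 3)) * np.array([100.0, 1.0, 0.01])

corr = np.corrcoef(X, rowvar=False)

# Standardize: subtract the mean, divide by the sample standard deviation.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_standardized = np.cov(Xs, rowvar=False)

same = np.allclose(corr, cov_of_standardized)

# Eigenvalues (ascending) of the two matrices.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
corr_eigvals = np.linalg.eigvalsh(corr)

# Share of total variance taken by the largest eigenvalue in each case.
cov_top_share = cov_eigvals[-1] / cov_eigvals.sum()
corr_top_share = corr_eigvals[-1] / corr_eigvals.sum()
```

On data like this, the covariance version is dominated by the large-scale feature, while the correlation version treats all features on an equal footing; its eigenvalues sum to the number of features.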
Multiple ways to look at PCA:
- SVD on the covariance matrix. This is probably the same as the maximum-variance interpretation, or rather a sub-interpretation of it: if you view the covariance matrix (strictly, its matrix square root) as a transformation that takes white noise to your data set, then the principal components are the axes of the resulting ellipsoid, which are the directions that maximize variance.
- Maximum variance (see Bishop). This one uses a Lagrange multiplier and the derivative of a quadratic form.
- Minimum error (see Bishop).
- The best linear compression and recovery of the data through a lower dimension (see Shalev-Shwartz and Ben-David). Is this the same as the minimum-error interpretation?
- Maximum variance and minimum error are related by the Pythagorean theorem; see page 16 of these slides. There's a similar picture in this post.
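The Pythagorean relationship can be checked numerically. For centered data and any unit direction `u`, each point splits into its projection onto `u` and an orthogonal residual, so the total sum of squares equals the captured part plus the reconstruction error. Since the total does not depend on `u`, maximizing the first term is the same as minimizing the second. The data and the random direction below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))          # hypothetical data
Xc = X - X.mean(axis=0)                # center the data

u = rng.normal(size=4)
u /= np.linalg.norm(u)                 # arbitrary unit direction

proj = Xc @ u                          # signed length of each point's projection onto u
residual = Xc - np.outer(proj, u)      # part of each point orthogonal to u

total = np.sum(Xc ** 2)                # fixed: does not depend on u
captured = np.sum(proj ** 2)           # proportional to the variance along u
error = np.sum(residual ** 2)          # squared reconstruction error

identity_holds = np.isclose(total, captured + error)
```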
Once you've done PCA, how do you calculate the percentage of variance captured by a principal component? What is the relationship between the percentage of variance and the size of the eigenvalue? (The larger the eigenvalue, the larger the variance, but what is the specific relationship?)
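A sketch of the standard answer: the variance of the data projected onto a principal component equals that component's eigenvalue, so the fraction of variance it captures is its eigenvalue divided by the sum of all eigenvalues. The synthetic data and variable names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data with per-feature standard deviations 3, 2, 1.
X = rng.normal(size=(400, 3)) * np.array([3.0, 2.0, 1.0])
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # sort descending

explained_ratio = eigvals / eigvals.sum()   # fraction of variance per component
cumulative = np.cumsum(explained_ratio)     # fraction captured by the first k components

# Check: the sample variance along each principal direction is the eigenvalue.
projected_var = (Xc @ eigvecs).var(axis=0, ddof=1)
```

Multiplying `explained_ratio` by 100 gives the percentages; `cumulative` is what one would typically inspect when deciding how many components to keep.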
What is PCA good for? Compressing data, dimensionality reduction as a preprocessing step before passing the data to another learning algorithm, visualization, etc.
When does PCA not work so well?

References

[1] What is Dimensionality Reduction – Techniques, Methods, Components
