Principal component analysis

From Machinelearning

Questions/things to explain

  • Analogously to the covariance matrix one can define a correlation matrix. What happens if you run SVD on the correlation matrix?
  • multiple ways to look at PCA:
    • SVD on the covariance matrix (this is probably the same as the maximum variance interpretation, or rather a sub-interpretation of that; if you view the covariance matrix as a transformation that takes white noise to your data set, then the principal components = axes of the ellipsoid = the views that maximize variance)
    • maximum variance (see Bishop). This one uses the Lagrange multiplier and derivative of a quadratic form.
    • minimum-error (see Bishop)
    • the best linear compression-recovery of data to a lower dimension (see Shalev-Shwartz and Ben-David). Is this the same as minimum-error interpretation?
    • maximum-variance and minimum-error are related by the Pythagorean theorem, see page 16 of these slides. There's a similar picture in this post.
    • once you've done PCA, how do you calculate the percentage of variance captured by a principal component?