Principal component analysis

Questions/things to explain

  • Analogously to the covariance matrix, one can define a correlation matrix. What happens if you run SVD on the correlation matrix?
  • multiple ways to look at PCA:
    • SVD on the covariance matrix (this is probably the same as the maximum-variance interpretation, or rather a sub-interpretation of it: if you view the covariance matrix as a transformation that takes white noise to your data set, then the principal components are the axes of the resulting ellipsoid, i.e. the directions that maximize variance)
    • maximum variance (see Bishop). The derivation uses a Lagrange multiplier and the derivative of a quadratic form.
    • minimum-error (see Bishop)
    • the best linear compression-recovery of data to a lower dimension (see Shalev-Shwartz and Ben-David). Is this the same as the minimum-error interpretation?
    • maximum-variance and minimum-error are related by the Pythagorean theorem; see page 16 of these slides (https://drive.google.com/file/d/0B3-japQ2zgG_MGM3cHdzdGRyMm8/view). There's a similar picture in this post (https://jeremykun.com/2016/05/16/singular-value-decomposition-part-2-theorem-proof-algorithm/). A worked version of the relation appears after this list.
    • once you've done PCA, how do you calculate the percentage of variance captured by a principal component? What is the relationship between the percentage of variance and the size of the eigenvalue (the larger the eigenvalue, the larger the variance, but what is the exact relationship)? See the code sketch after this list.
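
A minimal sketch of how the Pythagorean theorem links the maximum-variance and minimum-error views (assuming the data points x_n are already centered and P is the orthogonal projection onto the span of the top M eigenvectors of the covariance matrix; the 1/N convention for the covariance is an assumption here):

    \|x_n\|^2 = \|P x_n\|^2 + \|(I - P) x_n\|^2
    \frac{1}{N}\sum_{n=1}^{N} \|x_n\|^2
      = \underbrace{\frac{1}{N}\sum_{n=1}^{N} \|P x_n\|^2}_{\text{captured variance}}
      + \underbrace{\frac{1}{N}\sum_{n=1}^{N} \|(I - P) x_n\|^2}_{\text{reconstruction error}}

The left-hand side is fixed by the data, so maximizing the captured variance over M-dimensional subspaces is the same thing as minimizing the reconstruction error. If \lambda_1 \ge \dots \ge \lambda_D are the eigenvalues of the covariance matrix, the captured variance is \lambda_1 + \dots + \lambda_M, the total variance is \lambda_1 + \dots + \lambda_D, and the fraction of variance captured by component i is \lambda_i / (\lambda_1 + \dots + \lambda_D).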
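
And a minimal numerical sketch of the covariance-matrix view and the percentage-of-variance question, in plain NumPy (the toy data matrix X is an assumption, purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # toy data, rows = samples

    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix

    # Eigendecomposition of the (symmetric PSD) covariance matrix; for such a
    # matrix this coincides with its SVD, so "SVD on the covariance matrix"
    # and eigendecomposition give the same principal directions.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Percentage of variance captured by each principal component:
    # eigenvalue_i divided by the sum of all eigenvalues (= total variance).
    explained = eigvals / eigvals.sum()
    print(np.round(100 * explained, 2))

    # Running the same decomposition on the correlation matrix is equivalent
    # to doing PCA on standardized (z-scored) variables.
    corr = np.corrcoef(Xc, rowvar=False)
    corr_eigvals = np.linalg.eigvalsh(corr)[::-1]
    print(np.round(100 * corr_eigvals / corr_eigvals.sum(), 2))

Here eigh is used because the covariance matrix is symmetric; taking the SVD of the centered data matrix directly (np.linalg.svd(Xc)) would give the same directions, with squared singular values proportional to the eigenvalues.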