Principal component analysis

Questions/things to explain

Analogously to the covariance matrix, one can define a correlation matrix. What happens if you run SVD on the correlation matrix?
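One way to explore this question numerically uses a known fact: the correlation matrix equals the covariance matrix of the data after each feature is standardized to zero mean and unit variance. So SVD on the correlation matrix should amount to ordinary PCA on standardized features. The following is a minimal NumPy sketch. It assumes a hypothetical toy dataset whose features live on very different scales; the mixing matrix `A` and the scale factors are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: 200 samples of 3 correlated features on very different scales.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
X = (rng.normal(size=(200, 3)) @ A) * np.array([1.0, 10.0, 100.0])

# Correlation matrix of the features (columns are variables).
R = np.corrcoef(X, rowvar=False)

# Standardize each feature, then take the covariance of the result.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C_z = np.cov(Z, rowvar=False)
print(np.allclose(R, C_z))  # True: correlation matrix == covariance of standardized data

# R is symmetric positive semidefinite, so its SVD coincides with its
# eigendecomposition: singular values equal eigenvalues (both sorted descending).
U, S, Vt = np.linalg.svd(R)
eigvals_R = np.linalg.eigvalsh(R)[::-1]
print(np.allclose(S, eigvals_R))  # True

# For comparison: the top principal direction of the plain *covariance* matrix
# is dominated by the third feature, whose scale is 100x that of the first.
C = np.cov(X, rowvar=False)
top_cov_dir = np.linalg.eigh(C)[1][:, -1]  # eigh sorts eigenvalues ascending
print(np.abs(top_cov_dir).round(3))
```

Because the correlation matrix does not change when an individual feature is rescaled by a positive constant, PCA on it ignores the units of the features. PCA on the covariance matrix, by contrast, tends to be pulled toward whichever features have the largest raw variance.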
There are multiple ways to look at PCA:
- SVD of the covariance matrix (the covariance matrix is symmetric positive semidefinite, so its SVD coincides with its eigendecomposition; this is probably the same as one of the other interpretations)
- maximum-variance formulation (see Bishop)
- minimum-error formulation (see Bishop)
- the best linear compression and recovery of the data in a lower dimension (see Shalev-Shwartz and Ben-David). Is this the same as the minimum-error interpretation?
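The interpretations listed above can be compared side by side on centered data. The NumPy sketch below is a numerical illustration, not a proof. It assumes hypothetical random toy data and centers it first. It then takes the top-`k` eigenvectors `W` of the covariance matrix and checks three things: they match the right singular vectors of the centered data, the variance along the top direction equals the largest eigenvalue, and compressing with `W.T` and recovering with `W` gives a squared reconstruction error equal to `(n - 1)` times the sum of the discarded eigenvalues. The comparison bases are random, and the orthonormal one is only a single sample, so the check illustrates optimality without proving it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 500, 5, 2

# Hypothetical correlated toy data, centered so that the covariance is Xc.T @ Xc / (n - 1).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (n - 1)

# Covariance view: eigendecomposition (= SVD, since C is symmetric PSD).
eigvals, eigvecs = np.linalg.eigh(C)        # ascending order
order = np.argsort(eigvals)[::-1]           # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
W = eigvecs[:, :k]                          # top-k principal directions (d x k)

# The SVD of the centered data gives the same spectrum: C = V diag(s**2 / (n - 1)) V^T.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(s**2 / (n - 1), eigvals))  # True

# Maximum-variance view: variance of the data projected onto a unit vector u.
def projected_variance(u):
    return np.var(Xc @ u, ddof=1)

u_rand = rng.normal(size=d)
u_rand /= np.linalg.norm(u_rand)
print(np.isclose(projected_variance(W[:, 0]), eigvals[0]))  # True
print(projected_variance(u_rand) <= eigvals[0])             # True (Rayleigh quotient bound)

# Minimum-error / compression-recovery view: compress with B.T (k x d),
# recover with B (d x k), and measure the total squared reconstruction error.
def reconstruction_error(B):
    return np.sum((Xc - Xc @ B @ B.T) ** 2)

err_pca = reconstruction_error(W)
print(np.isclose(err_pca, (n - 1) * eigvals[k:].sum()))  # True

# A random k-dimensional orthonormal basis does at least as badly.
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))  # Q is d x k with orthonormal columns
print(reconstruction_error(Q) >= err_pca)      # True
```

This sketch restricts the recovery matrix to the transpose of the compression matrix with orthonormal columns. The general claim, that this restriction loses nothing among all linear compression/recovery pairs, is a result to check in the cited references.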