the stupid textbooks don't tell you anything about SVD!!!! i think it's super helpful to look at all the wrong things one might say about SVD... we need to un-knot all those wrong intuitions. i'll list some knots that i have had.
starting at this image: https://en.wikipedia.org/wiki/File:Singular-Value-Decomposition.svg
* if A is an invertible matrix, then <math>A = E_1 E_2 \cdots E_k</math> for some elementary matrices <math>E_1, \ldots, E_k</math>. A swap matrix is orthogonal and a dilation is diagonal, so those two kinds of elementary matrix already have the orthogonal-or-diagonal form that SVD uses; the problematic kind is the shear. So we can write A as an alternating product of orthogonal, diagonal, and shear matrices (the product of two orthogonal matrices is again orthogonal, so adjacent orthogonal factors can be merged). If we can prove SVD for shears, we can convert this to an alternating product of orthogonal and diagonal matrices. unfortunately, this doesn't seem to lead to a full proof of SVD, because orthogonal and diagonal matrices don't commute in general, so the alternating product doesn't collapse down to a single <math>U\Sigma V^*</math>. (a numerical sketch of this appears after the list.)
* one question one might have is, to get the behavior of M in the linked image, can't we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we've only required one rotation, instead of the two in SVD. That's true, but pay attention to where the basis vectors went. A squish along the standard axes keeps <math>e_1</math> and <math>e_2</math> orthogonal (they stay on their axes), and a rotation keeps them orthogonal too. But in the linked image it is clear that <math>Me_1</math> and <math>Me_2</math> are no longer orthogonal. So even though we have faithfully reproduced the ellipse, we don't have the same transformation. i.e. <math>M_1(S) = M_2(S)</math> (where <math>S</math> is the unit circle) need not imply <math>M_1 = M_2</math>, apparently. (a numerical sketch of this also appears after the list.)
* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled <math>\sigma_1</math> and <math>\sigma_2</math>. Call those vectors <math>u_1</math> and <math>u_2</math>. So <math>u_1 = Mv_1</math> and <math>u_2 = Mv_2</math> for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal (after undoing U they lie along the standard axes, the diagonal step keeps them on those axes, and the final orthogonal step preserves orthogonality). So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn't we say, "take v1 and v2, squish along those axes. then rotate." That seems to have required only one rotation. What's going on? The problem is that a diagonal matrix can only stretch along the standard basis. So "stretch along v1 and v2" can't be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let's say <math>M = RD</math> where R is a rotation, and D is "stretch along v1 and v2". So <math>Dv_1 = \lambda_1 v_1</math> and <math>Dv_2 = \lambda_2 v_2</math>. Now, D is not a diagonal matrix, when viewed in the standard basis, but it ''is'' a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then convert back to v1. In other words, we want to show <math>D = U\Sigma U^*</math> for some diagonal matrix Sigma and orthogonal U. Just take <math>Ue_j=v_j</math>, i.e. the matrix U is the matrix with columns [v1 v2]. Now, <math>Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j</math> just like we wanted. So <math>M = RD = RU\Sigma U^*</math>. Of course, RU is another orthogonal matrix, so we recover SVD. (a numerical version of this decomposition appears after the list.)
** also, if any of the entries of <math>\Sigma</math> were negative, we could have chosen the corresponding v vector with the opposite sign, which flips the sign of that entry, so we can assume that D is a positive operator (positive semi-definite).
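a quick numerical sketch of the first bullet (the specific matrices here are just illustrative choices, not anything canonical): even a single shear needs two ''different'' orthogonal factors in its SVD, and an orthogonal matrix does not commute with a diagonal one in general, which is why the alternating product above doesn't collapse.

<syntaxhighlight lang="python">
import numpy as np

# an elementary shear (row-addition matrix)
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

U, s, Vt = np.linalg.svd(shear)
print(U)    # a nontrivial orthogonal matrix, not the identity
print(s)    # singular values, roughly [1.618, 0.618]
print(Vt)   # a different nontrivial orthogonal matrix

# orthogonal and diagonal matrices do not commute in general:
Q = U                        # some orthogonal matrix
D = np.diag([2.0, 1.0])      # some diagonal matrix with distinct entries
print(np.allclose(Q @ D, D @ Q))   # False
</syntaxhighlight>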
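a sketch of the second bullet (again, the matrices are just illustrative): <math>M = U\Sigma V^*</math> and <math>N = U\Sigma</math> ("squish along the standard axes, then rotate") send the unit circle to the same ellipse, but they are not the same map.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((2, 2))          # a generic 2x2 map
U, s, Vt = np.linalg.svd(M)
N = U @ np.diag(s)                       # squish along e1, e2, then rotate

theta = np.linspace(0, 2 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle

# same image ellipse: pulling M's ellipse back through N lands on the unit
# circle again (V* maps the circle onto itself as a set)
back = np.linalg.solve(N, M @ circle)
print(np.allclose(np.linalg.norm(back, axis=0), 1.0))   # True

# but the maps themselves disagree, e.g. on e1:
print(M @ np.array([1.0, 0.0]))
print(N @ np.array([1.0, 0.0]))
print(np.allclose(M, N))                 # False
</syntaxhighlight>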
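and the polar decomposition bullet, numerically: starting from the SVD <math>M = U\Sigma V^*</math>, take <math>R = UV^*</math> and <math>D = V\Sigma V^*</math>; then <math>M = RD</math>, R is orthogonal, and D stretches along the columns of V (here V, in the standard SVD notation, plays the role of the matrix with columns [v1 v2] from the bullet above).

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 2))
U, s, Vt = np.linalg.svd(M)
V = Vt.T

R = U @ Vt                    # orthogonal: the single rotation
D = V @ np.diag(s) @ Vt       # "stretch along v1, v2" (diagonal in the V-basis)

print(np.allclose(M, R @ D))                 # True: M = R D
print(np.allclose(R.T @ R, np.eye(2)))       # True: R is orthogonal
for j in range(2):
    v_j = V[:, j]
    print(np.allclose(D @ v_j, s[j] * v_j))  # True: D v_j = sigma_j v_j

# numpy returns sigma_j >= 0, so D is positive semi-definite, matching the
# sub-bullet above.
</syntaxhighlight>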
==See also==
* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.