User:IssaRice/Linear algebra/Singular value decomposition

the stupid textbooks don't tell you anything about SVD!!!! i think it's super helpful to look at all the wrong things one might say about SVD... we need to un-knot all those wrong intuitions. i'll list some knots that i have had.

starting at this image: https://en.wikipedia.org/wiki/File:Singular-Value-Decomposition.svg

* if A is an invertible matrix, then <math>A = E_1 E_2 \cdots E_k</math> for some elementary matrices <math>E_j</math>. dilation and swap elementary matrices are already harmless here: a swap is orthogonal and a dilation is diagonal. So we can write A as an alternating product of orthogonal, diagonal, and shear matrices (the product of two orthogonal matrices is again orthogonal. right??? yes: <math>(PQ)^*(PQ) = Q^*P^*PQ = I</math>). If we can prove SVD for shears, we can convert this to an alternating product of orthogonal and diagonal matrices; sketch 1 after this list does the shear case numerically. unfortunately, this doesn't seem to lead to a full proof of SVD (unless orthogonal and diagonal matrices somehow commute).
* one question one might have is: to get the behavior of M in the linked image, can't we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse, and it would seem that we've only required one rotation, instead of the two in SVD. That's true, but pay attention to where the basis vectors went. A squish along the standard basis followed by a rotation would keep the images of the basis vectors orthogonal, but in M it is clear that these basis vectors are no longer orthogonal. So even though we have faithfully preserved the ellipse, we don't have the same transformation: <math>R\Sigma(S^1) = M(S^1)</math> (equal images of the unit circle <math>S^1</math>) need not imply <math>R\Sigma = M</math>, apparently. sketch 2 after this list checks this.
* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled <math>\sigma_1</math> and <math>\sigma_2</math>. Call those vectors <math>u_1</math> and <math>u_2</math>. So <math>u_1 = Mv_1</math> and <math>u_2 = Mv_2</math> for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*, and pay attention to what each step does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal; v1 and v2 must be orthogonal as well. So now, couldn't we say, "take v1 and v2, squish along those axes, then rotate"? That seems to require only one rotation. What's going on? The problem is that a diagonal matrix can only stretch along the standard basis. So "stretch along v1 and v2" can't be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let's say <math>M = RD</math> where R is a rotation, and D is "stretch along v1 and v2". So <math>Dv_1 = \lambda_1 v_1</math> and <math>Dv_2 = \lambda_2 v_2</math>. Now, D is not a diagonal matrix when viewed in the standard basis, but it ''is'' a diagonal matrix when viewed in the basis v1, v2. To get to the standard basis, we need to convert incoming vectors like v1 into e1, apply the stretching, then convert back to v1. In other words, we want to show <math>D = U\Sigma U^*</math> for some diagonal matrix Sigma and orthogonal U. Just take <math>Ue_j=v_j</math>, i.e. U is the matrix with columns [v1 v2]. Now, <math>Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j</math>, just like we wanted. So <math>M = RD = RU\Sigma U^*</math>. Of course, RU is another orthogonal matrix, so we recover SVD. sketch 3 after this list builds D and R numerically.
** also, if any of the entries of <math>\Sigma</math> are negative, we could have chosen <math>-v_j</math> in place of <math>v_j</math> (it maps to the opposite axis direction), which makes that entry positive; so we can assume that D is a positive operator (positive semi-definite).
* another question is: if we squish along the right orthogonal directions, can't we get away with not needing the extra rotation? after all, a circle can be squished/stretched into any ellipse without any rotations. what must be the case, although i can't explain it visually yet, is that if we do just that, then the standard basis vectors (yellow and pink) get mapped to the wrong spots. sketch 4 after this list confirms this.
* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:
** what is this <math>\sqrt{T^*T}</math> that axler keeps talking about? that's the "stretch along well-chosen orthogonal directions" operation that we start out with in polar decomposition. sketch 5 after this list computes it.
** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the image of the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it ''weren't'' orthogonal, then we could get an even more stretched-out vector. the other piece that's required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i'm using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v'(t). this tangent vector gets mapped to M(v'(t)). the tangent vector at M(v(t)) on the ellipse is <math display=inline>\frac{d}{dt} M(v(t))</math>. now, by linearity of M and the definition of the derivative, we can "pull the M out" and see that <math display=inline>\frac{d}{dt} M(v(t)) = M\left(\frac{d}{dt} v(t)\right)</math>. what this means is: if you have a point on the circle and its tangent, and you map both under M, then the tangent of the image of the point is the image of the tangent at the point. so for our maximal stretch vector v1: the circle tangent at v1 is orthogonal to v1, its image is a tangent to the ellipse at M(v1), and by maximality that ellipse tangent is orthogonal to M(v1). so the orthogonal pair of directions at v1 maps to an orthogonal pair at M(v1). sketch 6 after this list checks both facts numerically.
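
sketch 1 (for the elementary-matrices bullet): a minimal numpy check that even a plain shear needs two genuinely different rotations in its SVD. the matrix and variable names here are my own choices, not from any source above.

<syntaxhighlight lang="python">
import numpy as np

# a shear: the one kind of elementary matrix that isn't already diagonal or orthogonal
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

# numpy returns shear = U @ diag(s) @ Vh with U, Vh orthogonal
U, s, Vh = np.linalg.svd(shear)
print(np.allclose(U @ np.diag(s) @ Vh, shear))  # True
print(s)                       # about [1.618, 0.618]: the golden ratio and its inverse
print(np.allclose(U, Vh.T))    # False: the two rotations really are different

# the product of two orthogonal matrices is again orthogonal
P = U @ Vh
print(np.allclose(P.T @ P, np.eye(2)))  # True
</syntaxhighlight>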
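sketch 2 (for the squish-then-rotate bullet): same ellipse, different map. <code>N</code> is my name for "squish along the standard basis, then rotate"; the random seed is arbitrary.

<syntaxhighlight lang="python">
import numpy as np
rng = np.random.default_rng(0)

M = rng.standard_normal((2, 2))
U, s, Vh = np.linalg.svd(M)     # M = U Sigma V*

# "squish along the standard basis directions, then rotate": one rotation only
N = U @ np.diag(s)

# N = M V, so N and M differ only by an orthogonal change of *input*,
# and the unit circle gets mapped onto the same ellipse (as a set)
print(np.allclose(N, M @ Vh.T))  # True

# but they are different transformations: the basis vectors land in different spots
e1 = np.array([1.0, 0.0])
print(M @ e1)   # where M sends the yellow vector
print(N @ e1)   # where N sends it -- not the same point
</syntaxhighlight>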
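sketch 3 (for the polar decomposition bullet): building D and R out of numpy's SVD. note that the U of that bullet (columns v1, v2) is the V of numpy's output, so i call numpy's left factor <code>W</code> to avoid a name clash; that renaming is mine.

<syntaxhighlight lang="python">
import numpy as np
rng = np.random.default_rng(1)

M = rng.standard_normal((2, 2))
W, s, Vh = np.linalg.svd(M)      # M = W Sigma V*
V = Vh.T                         # columns are v1, v2

D = V @ np.diag(s) @ V.T         # "stretch along v1 and v2": diagonal in the v basis
R = W @ Vh                       # the leftover rotation (a product of two orthogonals)

print(np.allclose(R @ D, M))                      # True: M = R D
print(np.allclose(D @ V[:, 0], s[0] * V[:, 0]))   # True: D v1 = lambda1 v1
print(np.allclose(R.T @ R, np.eye(2)))            # True: R is orthogonal
print(np.allclose(D, D.T))       # True: D is symmetric, with eigenvalues s >= 0
</syntaxhighlight>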
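sketch 4 (for the no-extra-rotation bullet): a pure stretch along well-chosen orthogonal directions (my <code>S</code> below, stretching along the ellipse's own axes) does reproduce the ellipse with no rotation at all, but the basis vectors land in the wrong spots.

<syntaxhighlight lang="python">
import numpy as np
rng = np.random.default_rng(2)

M = rng.standard_normal((2, 2))
U, s, Vh = np.linalg.svd(M)

# pure stretch along the ellipse's own axes u1, u2: no rotation anywhere
S = U @ np.diag(s) @ U.T

# S = M (V U*), i.e. M composed with an orthogonal input change,
# so S maps the unit circle onto the very same ellipse
print(np.allclose(S, M @ Vh.T @ U.T))  # True

# yet S is not M: the yellow/pink basis vectors go to the wrong spots
e1 = np.array([1.0, 0.0])
print(M @ e1, S @ e1)   # different images
</syntaxhighlight>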
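sketch 5 (for the <math>\sqrt{T^*T}</math> sub-bullet): computing the square root directly and checking that it is exactly the "stretch along the v's" map. going through the eigendecomposition is my choice, just to stay inside numpy.

<syntaxhighlight lang="python">
import numpy as np
rng = np.random.default_rng(3)

T = rng.standard_normal((2, 2))
U, s, Vh = np.linalg.svd(T)
V = Vh.T

# T*T = V Sigma^2 V*, so its positive square root should be V Sigma V*
evals, evecs = np.linalg.eigh(T.T @ T)   # T*T is symmetric, so eigh applies
sqrtTT = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T  # clamp rounding noise

print(np.allclose(sqrtTT @ sqrtTT, T.T @ T))      # True: it squares to T*T
print(np.allclose(sqrtTT, V @ np.diag(s) @ V.T))  # True: it's the stretch along v1, v2

# polar decomposition: T = (isometry) * sqrt(T*T)
print(np.allclose(U @ Vh @ sqrtTT, T))            # True
</syntaxhighlight>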
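sketch 6 (for the nielsen tangency bullet): a finite-difference check that linear maps preserve tangency, and that the ellipse tangent at the maximal stretch point is orthogonal to it. the parametrization and step size are my choices.

<syntaxhighlight lang="python">
import numpy as np
rng = np.random.default_rng(4)

M = rng.standard_normal((2, 2))
v  = lambda t: np.array([np.cos(t), np.sin(t)])   # the circle
dv = lambda t: np.array([-np.sin(t), np.cos(t)])  # its tangent at v(t)

# linearity lets the derivative pass through M: d/dt M v(t) = M v'(t)
t, h = 0.7, 1e-6
fd = (M @ v(t + h) - M @ v(t - h)) / (2 * h)      # finite-difference derivative
print(np.allclose(fd, M @ dv(t), atol=1e-5))      # True

# at the maximal stretch direction v1, the circle tangent is v2, and
# M v1 = sigma1 u1 is orthogonal to the ellipse tangent M v2 = sigma2 u2
U, s, Vh = np.linalg.svd(M)
v1, v2 = Vh[0], Vh[1]
print(np.isclose((M @ v1) @ (M @ v2), 0.0))       # True
</syntaxhighlight>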


==See also==


* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.
