<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://machinelearning.subwiki.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3AIssaRice%2FLinear_algebra%2FSingular_value_decomposition</id>
	<title>User:IssaRice/Linear algebra/Singular value decomposition - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://machinelearning.subwiki.org/w/index.php?action=history&amp;feed=atom&amp;title=User%3AIssaRice%2FLinear_algebra%2FSingular_value_decomposition"/>
	<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;action=history"/>
	<updated>2026-04-25T19:56:57Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.2</generator>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=3552&amp;oldid=prev</id>
		<title>IssaRice at 21:10, 1 October 2022</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=3552&amp;oldid=prev"/>
		<updated>2022-10-01T21:10:52Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:10, 1 October 2022&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l4&quot;&gt;Line 4:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* if A is an invertible matrix, then &amp;lt;math&amp;gt;A = E_1 \cdots E_m&amp;lt;/math&amp;gt; for some elementary matrices &amp;lt;math&amp;gt;E_1,\ldots,E_m&amp;lt;/math&amp;gt;. Dilations and swapping elementary matrices obviously involve only orthogonal operations. So we can write A as an alternating product of orthogonal and shear matrices (the product of two orthogonal matrices is again orthogonal. right???). If we can prove SVD for shears, we can convert this to an alternating product of orthogonal and &amp;#039;&amp;#039;diagonal&amp;#039;&amp;#039; matrices. unfortunately, this doesn&amp;#039;t seem to lead to a full proof of SVD (unless orthogonal and diagonal matrices somehow commute).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* if A is an invertible matrix, then &amp;lt;math&amp;gt;A = E_1 \cdots E_m&amp;lt;/math&amp;gt; for some elementary matrices &amp;lt;math&amp;gt;E_1,\ldots,E_m&amp;lt;/math&amp;gt;. Dilations and swapping elementary matrices obviously involve only orthogonal operations. So we can write A as an alternating product of orthogonal and shear matrices (the product of two orthogonal matrices is again orthogonal. right???). If we can prove SVD for shears, we can convert this to an alternating product of orthogonal and &amp;#039;&amp;#039;diagonal&amp;#039;&amp;#039; matrices. unfortunately, this doesn&amp;#039;t seem to lead to a full proof of SVD (unless orthogonal and diagonal matrices somehow commute).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* one question one might have is, to get the behavior of M in the linked image, can&#039;t we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we&#039;ve only required one rotation, instead of the two in SVD. That&#039;s true, but pay attention to where the basis vectors went. A squish followed by a rotation... would preserve orthogonality. But in M it is clear that these basis vectors are no longer orthogonal. So even though we have faithfully preserved the ellipse, we don&#039;t have the same transformation. i.e. &amp;lt;math&amp;gt;M(\{v : \|v\| = 1\}) = M&#039;(\{v : \|v\|=1\})&amp;lt;/math&amp;gt; need not imply &amp;lt;math&amp;gt;M=M&#039;&amp;lt;/math&amp;gt;, apparently.  This must be an artifact of the fact that a circle is an extremely symmetric shape, so lots of non-identical transformations can still produce the same image of a circle. I think if we started out with a square, we would not have the same image if we just instead stretched and then did a rotation.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* one question one might have is, to get the behavior of M in the linked image, can&#039;t we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we&#039;ve only required one rotation, instead of the two in SVD. That&#039;s true, but pay attention to where the basis vectors went. A squish followed by a rotation... would preserve orthogonality. But in M it is clear that these basis vectors are no longer orthogonal. So even though we have faithfully preserved the ellipse, we don&#039;t have the same transformation. i.e. &amp;lt;math&amp;gt;M(\{v : \|v\| = 1\}) = M&#039;(\{v : \|v\|=1\})&amp;lt;/math&amp;gt; need not imply &amp;lt;math&amp;gt;M=M&#039;&amp;lt;/math&amp;gt;, apparently.  This must be an artifact of the fact that a circle is an extremely symmetric shape, so lots of non-identical transformations can still produce the same image of a circle. I think if we started out with a square, we would not have the same image if we just instead stretched and then did a rotation &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(actually, maybe a square too is still too symmetric; see example [https://youtu.be/vSczTbgc8Rc?list=PLnQX-jgAF5pTZXPiD8ciEARRylD9brJXU&amp;amp;t=679 here])&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=3551&amp;oldid=prev</id>
		<title>IssaRice at 21:08, 1 October 2022</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=3551&amp;oldid=prev"/>
		<updated>2022-10-01T21:08:40Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:08, 1 October 2022&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l4&quot;&gt;Line 4:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* if A is an invertible matrix, then &amp;lt;math&amp;gt;A = E_1 \cdots E_m&amp;lt;/math&amp;gt; for some elementary matrices &amp;lt;math&amp;gt;E_1,\ldots,E_m&amp;lt;/math&amp;gt;. Dilations and swapping elementary matrices obviously involve only orthogonal operations. So we can write A as an alternating product of orthogonal and shear matrices (the product of two orthogonal matrices is again orthogonal. right???). If we can prove SVD for shears, we can convert this to an alternating product of orthogonal and &amp;#039;&amp;#039;diagonal&amp;#039;&amp;#039; matrices. unfortunately, this doesn&amp;#039;t seem to lead to a full proof of SVD (unless orthogonal and diagonal matrices somehow commute).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* if A is an invertible matrix, then &amp;lt;math&amp;gt;A = E_1 \cdots E_m&amp;lt;/math&amp;gt; for some elementary matrices &amp;lt;math&amp;gt;E_1,\ldots,E_m&amp;lt;/math&amp;gt;. Dilations and swapping elementary matrices obviously involve only orthogonal operations. So we can write A as an alternating product of orthogonal and shear matrices (the product of two orthogonal matrices is again orthogonal. right???). If we can prove SVD for shears, we can convert this to an alternating product of orthogonal and &amp;#039;&amp;#039;diagonal&amp;#039;&amp;#039; matrices. unfortunately, this doesn&amp;#039;t seem to lead to a full proof of SVD (unless orthogonal and diagonal matrices somehow commute).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* one question one might have is, to get the behavior of M in the linked image, can&#039;t we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we&#039;ve only required one rotation, instead of the two in SVD. That&#039;s true, but pay attention to where the basis vectors went. A squish followed by a rotation... would preserve orthogonality. But in M it is clear that these basis vectors are no longer orthogonal. So even though we have faithfully preserved the ellipse, we don&#039;t have the same transformation. i.e. &amp;lt;math&amp;gt;M(\{v : \|v\| = 1\}) = M&#039;(\{v : \|v\|=1\})&amp;lt;/math&amp;gt; need not imply &amp;lt;math&amp;gt;M=M&#039;&amp;lt;/math&amp;gt;, apparently.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* one question one might have is, to get the behavior of M in the linked image, can&#039;t we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we&#039;ve only required one rotation, instead of the two in SVD. That&#039;s true, but pay attention to where the basis vectors went. A squish followed by a rotation... would preserve orthogonality. But in M it is clear that these basis vectors are no longer orthogonal. So even though we have faithfully preserved the ellipse, we don&#039;t have the same transformation. i.e. &amp;lt;math&amp;gt;M(\{v : \|v\| = 1\}) = M&#039;(\{v : \|v\|=1\})&amp;lt;/math&amp;gt; need not imply &amp;lt;math&amp;gt;M=M&#039;&amp;lt;/math&amp;gt;, apparently&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.  This must be an artifact of the fact that a circle is an extremely symmetric shape, so lots of non-identical transformations can still produce the same image of a circle. I think if we started out with a square, we would not have the same image if we just instead stretched and then did a rotation&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2728&amp;oldid=prev</id>
		<title>IssaRice at 08:09, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2728&amp;oldid=prev"/>
		<updated>2020-01-17T08:09:57Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 08:09, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. 
basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;. You &lt;/ins&gt;might also want to play around with an example like &amp;lt;math&amp;gt;\begin{pmatrix}3 &amp;amp; 0\\ &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;0 &amp;amp; &lt;/ins&gt;4\end{pmatrix}&amp;lt;/math&amp;gt;, which takes (cos t, sin t) to (3cos t, 4sin t). The tangent at the original point is (-sin t, cos t). 
The tangent at the image is (-3sin t, 4cos t), which is equal to the image of the tangent.&amp;lt;/ref&amp;gt; what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;you &lt;/del&gt;might also want to play around with an example like &amp;lt;math&amp;gt;\begin{pmatrix}3 &amp;amp; 0\\4&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;amp;0&lt;/del&gt;\end{pmatrix}&amp;lt;/math&amp;gt;, which takes (cos t, sin t) to (3cos t, 4sin t). The tangent at the original point is (-sin t, cos t). The tangent at the image is (-3sin t, 4cos t), which is equal to the image of the tangent.&amp;lt;/ref&amp;gt; what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** so how does (2) find the same basis without talking about &amp;quot;maximal stretching&amp;quot;? well, in (2), &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;means&amp;#039;&amp;#039; &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; -- it&amp;#039;s the positive operator that appears in polar decomposition. and if we stretch along orthogonal directions, then surely one of them has to be the maximal stretching direction (rather than, say, some direction intermediate between two of the axes).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** so how does (2) find the same basis without talking about &amp;quot;maximal stretching&amp;quot;? well, in (2), &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;means&amp;#039;&amp;#039; &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; -- it&amp;#039;s the positive operator that appears in polar decomposition. and if we stretch along orthogonal directions, then surely one of them has to be the maximal stretching direction (rather than, say, some direction intermediate between two of the axes).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2727&amp;oldid=prev</id>
		<title>IssaRice at 08:09, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2727&amp;oldid=prev"/>
		<updated>2020-01-17T08:09:12Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 08:09, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt; what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? 
like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. 
now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;you might also want to play around with an example like &amp;lt;math&amp;gt;\begin{pmatrix}3 &amp;amp; 0\\4&amp;amp;0\end{pmatrix}&amp;lt;/math&amp;gt;, which takes (cos t, sin t) to (3cos t, 4sin t). The tangent at the original point is (-sin t, cos t). The tangent at the image is (-3sin t, 4cos t), which is equal to the image of the tangent.&lt;/ins&gt;&amp;lt;/ref&amp;gt; what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** so how does (2) find the same basis without talking about &amp;quot;maximal stretching&amp;quot;? well, in (2), &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;means&amp;#039;&amp;#039; &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; -- it&amp;#039;s the positive operator that appears in polar decomposition. and if we stretch along orthogonal directions, then surely one of them has to be the maximal stretching direction (rather than, say, some direction intermediate between two of the axes).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** so how does (2) find the same basis without talking about &amp;quot;maximal stretching&amp;quot;? well, in (2), &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; &amp;#039;&amp;#039;means&amp;#039;&amp;#039; &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; -- it&amp;#039;s the positive operator that appears in polar decomposition. and if we stretch along orthogonal directions, then surely one of them has to be the maximal stretching direction (rather than, say, some direction intermediate between two of the axes).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2726&amp;oldid=prev</id>
		<title>IssaRice at 07:21, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2726&amp;oldid=prev"/>
		<updated>2020-01-17T07:21:20Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:21, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l7&quot;&gt;Line 7:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 7:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. 
To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. 
So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* another question is: if we squish along the right orthogonal directions, can&#039;t we get away with not needing the extra rotation? after all, an ellipse can be squished into a circle without any rotations. what must be the case, although i can&#039;t explain it visually yet, is that if we do just that, then the standard basis vectors (yellow and pink) get mapped to the wrong spots.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* another question is: if we squish along the right orthogonal directions, can&#039;t we get away with not needing the extra rotation? after all, an ellipse can be squished into a circle without any rotations. what must be the case, although i can&#039;t explain it visually yet, is that if we do just that, then the standard basis vectors (yellow and pink) get mapped to the wrong spots&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;. this &#039;&#039;might&#039;&#039; be an artifact of shears (the wikipedia SVD image is a shear). clearer to look at michael nielsen&#039;s [http://cognitivemedium.com/emm/images/tangent_definition.png image]. here, if we start with the ellipse and shrink along Ms and stretch along Mt, we do get a circle. but Ms doesn&#039;t go back to s, and Mt doesn&#039;t go back to t; for that we&#039;ll need the extra rotation&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2725&amp;oldid=prev</id>
		<title>IssaRice at 07:15, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2725&amp;oldid=prev"/>
		<updated>2020-01-17T07:15:36Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:15, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot;&gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &amp;#039;&amp;#039;wasn&amp;#039;t&amp;#039;&amp;#039; orthogonal, then we could get an even more stretched out vector. the other piece that&amp;#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&amp;#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&amp;#039;(t). this tangent vector gets mapped to M(v&amp;#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. now, by linearity of M and the definition of the derivative, we can basically &amp;quot;pull out&amp;quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt; what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? 
like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &amp;#039;&amp;#039;wasn&amp;#039;t&amp;#039;&amp;#039; orthogonal, then we could get an even more stretched out vector. the other piece that&amp;#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&amp;#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&amp;#039;(t). this tangent vector gets mapped to M(v&amp;#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. 
now, by linearity of M and the definition of the derivative, we can basically &amp;quot;pull out&amp;quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt; what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** so how does (2) find the same basis without talking about &quot;maximal stretching&quot;? well, in (2), &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; &#039;&#039;means&#039;&#039; &quot;stretch along well-chosen orthogonal directions&quot; -- it&#039;s the positive operator that appears in polar decomposition. and if we stretch along orthogonal directions, then surely one of them has to be the maximal stretching direction (rather than, say, some direction intermediate between two of the axes).&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2724&amp;oldid=prev</id>
		<title>IssaRice at 07:09, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2724&amp;oldid=prev"/>
		<updated>2020-01-17T07:09:01Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:09, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&amp;#039;s the &amp;quot;stretch along well-chosen orthogonal directions&amp;quot; operation that we start out with in polar decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;. what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point. 
what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. 
now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;ref group=note&amp;gt;&amp;lt;math&amp;gt;\frac{d}{dt} M(v(t)) = \lim_{h\to0} \frac{Mv(t+h) - Mv(t)}{h} = \lim_{h\to0} \frac1h M(v(t+h)-v(t)) = \lim_{h\to0}M(\frac1h (v(t+h)-v(t))) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;&amp;lt;/ref&amp;gt; &lt;/ins&gt;what this means is that if you have a point on the circle and its tangent, and you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;ref group=note&amp;gt; I think another way to see this is via uniqueness of taylor approximations? like if v is a point on the circle, and u is the tangent vector at v, then points near v can be written as &amp;lt;math&amp;gt;v + \Delta u + O(\Delta^2)&amp;lt;/math&amp;gt;, and if we apply M to those points, we get &amp;lt;math&amp;gt;Mv + \Delta Mu + O(\Delta^2)&amp;lt;/math&amp;gt;. if taylor approximations are unique, then the fact that the term linear in &amp;lt;math&amp;gt;\Delta&amp;lt;/math&amp;gt; has Mu means that Mu must be tangent at Mv.&amp;lt;/ref&amp;gt; &lt;/ins&gt;what this implies is that for our maximal stretch vector, since the tangent on the circle is orthogonal, the image of that tangent is also a tangent at the new place on the ellipse, and we already know that the tangent is orthogonal for the maximal stretch vector.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;==Footnotes==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;references group=&quot;note&quot;/&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2723&amp;oldid=prev</id>
		<title>IssaRice at 06:57, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2723&amp;oldid=prev"/>
		<updated>2020-01-17T06:57:20Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 06:57, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l7&quot;&gt;Line 7:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 7:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. 
To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. 
So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* another question is: if we squish along the right orthogonal directions, can&#039;t we get away with not needing the extra rotation? after all, an ellipse can be squished into a circle without any rotations. what must be the case, although i can&#039;t explain it visually yet, is that if we do just that, then the standard basis vectors (yellow and pink) get mapped to the wrong spots.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* [https://math.stackexchange.com/questions/2899052/singular-value-decomposition-reconciling-the-maximal-stretching-and-spectral my old question]:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** what is this &amp;lt;math&amp;gt;\sqrt{T^*T}&amp;lt;/math&amp;gt; that axler keeps talking about? that&#039;s the &quot;stretch along well-chosen orthogonal directions&quot; operation that we start out with in polar decomposition.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** for proof (1), see [http://cognitivemedium.com/emm/emm.html michael nielsen]. basically, the maximal stretching direction has a tangent vector (on the ellipse) that is orthogonal to it, because if it &#039;&#039;wasn&#039;t&#039;&#039; orthogonal, then we could get an even more stretched out vector. the other piece that&#039;s required is that linear maps preserve tangency. i.e. if v(t) is a parametrization of a circle, and M is a matrix, then M(v(t)) traces out an ellipse as t varies. (i&#039;m using t as a parameter even though nielsen uses it as a vector. seriously, who the heck uses t for a vector??) the tangent vector on the circle at v(t) is v&#039;(t). this tangent vector gets mapped to M(v&#039;(t)). the tangent vector at M(v(t)) on the ellipse is &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t))&amp;lt;/math&amp;gt;. now, by linearity of M and the definition of the derivative, we can basically &quot;pull out&quot; the M and see that &amp;lt;math display=inline&amp;gt;\frac{d}{dt} M(v(t)) = M(\frac{d}{dt} v(t))&amp;lt;/math&amp;gt;. what this means is that if you have a point on the circle and its tangent, then you map both of them under M, then the tangent of the image of the point is the image of the tangent at the point. 
what this implies is that, at the maximal stretch vector, the circle&#039;s tangent is orthogonal to it; the image of that tangent is the tangent at the image point on the ellipse; and by the maximality argument above, that tangent is orthogonal to the image. so the maximal stretch direction and the direction orthogonal to it map onto the two axes of the ellipse.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==See also==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
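The tangency argument added in this revision (linear maps commute with d/dt, and the maximal-stretch direction's image has an orthogonal tangent) can be checked numerically. A minimal sketch, assuming numpy; the matrix `M` and all variable names here are illustrative, not from the original page:

```python
import numpy as np

# An arbitrary invertible 2x2 matrix, chosen purely for illustration.
M = np.array([[2.0, 1.0],
              [0.5, 1.5]])

# Parametrize the unit circle: v(t) = (cos t, sin t), v'(t) = (-sin t, cos t).
t = 0.7
v = np.array([np.cos(t), np.sin(t)])
v_tangent = np.array([-np.sin(t), np.cos(t)])

# Linearity lets us "pull out" M: the tangent of the image curve M(v(t))
# is M(v'(t)).  Compare against a finite-difference derivative.
h = 1e-6
v_h = np.array([np.cos(t + h), np.sin(t + h)])
numeric_tangent = (M @ v_h - M @ v) / h
assert np.allclose(numeric_tangent, M @ v_tangent, atol=1e-4)

# At the maximal-stretch direction v1 (top right-singular vector), the
# circle's tangent is v2; its image is orthogonal to the image of v1,
# i.e. the stretched vector lands on an axis of the ellipse.
U, S, Vt = np.linalg.svd(M)
v1, v2 = Vt[0], Vt[1]
image_of_v1 = M @ v1       # equals S[0] * U[:, 0]
tangent_image = M @ v2     # equals S[1] * U[:, 1]
assert np.isclose(np.dot(image_of_v1, tangent_image), 0.0, atol=1e-10)
```

The finite-difference check is only approximate (hence the loose tolerance); the orthogonality check holds to machine precision because `M @ v1` and `M @ v2` are scaled columns of the orthogonal matrix `U`.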
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2722&amp;oldid=prev</id>
		<title>IssaRice at 06:03, 17 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2722&amp;oldid=prev"/>
		<updated>2020-01-17T06:03:55Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 06:03, 17 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l7&quot;&gt;Line 7:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 7:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. 
To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. 
So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;==See also==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* https://machinelearning.subwiki.org/wiki/User:IssaRice/Linear_algebra/Classification_of_operators -- performing SVD on some nicer operators allows you to skip some of the steps, resulting in a simpler decomposition.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
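The D = UΣU* construction in this revision's note can also be verified numerically. A minimal sketch, assuming numpy; to avoid clashing with the note's use of U for the columns-[v1 v2] matrix, that matrix is named `W` here, and the example matrix is my own:

```python
import numpy as np

# Example matrix, chosen arbitrarily.
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# numpy's SVD: M = U_svd @ diag(S) @ Vt
U_svd, S, Vt = np.linalg.svd(M)
W = Vt.T   # columns are v1, v2 (the note's "U" with Ue_j = v_j)

# D = W diag(S) W^T stretches along v1, v2: it is diagonal in the
# v-basis but generally not in the standard basis.
D = W @ np.diag(S) @ W.T
assert np.allclose(D @ W[:, 0], S[0] * W[:, 0])   # D v1 = sigma_1 v1
assert np.allclose(D @ W[:, 1], S[1] * W[:, 1])   # D v2 = sigma_2 v2

# The leftover rotation R = U_svd W^T gives the polar decomposition
# M = R D, and R W = U_svd is orthogonal, so we recover the SVD.
R = U_svd @ Vt
assert np.allclose(R @ R.T, np.eye(2))   # R is orthogonal
assert np.allclose(M, R @ D)             # M = RD
```

Note that `D` here is symmetric positive semidefinite, matching the note's observation that negative entries of Σ can always be avoided by flipping the corresponding v vector.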
	<entry>
		<id>https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2712&amp;oldid=prev</id>
		<title>IssaRice at 10:41, 15 January 2020</title>
		<link rel="alternate" type="text/html" href="https://machinelearning.subwiki.org/w/index.php?title=User:IssaRice/Linear_algebra/Singular_value_decomposition&amp;diff=2712&amp;oldid=prev"/>
		<updated>2020-01-15T10:41:44Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 10:41, 15 January 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l6&quot;&gt;Line 6:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 6:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* one question one might have is, to get the behavior of M in the linked image, can&amp;#039;t we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we&amp;#039;ve only required one rotation, instead of the two in SVD. That&amp;#039;s true, but pay attention to where the basis vectors went. A squish followed by a rotation... would preserve orthogonality. But in M it is clear that these basis vectors are no longer orthogonal. So even though we have faithfully preserved the ellipse, we don&amp;#039;t have the same transformation. i.e. &amp;lt;math&amp;gt;M(\{v : \|v\| = 1\}) = M&amp;#039;(\{v : \|v\|=1\})&amp;lt;/math&amp;gt; need not imply &amp;lt;math&amp;gt;M=M&amp;#039;&amp;lt;/math&amp;gt;, apparently.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* one question one might have is, to get the behavior of M in the linked image, can&amp;#039;t we just squish along the standard basis directions, then rotate? surely this would produce the same ellipse. And it would seem that we&amp;#039;ve only required one rotation, instead of the two in SVD. That&amp;#039;s true, but pay attention to where the basis vectors went. A squish followed by a rotation... would preserve orthogonality. But in M it is clear that these basis vectors are no longer orthogonal. 
So even though we have faithfully preserved the ellipse, we don&amp;#039;t have the same transformation. i.e. &amp;lt;math&amp;gt;M(\{v : \|v\| = 1\}) = M&amp;#039;(\{v : \|v\|=1\})&amp;lt;/math&amp;gt; need not imply &amp;lt;math&amp;gt;M=M&amp;#039;&amp;lt;/math&amp;gt;, apparently.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. 
To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* (polar decomposition.) In the linked image, look at the axes of the final ellipse, labeled &amp;lt;math&amp;gt;\sigma_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2&amp;lt;/math&amp;gt;. Call those vectors &amp;lt;math&amp;gt;u_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2&amp;lt;/math&amp;gt;. So &amp;lt;math&amp;gt;u_1 = Mv_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;u_2 = Mv_2&amp;lt;/math&amp;gt; for some vectors v1 and v2. Now, backtrack along the arrows, starting from the final image, going through U, then Sigma, then V*. Pay attention to what it does to u1 and u2. In each step, the vectors remain orthogonal. So not only are u1 and u2 orthogonal, we must have that v1 and v2 are orthogonal. So now, couldn&amp;#039;t we say, &amp;quot;take v1 and v2, squish along those axes. then rotate.&amp;quot; That seems to have required only one rotation. What&amp;#039;s going on? The problem is that a diagonal matrix can only stretch along the standard basis. 
So &amp;quot;stretch along v1 and v2&amp;quot; can&amp;#039;t be done via a diagonal matrix (unless v1 and v2 are the standard basis, of course). Let&amp;#039;s say &amp;lt;math&amp;gt;M = RD&amp;lt;/math&amp;gt; where R is a rotation, and D is &amp;quot;stretch along v1 and v2&amp;quot;. So &amp;lt;math&amp;gt;Dv_1 = \lambda_1 v_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Dv_2 = \lambda_2 v_2&amp;lt;/math&amp;gt;. Now, D is not a diagonal matrix, when viewed in the standard basis, but it &amp;#039;&amp;#039;is&amp;#039;&amp;#039; a diagonal matrix when viewed under the basis v1,v2. To get to the standard basis, we need to convert the incoming vectors like v1 into e1, then apply the stretching, then reconvert back to v1. In other words, we want to show &amp;lt;math&amp;gt;D = U\Sigma U^*&amp;lt;/math&amp;gt; for some diagonal matrix Sigma and orthogonal U. Just take &amp;lt;math&amp;gt;Ue_j=v_j&amp;lt;/math&amp;gt;, i.e. the matrix U is the matrix with columns [v1 v2]. Now, &amp;lt;math&amp;gt;Dv_j = U\Sigma U^* v_j = U\Sigma e_j = U\lambda_j e_j = \lambda_j v_j&amp;lt;/math&amp;gt; just like we wanted. So &amp;lt;math&amp;gt;M = RD = RU\Sigma U^*&amp;lt;/math&amp;gt;. Of course, RU is another orthogonal matrix, so we recover SVD.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** also, if any of the entries of &amp;lt;math&amp;gt;\Sigma&amp;lt;/math&amp;gt; are negative, we could have chosen a different v vector that would have made it positive, so we can assume that D is a positive operator (positive semi-definite).&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
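The observation in the revision above, that two different matrices can send the unit circle to the same ellipse, can be demonstrated directly: pre-composing M with any rotation Q leaves the image ellipse unchanged (Q maps the circle onto itself, and M2 M2^T = M Q Q^T M^T = M M^T) while changing the map. A sketch under the same assumptions as the other examples (numpy, illustrative matrices):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [0.0, 1.0]])

# A rotation by an arbitrary angle.
theta = 0.9
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M2 = M @ Q

# Same ellipse: identical axis lengths (singular values) and identical
# axis directions (left singular vectors, up to sign).
U1, S1, _ = np.linalg.svd(M)
U2, S2, _ = np.linalg.svd(M2)
assert np.allclose(S1, S2)
assert np.allclose(np.abs(U1.T @ U2), np.eye(2))

# ...yet different transformations: M and M2 disagree on e1.
e1 = np.array([1.0, 0.0])
assert not np.allclose(M @ e1, M2 @ e1)
```

This is exactly the note's point that M(unit circle) = M'(unit circle) need not imply M = M': the image ellipse pins down only UΣ, not V*.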
</feed>