User:IssaRice/Chain rule proofs: Difference between revisions

From Machinelearning
Line 117: Line 117:
===Main idea===
===Main idea===


Let <math>(x_n)_{n=1}^\infty</math> be a sequence in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>. Then we want to write
Let <math>(x_n)_{n=1}^\infty</math> be a sequence taking values in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>. Then we want to write


<math display="block">\frac{g(f(x_n)) - g(f(x_0))}{x_n - x_0} = \frac{g(f(x_n)) - g(f(x_0))}{f(x_n) - f(x_0)} \cdot \frac{f(x_n) - f(x_0)}{x_n - x_0}</math>
<math display="block">\frac{g(f(x_n)) - g(f(x_0))}{x_n - x_0} = \frac{g(f(x_n)) - g(f(x_0))}{f(x_n) - f(x_0)} \cdot \frac{f(x_n) - f(x_0)}{x_n - x_0}</math>
Line 125: Line 125:
===Proof===
===Proof===


Let <math>(x_n)_{n=1}^\infty</math> be a sequence in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>.
Let <math>(x_n)_{n=1}^\infty</math> be a sequence taking values in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>.


Define a function <math>\phi : Y \to \mathbf R</math> by
Define a function <math>\phi : Y \to \mathbf R</math> by
Line 139: Line 139:
for all <math>x_n</math>. (Why? Consider the cases <math>f(x_n) = f(x_0)</math> and <math>f(x_n) \ne f(x_0)</math> separately.)
for all <math>x_n</math>. (Why? Consider the cases <math>f(x_n) = f(x_0)</math> and <math>f(x_n) \ne f(x_0)</math> separately.)


Differentiability of <math>g</math> at <math>f(x_0)</math> says that if <math>(y_n)_{n=1}^\infty</math> is a sequence in <math>Y \setminus \{y_0\}</math> that converges to <math>f(x_0)</math>, then <math>\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} \to g'(f(x_0))</math> as <math>n \to \infty</math>. What if <math>(y_n)_{n=1}^\infty</math> is instead a sequence in <math>Y</math>? Then we can say <math>\phi(y_n) \to g'(f(x_0))</math> as <math>n\to\infty</math>. To show this, let <math>\epsilon > 0</math>. Now we can find <math>N \geq 1</math> such that <math>\left\vert\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} - g'(f(x_0))\right\vert \leq \epsilon</math> for all <math>n \geq N</math>. But this means if <math>n \geq N</math>, then we have two cases: either <math>y_n \in Y</math> and <math>y_n \ne f(x_0)</math>, in which case <math>|\phi(y_n) - g'(f(x_0))| \leq \epsilon</math> as above, or else <math>y_n = f(x_0)</math>, in which case <math>\phi(y_n) = g'(f(x_0))</math> so <math>|\phi(y) - g'(f(x_0))| = 0 \leq \epsilon</math>.
Differentiability of <math>g</math> at <math>f(x_0)</math> says that if <math>(y_n)_{n=1}^\infty</math> is a sequence taking values in <math>Y \setminus \{y_0\}</math> that converges to <math>f(x_0)</math>, then <math>\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} \to g'(f(x_0))</math> as <math>n \to \infty</math>. What if <math>(y_n)_{n=1}^\infty</math> is instead a sequence taking values in <math>Y</math>? Then we can say <math>\phi(y_n) \to g'(f(x_0))</math> as <math>n\to\infty</math>. To show this, let <math>\epsilon > 0</math>. Now we can find <math>N \geq 1</math> such that <math>\left\vert\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} - g'(f(x_0))\right\vert \leq \epsilon</math> for all <math>n \geq N</math>. But this means if <math>n \geq N</math>, then we have two cases: either <math>y_n \in Y</math> and <math>y_n \ne f(x_0)</math>, in which case <math>|\phi(y_n) - g'(f(x_0))| \leq \epsilon</math> as above, or else <math>y_n = f(x_0)</math>, in which case <math>\phi(y_n) = g'(f(x_0))</math> so <math>|\phi(y) - g'(f(x_0))| = 0 \leq \epsilon</math>.


Differentiability of <math>f</math> at <math>x_0</math> implies continuity of <math>f</math> at <math>x_0</math>, so this means that <math>f(x_n) \to f(x_0)</math> as <math>n \to \infty</math>. Since <math>f(x_n) \in Y</math> for each <math>n \geq 1</math>, we can use <math>(f(x_n))_{n=1}^\infty</math> as our sequence in <math>Y</math> to conclude that as <math>n \to \infty</math> we have <math>\phi(f(x_n)) \to g'(f(x_0))</math>.
Differentiability of <math>f</math> at <math>x_0</math> implies continuity of <math>f</math> at <math>x_0</math>, so this means that <math>f(x_n) \to f(x_0)</math> as <math>n \to \infty</math>. Since <math>f(x_n) \in Y</math> for each <math>n \geq 1</math>, we can use <math>(f(x_n))_{n=1}^\infty</math> as our sequence in <math>Y</math> to conclude that as <math>n \to \infty</math> we have <math>\phi(f(x_n)) \to g'(f(x_0))</math>.

Revision as of 23:55, 28 November 2018

Using Newton's approximation

Main idea

The main idea of using Newton's approximation to prove the chain rule is that since f is differentiable at we have the approximation when is near . Similarly since g is differentiable at we have the approximation when is near . Since f is differentiable at , it is continuous there also, so we know that is near whenever is near . This allows us to substitute into whenever is near . So we get

Thus we get , which is what the chain rule says.

Proof

We want to show is differentiable at with derivative . By Newton's approximation, this is equivalent to showing that for every there exists such that

whenever . So let .

Now we do some algebraic manipulation. Write

where . This holds for every . Since we thus have

Similarly write

where .

Substituting the expression for in the expression for we get

we can rewrite this as

Thus our goal now is to show .

But by the triangle inequality it suffices to show .

where we are free to choose .

To get the bound for (using Newton's approximation), we need to make sure is small. But by continuity of at we can do this.

where again we are free to choose .

TODO: can we do this same proof but without using the error term notation?

TODO: somehow Folland does this without explicitly using continuity of f; i need to understand if he's using it implicitly somehow or he's actually proving it when bounding using

old proof

Since is differentiable at , we know is a real number, and we can write

(there is no magic: the terms just cancel out)

If we define we can write

Newton's approximation says that as long as .

Since is differentiable at , we know that it must be continuous at . This means we can keep as long as we keep .

Since and , this means we can substitute and get

Now we use the differentiability of . We can write

Again, we can define and write this as

Now we can substitute this into the expression for to get

where we have canceled out two terms using .

Thus we have

We can write this as

where . Now the left hand side looks like the expression in Newton's approximation. This means to show is differentiable at , we just need to show that .

The stuff in square brackets is our "error term" for . Now we just need to make sure it is small, even after dividing by .

But f is differentiable at , so by Newton's approximation,

we also have

We can bound this from above using the triangle inequality:

Now we can just choose small enough.

Limits of sequences

Main idea

Let be a sequence taking values in that converges to . Then we want to write

Now use the limit laws to conclude that the limit is . The problem is that can be zero even when .

Proof

Let be a sequence taking values in that converges to .

Define a function by

The idea is that we want to say is going to , so we just define it at the undefined points to already be at that limit.

Now we have

for all . (Why? Consider the cases and separately.)

Differentiability of at says that if is a sequence taking values in that converges to , then as . What if is instead a sequence taking values in ? Then we can say as . To show this, let . Now we can find such that for all . But this means if , then we have two cases: either and , in which case as above, or else , in which case so .

Differentiability of at implies continuity of at , so this means that as . Since for each , we can use as our sequence in to conclude that as we have .

Now by the limit laws

Since the sequence was arbitrary, we can conclude that .

TODO: Tao says that division by zero occurs when , which seems strange to me.