User:IssaRice/Chain rule proofs: Difference between revisions

Revision as of 23:55, 28 November 2018

Using Newton's approximation

Main idea

The main idea of using Newton's approximation to prove the chain rule is as follows. Since $f$ is differentiable at $x_{0}$, we have the approximation $f (x) \approx f (x_{0}) + f^{'} (x_{0}) (x - x_{0})$ when $x$ is near $x_{0}$. Similarly, since $g$ is differentiable at $f (x_{0})$, we have the approximation $g (y) \approx g (f (x_{0})) + g^{'} (f (x_{0})) (y - f (x_{0}))$ when $y$ is near $f (x_{0})$. Since $f$ is differentiable at $x_{0}$, it is also continuous there, so $f (x)$ is near $f (x_{0})$ whenever $x$ is near $x_{0}$. This allows us to substitute $f (x)$ for $y$ whenever $x$ is near $x_{0}$. So we get

$\begin{aligned} g (f (x)) & \approx g (f (x_{0})) + g' (f (x_{0})) (f (x) - f (x_{0})) \\ & \approx g (f (x_{0})) + g' (f (x_{0})) (f' (x_{0}) (x - x_{0})) \end{aligned}$

Thus we get $g \circ f (x) \approx g \circ f (x_{0}) + g^{'} (f (x_{0})) f^{'} (x_{0}) (x - x_{0})$ , which is what the chain rule says.
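The approximation can be checked numerically. The following is a minimal sketch, assuming the illustrative choices $f = \sin$, $g = \exp$ and $x_{0} = 0.5$ (none of which come from the notes above); it prints the error of the linear approximation divided by $| x - x_{0} |$, which should shrink as $x$ approaches $x_{0}$.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(y):
    return math.exp(y)

def g_prime(y):
    return math.exp(y)

def relative_error(x, x0):
    """Return |g(f(x)) - (g(f(x0)) + L (x - x0))| / |x - x0|, with L = g'(f(x0)) f'(x0)."""
    L = g_prime(f(x0)) * f_prime(x0)
    linear = g(f(x0)) + L * (x - x0)
    return abs(g(f(x)) - linear) / abs(x - x0)

x0 = 0.5
steps = [10.0 ** -k for k in range(1, 6)]  # h = 1e-1, ..., 1e-5
ratios = [relative_error(x0 + h, x0) for h in steps]

for h, r in zip(steps, ratios):
    print(f"h = {h:.0e}  error / |x - x0| = {r:.3e}")
```

A shrinking ratio is exactly what the inequality in Newton's approximation asks for: for any $ϵ > 0$ the ratio eventually stays below $ϵ$.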

Proof

We want to show $g \circ f$ is differentiable at $x_{0}$ with derivative $L : = g^{'} (f (x_{0})) f^{'} (x_{0})$ . By Newton's approximation, this is equivalent to showing that for every $ϵ > 0$ there exists $δ > 0$ such that

$| g \circ f (x) - (g \circ f (x_{0}) + L (x - x_{0})) | \leq ϵ | x - x_{0} |$

whenever $| x - x_{0} | \leq δ$ . So let $ϵ > 0$ .

Now we do some algebraic manipulation. Write

$g (y) = g (y_{0}) + g^{'} (y_{0}) (y - y_{0}) + E_{g} (y, y_{0})$

where $E_{g} (y, y_{0}) : = g (y) - (g (y_{0}) + g^{'} (y_{0}) (y - y_{0}))$ . This holds for every $y \in Y$ . Since $f (x) \in Y$ we thus have

$g (f (x)) = g (f (x_{0})) + g^{'} (f (x_{0})) (f (x) - f (x_{0})) + E_{g} (f (x), f (x_{0}))$

Similarly write

$f (x) = f (x_{0}) + f^{'} (x_{0}) (x - x_{0}) + E_{f} (x, x_{0})$

where $E_{f} (x, x_{0}) : = f (x) - (f (x_{0}) + f^{'} (x_{0}) (x - x_{0}))$ .

Substituting the expression for $f (x)$ in the expression for $g (f (x))$ we get

$\begin{aligned} g (f (x)) & = g (f (x_{0})) + g' (f (x_{0})) (f' (x_{0}) (x - x_{0}) + E_{f} (x, x_{0})) + E_{g} (f (x), f (x_{0})) \\ & = g (f (x_{0})) + g' (f (x_{0})) f' (x_{0}) (x - x_{0}) + g' (f (x_{0})) E_{f} (x, x_{0}) + E_{g} (f (x), f (x_{0})) \end{aligned}$

We can rewrite this as $g \circ f (x) - (g \circ f (x_{0}) + L (x - x_{0})) = g^{'} (f (x_{0})) E_{f} (x, x_{0}) + E_{g} (f (x), f (x_{0}))$ .

Thus our goal now is to show $| g^{'} (f (x_{0})) E_{f} (x, x_{0}) + E_{g} (f (x), f (x_{0})) | \leq ϵ | x - x_{0} |$ .

But by the triangle inequality it suffices to show $| g^{'} (f (x_{0})) E_{f} (x, x_{0}) | + | E_{g} (f (x), f (x_{0})) | \leq ϵ | x - x_{0} |$ .

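For the first term, since $f$ is differentiable at $x_{0}$, Newton's approximation gives $| g^{'} (f (x_{0})) E_{f} (x, x_{0}) | \leq | g^{'} (f (x_{0})) | ϵ_{1} | x - x_{0} |$ for all $x$ sufficiently close to $x_{0}$, where we are free to choose $ϵ_{1} > 0$ .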

To get the bound for $| E_{g} (f (x), f (x_{0})) |$ (using Newton's approximation), we need to make sure $| f (x) - f (x_{0}) |$ is small. But by continuity of $f$ at $x_{0}$ we can do this.

$\begin{aligned} | E_{g} (f (x), f (x_{0})) | & \leq ϵ_{2} | f (x) - f (x_{0}) | \\ & = ϵ_{2} | f' (x_{0}) (x - x_{0}) + E_{f} (x, x_{0}) | \\ & \leq ϵ_{2} | f' (x_{0}) | | x - x_{0} | + ϵ_{2} ϵ_{3} | x - x_{0} | \\ & = (ϵ_{2} | f' (x_{0}) | + ϵ_{2} ϵ_{3}) | x - x_{0} | \end{aligned}$

where again we are free to choose $ϵ_{2}, ϵ_{3}$ .
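One way to finish the argument from here is sketched below. The particular constants are only one possible choice, and the names $δ_{1}$, $δ_{3}$, $δ_{g}$, $δ_{c}$ are introduced here for illustration; they do not appear in the notes above.

```latex
% One possible choice of constants:
\epsilon_{3} := 1, \qquad
\epsilon_{1} := \frac{\epsilon}{2 (|g'(f(x_{0}))| + 1)}, \qquad
\epsilon_{2} := \frac{\epsilon}{2 (|f'(x_{0})| + 1)}.

% Newton's approximation for f gives \delta_{1}, \delta_{3} > 0 with
|E_{f}(x, x_{0})| \leq \epsilon_{i} |x - x_{0}|
  \quad \text{whenever } |x - x_{0}| \leq \delta_{i}, \; i \in \{1, 3\}.

% Newton's approximation for g gives \delta_{g} > 0 with
|E_{g}(y, f(x_{0}))| \leq \epsilon_{2} |y - f(x_{0})|
  \quad \text{whenever } |y - f(x_{0})| \leq \delta_{g},
% and continuity of f at x_{0} gives \delta_{c} > 0 with
|f(x) - f(x_{0})| \leq \delta_{g}
  \quad \text{whenever } |x - x_{0}| \leq \delta_{c}.

% With \delta := \min(\delta_{1}, \delta_{3}, \delta_{c}) and |x - x_{0}| \leq \delta:
\begin{aligned}
|g'(f(x_{0})) E_{f}(x, x_{0})| + |E_{g}(f(x), f(x_{0}))|
  &\leq |g'(f(x_{0}))| \epsilon_{1} |x - x_{0}|
      + \epsilon_{2} (|f'(x_{0})| + \epsilon_{3}) |x - x_{0}| \\
  &\leq \frac{\epsilon}{2} |x - x_{0}| + \frac{\epsilon}{2} |x - x_{0}|
   = \epsilon |x - x_{0}|.
\end{aligned}
```

The "+ 1" in each denominator is only there to avoid dividing by zero when $f^{'} (x_{0})$ or $g^{'} (f (x_{0}))$ vanishes.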

TODO: can we do this same proof but without using the error term notation?

TODO: somehow Folland does this without explicitly using continuity of $f$; I need to understand whether he is using it implicitly or is actually proving it when bounding $| h |$ using $| u |$

old proof

Since $g$ is differentiable at $y_{0}$ , we know $g^{'} (y_{0})$ is a real number, and we can write

$g (y) = g (y_{0}) + g^{'} (y_{0}) (y - y_{0}) + [g (y) - (g (y_{0}) + g^{'} (y_{0}) (y - y_{0}))]$

(there is no magic: the terms just cancel out)

If we define $E_{g} (y, y_{0}) : = g (y) - (g (y_{0}) + g^{'} (y_{0}) (y - y_{0}))$ we can write

$g (y) = g (y_{0}) + g^{'} (y_{0}) (y - y_{0}) + E_{g} (y, y_{0})$

Newton's approximation says that $| E_{g} (y, y_{0}) | \leq ϵ | y - y_{0} |$ as long as $| y - y_{0} | \leq δ$ .

Since $f$ is differentiable at $x_{0}$ , we know that it must be continuous at $x_{0}$ . This means we can keep $| f (x) - y_{0} | \leq δ$ as long as we keep $| x - x_{0} | \leq δ^{'}$ .

Since $f (x) \in Y$ and $| f (x) - y_{0} | \leq δ$ , this means we can substitute $y = f (x)$ and get

$g (f (x)) = g (y_{0}) + g^{'} (f (x_{0})) (f (x) - y_{0}) + E_{g} (f (x), y_{0})$

Now we use the differentiability of $f$ . We can write

$f (x) = f (x_{0}) + f^{'} (x_{0}) (x - x_{0}) + [f (x) - (f (x_{0}) + f^{'} (x_{0}) (x - x_{0}))]$

Again, we can define $E_{f} (x, x_{0}) : = f (x) - (f (x_{0}) + f^{'} (x_{0}) (x - x_{0}))$ and write this as

$f (x) = f (x_{0}) + f^{'} (x_{0}) (x - x_{0}) + E_{f} (x, x_{0})$

Now we can substitute this into the expression for $g (f (x))$ to get

$g (f (x)) = g (y_{0}) + g^{'} (f (x_{0})) (f^{'} (x_{0}) (x - x_{0}) + E_{f} (x, x_{0})) + E_{g} (f (x), f (x_{0}))$

where we have canceled out two terms using $f (x_{0}) = y_{0}$ .

Thus we have

$g (f (x)) = g (y_{0}) + g^{'} (f (x_{0})) f^{'} (x_{0}) (x - x_{0}) + [g^{'} (f (x_{0})) E_{f} (x, x_{0}) + E_{g} (f (x), f (x_{0}))]$

We can write this as

$(g \circ f) (x) - ((g \circ f) (x_{0}) + L (x - x_{0})) = [g^{'} (f (x_{0})) E_{f} (x, x_{0}) + E_{g} (f (x), f (x_{0}))]$

where $L : = g^{'} (f (x_{0})) f^{'} (x_{0})$ . Now the left hand side looks like the expression in Newton's approximation. This means to show $g \circ f$ is differentiable at $x_{0}$ , we just need to show that $| g^{'} (f (x_{0})) E_{f} (x, x_{0}) + E_{g} (f (x), f (x_{0})) | \leq ϵ | x - x_{0} |$ .

The stuff in square brackets is our "error term" for $g \circ f$ . Now we just need to make sure it is small, even after dividing by $| x - x_{0} |$ .

But $f$ is differentiable at $x_{0}$ , so by Newton's approximation,

$| g^{'} (f (x_{0})) E_{f} (x, x_{0}) | \leq | g^{'} (f (x_{0})) | ϵ_{1} | x - x_{0} |$

We also have

$| E_{g} (f (x), f (x_{0})) | \leq ϵ_{2} | f (x) - f (x_{0}) | = ϵ_{2} | f^{'} (x_{0}) (x - x_{0}) + E_{f} (x, x_{0}) |$

We can bound this from above using the triangle inequality:

$\begin{aligned} | E_{g} (f (x), f (x_{0})) | & \leq ϵ_{2} | f' (x_{0}) (x - x_{0}) | + ϵ_{2} | E_{f} (x, x_{0}) | \\ & \leq ϵ_{2} | f' (x_{0}) | | x - x_{0} | + ϵ_{2} ϵ_{1} | x - x_{0} | \end{aligned}$

Now we can just choose $ϵ_{1}, ϵ_{2}$ small enough.

Limits of sequences

Main idea

Let $(x_{n})_{n = 1}^{\infty}$ be a sequence taking values in $X \setminus \{x_{0}\}$ that converges to $x_{0}$ . Then we want to write

$\frac{g (f (x_{n})) - g (f (x_{0}))}{x_{n} - x_{0}} = \frac{g (f (x_{n})) - g (f (x_{0}))}{f (x_{n}) - f (x_{0})} \cdot \frac{f (x_{n}) - f (x_{0})}{x_{n} - x_{0}}$

Now use the limit laws to conclude that the limit is $g^{'} (f (x_{0})) \cdot f^{'} (x_{0})$ . The problem is that $f (x_{n}) - f (x_{0})$ can be zero even when $x_{n} \neq x_{0}$ .

Proof

Let $(x_{n})_{n = 1}^{\infty}$ be a sequence taking values in $X \setminus \{x_{0}\}$ that converges to $x_{0}$ .

Define a function $ϕ : Y \to \mathbf{R}$ by

$ϕ (y) : = \begin{cases} \frac{g (y) - g (f (x_{0}))}{y - f (x_{0})} & y \neq f (x_{0}) \\ g' (f (x_{0})) & y = f (x_{0}) \end{cases}$

The idea is that we want to say $\frac{g (f (x_{n})) - g (f (x_{0}))}{f (x_{n}) - f (x_{0})}$ is going to $g^{'} (f (x_{0}))$ , so we just define it at the undefined points to already be at that limit.

Now we have

$\frac{g (f (x_{n})) - g (f (x_{0}))}{x_{n} - x_{0}} = ϕ (f (x_{n})) \cdot \frac{f (x_{n}) - f (x_{0})}{x_{n} - x_{0}}$

for all $x_{n}$ . (Why? Consider the cases $f (x_{n}) = f (x_{0})$ and $f (x_{n}) \neq f (x_{0})$ separately.)
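The two cases can be seen in a small numerical sketch. The functions below are hypothetical examples chosen so that $f (x_{n}) = f (x_{0})$ happens for some $x_{n} \neq x_{0}$: here $f (x) = \max (x, 0)^{2}$, $g (y) = 3 y + y^{2}$ and $x_{0} = 0$, none of which come from the notes above. The helper names `lhs` and `rhs` are likewise only for illustration.

```python
def f(x):
    # Hypothetical example: f(x) = 0 for every x <= 0, so f(x) = f(0) on one side of 0.
    return max(x, 0.0) ** 2

def g(y):
    return 3.0 * y + y * y

def g_prime(y):
    return 3.0 + 2.0 * y

x0 = 0.0
y0 = f(x0)

def phi(y):
    """Difference quotient of g at y0, patched with g'(y0) where the quotient is undefined."""
    if y != y0:
        return (g(y) - g(y0)) / (y - y0)
    return g_prime(y0)

def lhs(x):
    # Difference quotient of the composition g(f(x)) at x0.
    return (g(f(x)) - g(f(x0))) / (x - x0)

def rhs(x):
    # The factored form: phi(f(x)) times the difference quotient of f.
    return phi(f(x)) * (f(x) - f(x0)) / (x - x0)

# x_n = (-1)^n / n alternates sides of x0, so f(x_n) = f(x0) exactly when n is odd.
xs = [(-1) ** n / n for n in range(1, 9)]
for x in xs:
    print(f"x = {x:+.4f}  f(x) == f(x0): {f(x) == y0}  lhs = {lhs(x):.6f}  rhs = {rhs(x):.6f}")
```

When $f (x_{n}) = f (x_{0})$ both sides are zero, and otherwise the factor $f (x_{n}) - f (x_{0})$ cancels, so the identity holds in either case (up to floating-point rounding in this sketch).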

Differentiability of $g$ at $f (x_{0})$ says that if $(y_{n})_{n = 1}^{\infty}$ is a sequence taking values in $Y \setminus \{f (x_{0})\}$ that converges to $f (x_{0})$ , then $\frac{g (y_{n}) - g (f (x_{0}))}{y_{n} - f (x_{0})} \to g^{'} (f (x_{0}))$ as $n \to \infty$ . What if $(y_{n})_{n = 1}^{\infty}$ is instead a sequence taking values in $Y$ that converges to $f (x_{0})$ ? Then we can still say $ϕ (y_{n}) \to g^{'} (f (x_{0}))$ as $n \to \infty$ . To show this, let $ϵ > 0$ . Applying the previous statement to the terms with $y_{n} \neq f (x_{0})$ (if there are only finitely many such terms, take $N$ past the last of them), we can find $N \geq 1$ such that $| \frac{g (y_{n}) - g (f (x_{0}))}{y_{n} - f (x_{0})} - g^{'} (f (x_{0})) | \leq ϵ$ for all $n \geq N$ with $y_{n} \neq f (x_{0})$ . This means that if $n \geq N$ , then we have two cases: either $y_{n} \neq f (x_{0})$ , in which case $| ϕ (y_{n}) - g^{'} (f (x_{0})) | \leq ϵ$ as above, or else $y_{n} = f (x_{0})$ , in which case $ϕ (y_{n}) = g^{'} (f (x_{0}))$ so $| ϕ (y_{n}) - g^{'} (f (x_{0})) | = 0 \leq ϵ$ .

Differentiability of $f$ at $x_{0}$ implies continuity of $f$ at $x_{0}$ , so this means that $f (x_{n}) \to f (x_{0})$ as $n \to \infty$ . Since $f (x_{n}) \in Y$ for each $n \geq 1$ , we can use $(f (x_{n}))_{n = 1}^{\infty}$ as our sequence in $Y$ to conclude that as $n \to \infty$ we have $ϕ (f (x_{n})) \to g^{'} (f (x_{0}))$ .

Now by the limit laws

$\begin{aligned} \lim_{n \to \infty} \frac{g (f (x_{n})) - g (f (x_{0}))}{x_{n} - x_{0}} & = \left( \lim_{n \to \infty} ϕ (f (x_{n})) \right) \left( \lim_{n \to \infty} \frac{f (x_{n}) - f (x_{0})}{x_{n} - x_{0}} \right) \\ & = g' (f (x_{0})) f' (x_{0}) \end{aligned}$

Since the sequence $(x_{n})_{n = 1}^{\infty}$ was arbitrary, we can conclude that $\lim_{x \to x_{0}; x \in X \setminus \{x_{0}\}} \frac{g (f (x)) - g (f (x_{0}))}{x - x_{0}} = g^{'} (f (x_{0})) f^{'} (x_{0})$ .

$\frac{g (f (x_{n})) - g (f (x_{0}))}{f (x_{n}) - f (x_{0})} \to g^{'} (f (x_{0}))$

TODO: Tao says that division by zero occurs when $f^{'} (x_{0}) = 0$ , which seems strange to me.

@@ Line 117: / Line 117: @@
 ===Main idea===
-Let <math>(x_n)_{n=1}^\infty</math> be a sequence in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>. Then we want to write
+Let <math>(x_n)_{n=1}^\infty</math> be a sequence taking values in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>. Then we want to write
 <math display="block">\frac{g(f(x_n)) - g(f(x_0))}{x_n - x_0} = \frac{g(f(x_n)) - g(f(x_0))}{f(x_n) - f(x_0)} \cdot \frac{f(x_n) - f(x_0)}{x_n - x_0}</math>
@@ Line 125: / Line 125: @@
 ===Proof===
-Let <math>(x_n)_{n=1}^\infty</math> be a sequence in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>.
+Let <math>(x_n)_{n=1}^\infty</math> be a sequence taking values in <math>X \setminus \{x_0\}</math> that converges to <math>x_0</math>.
 Define a function <math>\phi : Y \to \mathbf R</math> by
@@ Line 139: / Line 139: @@
 for all <math>x_n</math>. (Why? Consider the cases <math>f(x_n) = f(x_0)</math> and <math>f(x_n) \ne f(x_0)</math> separately.)
-Differentiability of <math>g</math> at <math>f(x_0)</math> says that if <math>(y_n)_{n=1}^\infty</math> is a sequence in <math>Y \setminus \{y_0\}</math> that converges to <math>f(x_0)</math>, then <math>\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} \to g'(f(x_0))</math> as <math>n \to \infty</math>. What if <math>(y_n)_{n=1}^\infty</math> is instead a sequence in <math>Y</math>? Then we can say <math>\phi(y_n) \to g'(f(x_0))</math> as <math>n\to\infty</math>. To show this, let <math>\epsilon > 0</math>. Now we can find <math>N \geq 1</math> such that <math>\left\vert\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} - g'(f(x_0))\right\vert \leq \epsilon</math> for all <math>n \geq N</math>. But this means if <math>n \geq N</math>, then we have two cases: either <math>y_n \in Y</math> and <math>y_n \ne f(x_0)</math>, in which case <math>|\phi(y_n) - g'(f(x_0))| \leq \epsilon</math> as above, or else <math>y_n = f(x_0)</math>, in which case <math>\phi(y_n) = g'(f(x_0))</math> so <math>|\phi(y) - g'(f(x_0))| = 0 \leq \epsilon</math>.
+Differentiability of <math>g</math> at <math>f(x_0)</math> says that if <math>(y_n)_{n=1}^\infty</math> is a sequence taking values in <math>Y \setminus \{y_0\}</math> that converges to <math>f(x_0)</math>, then <math>\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} \to g'(f(x_0))</math> as <math>n \to \infty</math>. What if <math>(y_n)_{n=1}^\infty</math> is instead a sequence taking values in <math>Y</math>? Then we can say <math>\phi(y_n) \to g'(f(x_0))</math> as <math>n\to\infty</math>. To show this, let <math>\epsilon > 0</math>. Now we can find <math>N \geq 1</math> such that <math>\left\vert\frac{g(y_n) - g(f(x_0))}{y_n - f(x_0)} - g'(f(x_0))\right\vert \leq \epsilon</math> for all <math>n \geq N</math>. But this means if <math>n \geq N</math>, then we have two cases: either <math>y_n \in Y</math> and <math>y_n \ne f(x_0)</math>, in which case <math>|\phi(y_n) - g'(f(x_0))| \leq \epsilon</math> as above, or else <math>y_n = f(x_0)</math>, in which case <math>\phi(y_n) = g'(f(x_0))</math> so <math>|\phi(y) - g'(f(x_0))| = 0 \leq \epsilon</math>.
 Differentiability of <math>f</math> at <math>x_0</math> implies continuity of <math>f</math> at <math>x_0</math>, so this means that <math>f(x_n) \to f(x_0)</math> as <math>n \to \infty</math>. Since <math>f(x_n) \in Y</math> for each <math>n \geq 1</math>, we can use <math>(f(x_n))_{n=1}^\infty</math> as our sequence in <math>Y</math> to conclude that as <math>n \to \infty</math> we have <math>\phi(f(x_n)) \to g'(f(x_0))</math>.