User:IssaRice/Chain rule proofs

Using Newton's approximation

Main idea

The main idea of using Newton's approximation to prove the chain rule is that since $f$ is differentiable at $x_{0}$, we have the approximation $f(x)\approx f(x_{0})+f'(x_{0})(x-x_{0})$ when $x$ is near $x_{0}$. Similarly, since $g$ is differentiable at $f(x_{0})$, we have the approximation $g(y)\approx g(f(x_{0}))+g'(f(x_{0}))(y-f(x_{0}))$ when $y$ is near $f(x_{0})$. Since $f$ is differentiable at $x_{0}$, it is also continuous there, so $f(x)$ is near $f(x_{0})$ whenever $x$ is near $x_{0}$. This allows us to substitute $f(x)$ for $y$ whenever $x$ is near $x_{0}$. So we get

{\begin{aligned}g(f(x))&\approx g(f(x_{0}))+g'(f(x_{0}))(f(x)-f(x_{0}))\\&\approx g(f(x_{0}))+g'(f(x_{0}))(f'(x_{0})(x-x_{0}))\end{aligned}}

Thus we get $g\circ f(x)\approx g\circ f(x_{0})+g'(f(x_{0}))f'(x_{0})(x-x_{0})$ , which is what the chain rule says.
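The approximation above can be checked numerically. The following is a minimal sketch, assuming the illustrative choices $f=\sin$, $g=\exp$ and $x_{0}=0.5$ (none of which come from the text); it computes the approximation error divided by $|x-x_{0}|$, which should shrink as $x$ approaches $x_{0}$.

```python
import math

# Illustrative choices (assumptions, not from the text): f = sin, g = exp.
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(y):
    return math.exp(y)

def g_prime(y):
    return math.exp(y)

def linear_approx(x, x0):
    """g(f(x0)) + g'(f(x0)) f'(x0) (x - x0), the chain-rule approximation."""
    return g(f(x0)) + g_prime(f(x0)) * f_prime(x0) * (x - x0)

def relative_error(h, x0=0.5):
    """|g(f(x)) - approximation| / |x - x0| for x = x0 + h (h nonzero)."""
    x = x0 + h
    return abs(g(f(x)) - linear_approx(x, x0)) / abs(h)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:g}, error / |h| = {relative_error(h):.3e}")
```

A numerical check like this is only evidence for the approximation, not a proof of it.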

Proof

We want to show $g\circ f$ is differentiable at $x_{0}$ with derivative $L:=g'(f(x_{0}))f'(x_{0})$ . By Newton's approximation, this is equivalent to showing that for every $\epsilon >0$ there exists $\delta >0$ such that

|g\circ f(x)-(g\circ f(x_{0})+L(x-x_{0}))|\leq \epsilon |x-x_{0}|

whenever $|x-x_{0}|\leq \delta$ . So let $\epsilon >0$ .

Now we do some algebraic manipulation. Write

g(y)=g(y_{0})+g'(y_{0})(y-y_{0})+E_{g}(y,y_{0})

where $E_{g}(y,y_{0}):=g(y)-(g(y_{0})+g'(y_{0})(y-y_{0}))$. This holds for every $y\in Y$. Taking $y_{0}:=f(x_{0})$ and using $f(x)\in Y$, we thus have

g(f(x))=g(f(x_{0}))+g'(f(x_{0}))(f(x)-f(x_{0}))+E_{g}(f(x),f(x_{0}))

Similarly write

f(x)=f(x_{0})+f'(x_{0})(x-x_{0})+E_{f}(x,x_{0})

where $E_{f}(x,x_{0}):=f(x)-(f(x_{0})+f'(x_{0})(x-x_{0}))$ .
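To make the error term concrete, consider the hypothetical example $f(x)=x^{2}$ and $x_{0}=1$ (not from the text). Then $E_{f}(x,x_{0})=(x-1)^{2}$, so $|E_{f}(x,x_{0})|\leq \epsilon |x-x_{0}|$ holds exactly when $|x-x_{0}|\leq \epsilon$, and $\delta :=\epsilon$ works in Newton's approximation. A minimal Python sketch that tests this bound on a grid of sample points:

```python
# Hypothetical example (not from the text): f(x) = x^2 at x0 = 1.
def f(x):
    return x * x

def f_prime(x):
    return 2.0 * x

def E_f(x, x0):
    """Error term E_f(x, x0) = f(x) - (f(x0) + f'(x0)(x - x0))."""
    return f(x) - (f(x0) + f_prime(x0) * (x - x0))

def newton_bound_holds(eps, delta, x0=1.0, samples=1000):
    """Check |E_f(x, x0)| <= eps |x - x0| on a grid with |x - x0| <= delta.

    This samples finitely many points, so it is evidence, not a proof.
    """
    for k in range(-samples, samples + 1):
        x = x0 + delta * k / samples
        # A small tolerance absorbs floating-point rounding.
        if abs(E_f(x, x0)) > eps * abs(x - x0) + 1e-12:
            return False
    return True

print(newton_bound_holds(eps=0.1, delta=0.1))  # delta = eps
print(newton_bound_holds(eps=0.1, delta=0.5))  # delta too large
```

With $\delta =\epsilon$ the bound holds at every sampled point, while a larger $\delta$ fails, for example at $x=1.5$ where $E_{f}=0.25$ but $\epsilon |x-x_{0}|=0.05$.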

Substituting the expression for $f(x)$ in the expression for $g(f(x))$ we get

{\begin{aligned}g(f(x))&=g(f(x_{0}))+g'(f(x_{0}))(f'(x_{0})(x-x_{0})+E_{f}(x,x_{0}))+E_{g}(f(x),f(x_{0}))\\&=g(f(x_{0}))+g'(f(x_{0}))f'(x_{0})(x-x_{0})+g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))\end{aligned}}

We can rewrite this as $g\circ f(x)-(g\circ f(x_{0})+L(x-x_{0}))=g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))$.
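The identity $g\circ f(x)-(g\circ f(x_{0})+L(x-x_{0}))=g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))$ is purely algebraic, so it should hold for any $x$, not just $x$ near $x_{0}$. A minimal sketch that checks it up to floating-point rounding, assuming the illustrative choices $f=\sin$ and $g=\exp$ (not from the text):

```python
import math

# Illustrative choices (assumptions, not from the text): f = sin, g = exp.
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(y):
    return math.exp(y)

def g_prime(y):
    return math.exp(y)

def E_f(x, x0):
    """Error term of f at x0."""
    return f(x) - (f(x0) + f_prime(x0) * (x - x0))

def E_g(y, y0):
    """Error term of g at y0."""
    return g(y) - (g(y0) + g_prime(y0) * (y - y0))

def both_sides(x, x0):
    """Return (left-hand side, right-hand side) of the identity."""
    L = g_prime(f(x0)) * f_prime(x0)
    lhs = g(f(x)) - (g(f(x0)) + L * (x - x0))
    rhs = g_prime(f(x0)) * E_f(x, x0) + E_g(f(x), f(x0))
    return lhs, rhs

for x in (-2.0, 0.4, 0.5, 3.0):
    lhs, rhs = both_sides(x, 0.5)
    print(x, abs(lhs - rhs))
```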

Thus our goal now is to show $|g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))|\leq \epsilon |x-x_{0}|$ .

But by the triangle inequality it suffices to show $|g'(f(x_{0}))E_{f}(x,x_{0})|+|E_{g}(f(x),f(x_{0}))|\leq \epsilon |x-x_{0}|$ .

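For the first term, since $f$ is differentiable at $x_{0}$, Newton's approximation gives $|g'(f(x_{0}))E_{f}(x,x_{0})|\leq |g'(f(x_{0}))|\epsilon _{1}|x-x_{0}|$ whenever $x$ is sufficiently close to $x_{0}$, where we are free to choose $\epsilon _{1}>0$.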

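To bound $|E_{g}(f(x),f(x_{0}))|$ using Newton's approximation for $g$, we need $|f(x)-f(x_{0})|$ to be small: given $\epsilon _{2}>0$, Newton's approximation provides $\delta _{2}>0$ such that $|E_{g}(y,f(x_{0}))|\leq \epsilon _{2}|y-f(x_{0})|$ whenever $y\in Y$ and $|y-f(x_{0})|\leq \delta _{2}$. By continuity of $f$ at $x_{0}$, we have $|f(x)-f(x_{0})|\leq \delta _{2}$ whenever $x$ is close enough to $x_{0}$. For such $x$,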

{\begin{aligned}|E_{g}(f(x),f(x_{0}))|&\leq \epsilon _{2}|f(x)-f(x_{0})|\\&=\epsilon _{2}|f'(x_{0})(x-x_{0})+E_{f}(x,x_{0})|\\&\leq \epsilon _{2}|f'(x_{0})||x-x_{0}|+\epsilon _{2}\epsilon _{3}|x-x_{0}|\\&=(\epsilon _{2}|f'(x_{0})|+\epsilon _{2}\epsilon _{3})|x-x_{0}|\end{aligned}}

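where again we are free to choose $\epsilon _{2},\epsilon _{3}$. For instance, taking $\epsilon _{3}:=1$, $\epsilon _{2}:={\frac {\epsilon }{2(|f'(x_{0})|+1)}}$ and $\epsilon _{1}:={\frac {\epsilon }{2(|g'(f(x_{0}))|+1)}}$ gives $|g'(f(x_{0}))|\epsilon _{1}+\epsilon _{2}|f'(x_{0})|+\epsilon _{2}\epsilon _{3}\leq {\frac {\epsilon }{2}}+{\frac {\epsilon }{2}}=\epsilon$. Each of the bounds above holds whenever $x$ is sufficiently close to $x_{0}$, so taking $\delta$ to be the smallest of the finitely many distances involved gives the desired inequality whenever $|x-x_{0}|\leq \delta$.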

TODO: can we do this same proof but without using the error term notation?

TODO: somehow Folland does this without explicitly using continuity of $f$; I need to understand whether he's using it implicitly somehow or actually proving it when bounding $|\mathbf {h} |$ using $|u|$

Old proof

Since $g$ is differentiable at $y_{0}:=f(x_{0})$, we know $g'(y_{0})$ is a real number, and we can write

g(y)=g(y_{0})+g'(y_{0})(y-y_{0})+[g(y)-(g(y_{0})+g'(y_{0})(y-y_{0}))]

(there is no magic: the terms just cancel out)

If we define $E_{g}(y,y_{0}):=g(y)-(g(y_{0})+g'(y_{0})(y-y_{0}))$ we can write

g(y)=g(y_{0})+g'(y_{0})(y-y_{0})+E_{g}(y,y_{0})

Newton's approximation says that for every $\epsilon _{2}>0$ there exists $\delta >0$ such that $|E_{g}(y,y_{0})|\leq \epsilon _{2}|y-y_{0}|$ whenever $y\in Y$ and $|y-y_{0}|\leq \delta$.

Since $f$ is differentiable at $x_{0}$, we know that it must be continuous at $x_{0}$. This means there exists $\delta '>0$ such that $|f(x)-y_{0}|\leq \delta$ whenever $|x-x_{0}|\leq \delta '$.

Since $f(x)\in Y$ and $|f(x)-y_{0}|\leq \delta$ , this means we can substitute $y=f(x)$ and get

g(f(x))=g(y_{0})+g'(f(x_{0}))(f(x)-y_{0})+E_{g}(f(x),y_{0})

Now we use the differentiability of $f$ . We can write

f(x)=f(x_{0})+f'(x_{0})(x-x_{0})+[f(x)-(f(x_{0})+f'(x_{0})(x-x_{0}))]

Again, we can define $E_{f}(x,x_{0}):=f(x)-(f(x_{0})+f'(x_{0})(x-x_{0}))$ and write this as

f(x)=f(x_{0})+f'(x_{0})(x-x_{0})+E_{f}(x,x_{0})

Now we can substitute this into the expression for $g(f(x))$ to get

g(f(x))=g(y_{0})+g'(f(x_{0}))(f'(x_{0})(x-x_{0})+E_{f}(x,x_{0}))+E_{g}(f(x),f(x_{0}))

where we have canceled out two terms using $f(x_{0})=y_{0}$ .

Thus we have

g(f(x))=g(y_{0})+g'(f(x_{0}))f'(x_{0})(x-x_{0})+[g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))]

We can write this as

(g\circ f)(x)-((g\circ f)(x_{0})+L(x-x_{0}))=[g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))]

where $L:=g'(f(x_{0}))f'(x_{0})$ . Now the left hand side looks like the expression in Newton's approximation. This means to show $g\circ f$ is differentiable at $x_{0}$ , we just need to show that $|g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))|\leq \epsilon |x-x_{0}|$ .

The stuff in square brackets is our "error term" for $g\circ f$ . Now we just need to make sure it is small, even after dividing by $|x-x_{0}|$ .

But f is differentiable at $x_{0}$ , so by Newton's approximation,

|g'(f(x_{0}))E_{f}(x,x_{0})|\leq |g'(f(x_{0}))|\epsilon _{1}|x-x_{0}|

We also have, as long as $|x-x_{0}|\leq \delta '$,

|E_{g}(f(x),f(x_{0}))|\leq \epsilon _{2}|f(x)-f(x_{0})|=\epsilon _{2}|f'(x_{0})(x-x_{0})+E_{f}(x,x_{0})|

We can bound this from above using the triangle inequality:

{\begin{aligned}|E_{g}(f(x),f(x_{0}))|&\leq \epsilon _{2}|f'(x_{0})(x-x_{0})|+\epsilon _{2}|E_{f}(x,x_{0})|\\&\leq \epsilon _{2}|f'(x_{0})||x-x_{0}|+\epsilon _{2}\epsilon _{1}|x-x_{0}|\end{aligned}}

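Now we can just choose $\epsilon _{1},\epsilon _{2}$ small enough that $|g'(f(x_{0}))|\epsilon _{1}+\epsilon _{2}|f'(x_{0})|+\epsilon _{2}\epsilon _{1}\leq \epsilon$; adding the two bounds and using the triangle inequality then gives $|g'(f(x_{0}))E_{f}(x,x_{0})+E_{g}(f(x),f(x_{0}))|\leq \epsilon |x-x_{0}|$ for all $x$ sufficiently close to $x_{0}$.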
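Adding the two bounds above, the proof needs $\epsilon _{1},\epsilon _{2}>0$ with $|g'(f(x_{0}))|\epsilon _{1}+\epsilon _{2}|f'(x_{0})|+\epsilon _{2}\epsilon _{1}\leq \epsilon$. One possible choice (an assumption for illustration; many others work) is $\epsilon _{1}:=\min(1,\epsilon /(2(|g'(f(x_{0}))|+1)))$ and $\epsilon _{2}:=\epsilon /(2(|f'(x_{0})|+1))$, which makes each half of the sum at most $\epsilon /2$. A minimal sketch, with hypothetical function names:

```python
def choose_epsilons(eps, f_prime_x0, g_prime_y0):
    """Return one possible (eps1, eps2) for the given target eps > 0.

    f_prime_x0 stands for f'(x0) and g_prime_y0 for g'(f(x0)).
    """
    M_f, M_g = abs(f_prime_x0), abs(g_prime_y0)
    # eps1 <= 1 keeps the cross term eps2 * eps1 at most eps2.
    eps1 = min(1.0, eps / (2.0 * (M_g + 1.0)))
    eps2 = eps / (2.0 * (M_f + 1.0))
    return eps1, eps2

def total_bound(eps1, eps2, f_prime_x0, g_prime_y0):
    """|g'(f(x0))| eps1 + eps2 |f'(x0)| + eps2 eps1, the coefficient of |x - x0|."""
    return abs(g_prime_y0) * eps1 + eps2 * abs(f_prime_x0) + eps2 * eps1

eps = 0.01
eps1, eps2 = choose_epsilons(eps, f_prime_x0=3.0, g_prime_y0=-7.0)
print(eps1, eps2, total_bound(eps1, eps2, 3.0, -7.0) <= eps)
```

The first term is at most $\epsilon /2$ because $|g'(f(x_{0}))|/(|g'(f(x_{0}))|+1)<1$, and the remaining two terms are at most $\epsilon _{2}(|f'(x_{0})|+1)=\epsilon /2$ because $\epsilon _{1}\leq 1$.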

Limits of sequences

Main idea

Let $(x_{n})_{n=1}^{\infty }$ be a sequence in $X\setminus \{x_{0}\}$ that converges to $x_{0}$ . Then we want to write

{\frac {g(f(x_{n}))-g(f(x_{0}))}{x_{n}-x_{0}}}={\frac {g(f(x_{n}))-g(f(x_{0}))}{f(x_{n})-f(x_{0})}}\cdot {\frac {f(x_{n})-f(x_{0})}{x_{n}-x_{0}}}

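and then use the limit laws to conclude that the limit is $g'(f(x_{0}))\cdot f'(x_{0})$. The problem is that $f(x_{n})-f(x_{0})$ can be zero even when $x_{n}\neq x_{0}$ (for example, when $f$ is constant near $x_{0}$), in which case the first factor on the right is undefined.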

Proof

Let $(x_{n})_{n=1}^{\infty }$ be a sequence in $X\setminus \{x_{0}\}$ that converges to $x_{0}$ .

Define a function $\phi :Y\to \mathbf {R}$ by

\phi (y):={\begin{cases}{\frac {g(y)-g(f(x_{0}))}{y-f(x_{0})}}&{\text{if }}y\neq f(x_{0})\\g'(f(x_{0}))&{\text{if }}y=f(x_{0})\end{cases}}

The idea is that we want to say ${\frac {g(f(x_{n}))-g(f(x_{0}))}{f(x_{n})-f(x_{0})}}$ is going to $g'(f(x_{0}))$ , so we just define it at the undefined points to already be at that limit.

Now we have

{\frac {g(f(x_{n}))-g(f(x_{0}))}{x_{n}-x_{0}}}=\phi (f(x_{n}))\cdot {\frac {f(x_{n})-f(x_{0})}{x_{n}-x_{0}}}

for all $n\geq 1$. (Why? Consider the cases $f(x_{n})=f(x_{0})$ and $f(x_{n})\neq f(x_{0})$ separately; in the first case both sides are zero.)
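The role of $\phi$ can be seen in a small example. The sketch below assumes the hypothetical choices $f(x)=\max(x,0)^{2}$, $g=\exp$, $x_{0}=0$ and $x_{n}=(-1)^{n}/n$ (none of which come from the text). Here $f$ is differentiable at $0$ with $f'(0)=0$, and $f(x_{n})=f(x_{0})$ for every odd $n$, so the naive quotient divides by zero while the factorization through $\phi$ stays valid.

```python
import math

# Hypothetical example (not from the text):
# f(x) = max(x, 0)^2 has f'(0) = 0 and f(x) = f(0) for every x <= 0;
# g = exp, so g'(f(0)) = exp(0) = 1.
X0 = 0.0

def f(x):
    return max(x, 0.0) ** 2

def g(y):
    return math.exp(y)

def g_prime(y):
    return math.exp(y)

def phi(y):
    """Difference quotient of g at f(X0), extended by g'(f(X0)) at y = f(X0)."""
    y0 = f(X0)
    if y != y0:
        return (g(y) - g(y0)) / (y - y0)
    return g_prime(y0)

def naive_first_factor(x):
    """(g(f(x)) - g(f(X0))) / (f(x) - f(X0)); fails when f(x) == f(X0)."""
    return (g(f(x)) - g(f(X0))) / (f(x) - f(X0))

def lhs(x):
    """Difference quotient of g o f at X0."""
    return (g(f(x)) - g(f(X0))) / (x - X0)

def rhs(x):
    """phi(f(x)) times the difference quotient of f at X0."""
    return phi(f(x)) * (f(x) - f(X0)) / (x - X0)

def x_n(n):
    """x_n = (-1)^n / n: nonzero, converges to 0, with f(x_n) = f(0) for odd n."""
    return (-1) ** n / n

for n in (1, 2, 3, 10, 1000):
    x = x_n(n)
    print(n, lhs(x), rhs(x))
```

In this example the quotients tend to $g'(f(x_{0}))f'(x_{0})=1\cdot 0=0$, and calling `naive_first_factor(x_n(1))` raises `ZeroDivisionError`.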

Differentiability of $g$ at $f(x_{0})$ says that if $(y_{n})_{n=1}^{\infty }$ is a sequence in $Y\setminus \{f(x_{0})\}$ that converges to $f(x_{0})$, then ${\frac {g(y_{n})-g(f(x_{0}))}{y_{n}-f(x_{0})}}\to g'(f(x_{0}))$ as $n\to \infty$. What if $(y_{n})_{n=1}^{\infty }$ is instead a sequence in $Y$ that converges to $f(x_{0})$, so that some terms may equal $f(x_{0})$? Then we can still say $\phi (y_{n})\to g'(f(x_{0}))$ as $n\to \infty$. To show this, let $\epsilon >0$. By the equivalent $\epsilon$-$\delta$ form of differentiability, there exists $\delta >0$ such that $\left\vert {\frac {g(y)-g(f(x_{0}))}{y-f(x_{0})}}-g'(f(x_{0}))\right\vert \leq \epsilon$ for all $y\in Y$ with $0<|y-f(x_{0})|\leq \delta$. Since $y_{n}\to f(x_{0})$, we can find $N\geq 1$ such that $|y_{n}-f(x_{0})|\leq \delta$ for all $n\geq N$. Now if $n\geq N$, we have two cases: either $y_{n}\neq f(x_{0})$, in which case $\phi (y_{n})$ is the difference quotient and so $|\phi (y_{n})-g'(f(x_{0}))|\leq \epsilon$ as above, or else $y_{n}=f(x_{0})$, in which case $\phi (y_{n})=g'(f(x_{0}))$ so $|\phi (y_{n})-g'(f(x_{0}))|=0\leq \epsilon$.

Differentiability of $f$ at $x_{0}$ implies continuity of $f$ at $x_{0}$ , so this means that $f(x_{n})\to f(x_{0})$ as $n\to \infty$ . Since $f(x_{n})\in Y$ for each $n\geq 1$ , we can use $(f(x_{n}))_{n=1}^{\infty }$ as our sequence in $Y$ to conclude that as $n\to \infty$ we have $\phi (f(x_{n}))\to g'(f(x_{0}))$ .

Now by the limit laws

{\begin{aligned}\lim _{n\to \infty }{\frac {g(f(x_{n}))-g(f(x_{0}))}{x_{n}-x_{0}}}&=\left(\lim _{n\to \infty }\phi (f(x_{n}))\right)\left(\lim _{n\to \infty }{\frac {f(x_{n})-f(x_{0})}{x_{n}-x_{0}}}\right)\\&=g'(f(x_{0}))f'(x_{0})\end{aligned}}

Since the sequence $(x_{n})_{n=1}^{\infty }$ was arbitrary, we can conclude that $\lim _{x\to x_{0};\,x\in X\setminus \{x_{0}\}}{\frac {g(f(x))-g(f(x_{0}))}{x-x_{0}}}=g'(f(x_{0}))f'(x_{0})$ .

${\frac {g(f(x_{n}))-g(f(x_{0}))}{f(x_{n})-f(x_{0})}}\to g'(f(x_{0}))$

TODO: Tao says that division by zero occurs when $f'(x_{0})=0$ , which seems strange to me.