User:IssaRice/Chain rule proofs: Difference between revisions

From Machinelearning
Line 17: Line 17:
(there is no magic: the terms just cancel out)
(there is no magic: the terms just cancel out)


If we define <math>E_g(\Delta y) := g(y) - (g(y_0) + g'(y_0)(y-y_0))</math> we can write
If we define <math>E_g(y,y_0) := g(y) - (g(y_0) + g'(y_0)(y-y_0))</math> we can write


<math display="block">g(y) = g(y_0) + g'(f(x_0))(y - y_0) + E_g(\Delta y)</math>
<math display="block">g(y) = g(y_0) + g'(f(x_0))(y - y_0) + E_g(y,y_0)</math>


Newton's approximation says that <math>|E_g(\Delta y)| \leq \epsilon|y-y_0|</math> as long as <math>|y-y_0|\leq \delta</math>.
Newton's approximation says that <math>|E_g(y,y_0)| \leq \epsilon|y-y_0|</math> as long as <math>|y-y_0|\leq \delta</math>.


Since <math>f</math> is differentiable at <math>x_0</math>, we know that it must be continuous at <math>x_0</math>. This means we can keep <math>|f(x)-y_0|\leq \delta</math> as long as we keep <math>|x-x_0|\leq \delta'</math>.
Since <math>f</math> is differentiable at <math>x_0</math>, we know that it must be continuous at <math>x_0</math>. This means we can keep <math>|f(x)-y_0|\leq \delta</math> as long as we keep <math>|x-x_0|\leq \delta'</math>.
Line 27: Line 27:
Since <math>f(x) \in Y</math> and <math>|f(x)-y_0|\leq \delta</math>, this means we can substitute <math>y = f(x)</math> and get
Since <math>f(x) \in Y</math> and <math>|f(x)-y_0|\leq \delta</math>, this means we can substitute <math>y = f(x)</math> and get


<math display="block">g(f(x)) = g(y_0) + g'(f(x_0))(f(x) - y_0) + E_g(\Delta f)</math>
<math display="block">g(f(x)) = g(y_0) + g'(f(x_0))(f(x) - y_0) + E_g(f(x),y_0)</math>


Now we use the differentiability of <math>f</math>. We can write
Now we use the differentiability of <math>f</math>. We can write
Line 33: Line 33:
<math display="block">f(x) = f(x_0) + f'(x_0)(x - x_0) + [f(x) - (f(x_0) + f'(x_0)(x-x_0))]</math>
<math display="block">f(x) = f(x_0) + f'(x_0)(x - x_0) + [f(x) - (f(x_0) + f'(x_0)(x-x_0))]</math>


Again, we can define <math>E_f(\Delta x) := f(x) - (f(x_0) + f'(x_0)(x-x_0))</math> and write this as
Again, we can define <math>E_f(x,x_0) := f(x) - (f(x_0) + f'(x_0)(x-x_0))</math> and write this as


<math display="block">f(x) = f(x_0) + f'(x_0)(x - x_0) + E_f(\Delta x)</math>
<math display="block">f(x) = f(x_0) + f'(x_0)(x - x_0) + E_f(x,x_0)</math>


Now we can substitute this into the expression for <math>g(f(x))</math> to get
Now we can substitute this into the expression for <math>g(f(x))</math> to get


<math display="block">g(f(x)) = g(y_0) + g'(f(x_0))(f'(x_0)(x - x_0) + E_f(\Delta x)) + E_g(\Delta f)</math>
<math display="block">g(f(x)) = g(y_0) + g'(f(x_0))(f'(x_0)(x - x_0) + E_f(x,x_0)) + E_g(f(x),f(x_0))</math>


where we have canceled out two terms using <math>f(x_0) = y_0</math>.
where we have canceled out two terms using <math>f(x_0) = y_0</math>.
Line 45: Line 45:
Thus we have
Thus we have


<math display="block">g(f(x)) = g(y_0) + g'(f(x_0))f'(x_0)(x - x_0) + [g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)]</math>
<math display="block">g(f(x)) = g(y_0) + g'(f(x_0))f'(x_0)(x - x_0) + [g'(f(x_0))E_f(x,x_0) + E_g(f(x), f(x_0))]</math>


We can write this as
We can write this as


<math display="block">(g\circ f)(x) - ((g\circ f)(x_0) + L(x - x_0)) = [g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)]</math>
<math display="block">(g\circ f)(x) - ((g\circ f)(x_0) + L(x - x_0)) = [g'(f(x_0))E_f(x,x_0) + E_g(f(x), f(x_0))]</math>


where <math>L := g'(f(x_0))f'(x_0)</math>. Now the left hand side looks like the expression in Newton's approximation. This means to show <math>g\circ f</math> is differentiable at <math>x_0</math>, we just need to show that <math>|g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)| \leq \epsilon|x - x_0|</math>.
where <math>L := g'(f(x_0))f'(x_0)</math>. Now the left hand side looks like the expression in Newton's approximation. This means to show <math>g\circ f</math> is differentiable at <math>x_0</math>, we just need to show that <math>|g'(f(x_0))E_f(x,x_0) + E_g(f(x), f(x_0))| \leq \epsilon|x - x_0|</math>.


The stuff in square brackets is our "error term" for <math>g\circ f</math>. Now we just need to make sure it is small, even after dividing by <math>|x-x_0|</math>.
The stuff in square brackets is our "error term" for <math>g\circ f</math>. Now we just need to make sure it is small, even after dividing by <math>|x-x_0|</math>.
Line 57: Line 57:
But f is differentiable at <math>x_0</math>, so by Newton's approximation,
But f is differentiable at <math>x_0</math>, so by Newton's approximation,


<math display="block">|g'(f(x_0))E_f(\Delta x)| \leq |g'(f(x_0))| \epsilon_1 |x-x_0|</math>
<math display="block">|g'(f(x_0))E_f(x,x_0)| \leq |g'(f(x_0))| \epsilon_1 |x-x_0|</math>


we also have
we also have


<math display="block">|E_g(\Delta f)| \leq \epsilon_2 |f(x)-f(x_0)| = \epsilon_2 |f'(x_0)(x-x_0) + E_f(\Delta x)|</math>
<math display="block">|E_g(f(x), f(x_0))| \leq \epsilon_2 |f(x)-f(x_0)| = \epsilon_2 |f'(x_0)(x-x_0) + E_f(x,x_0)|</math>


We can bound this from above using the triangle inequality:
We can bound this from above using the triangle inequality:


<math display="block">\begin{align}|E_g(\Delta f)| &\leq \epsilon_2 |f'(x_0)(x-x_0)| + \epsilon_2 |E_f(\Delta x)| \\ &\leq \epsilon_2 |f'(x_0)| |x-x_0| + \epsilon_2 \epsilon_1 |x-x_0|\end{align}</math>
<math display="block">\begin{align}|E_g(f(x), f(x_0))| &\leq \epsilon_2 |f'(x_0)(x-x_0)| + \epsilon_2 |E_f(x,x_0)| \\ &\leq \epsilon_2 |f'(x_0)| |x-x_0| + \epsilon_2 \epsilon_1 |x-x_0|\end{align}</math>


Now we can just choose <math>\epsilon_1, \epsilon_2</math> small enough.
Now we can just choose <math>\epsilon_1, \epsilon_2</math> small enough.


==Limits of sequences==
==Limits of sequences==

Revision as of 02:21, 28 November 2018

Using Newton's approximation

Main idea

The main idea of using Newton's approximation to prove the chain rule is that since f is differentiable at x0 we have the approximation f(x)f(x0)+f(x0)(xx0) when x is near x0. Similarly since g is differentiable at f(x0) we have the approximation g(y)g(f(x0))+g(f(x0))(yf(x0)) when y is near f(x0). Since f is differentiable at x0, it is continuous there also, so we know that f(x) is near f(x0) whenever x is near x0. This allows us to substitute f(x) into y whenever x is near x0. So we get

g(f(x))g(f(x0))+g'(f(x0))(f(x)f(x0))g(f(x0))+g'(f(x0))(f'(x0)(xx0))

Thus we get gf(x)gf(x0)+g(f(x0))f(x0)(xx0), which is what the chain rule says.

Proof

Since g is differentiable at y0, we know g(y0) is a real number, and we can write

g(y)=g(y0)+g(y0)(yy0)+[g(y)(g(y0)+g(y0)(yy0))]

(there is no magic: the terms just cancel out)

If we define Eg(y,y0):=g(y)(g(y0)+g(y0)(yy0)) we can write

g(y)=g(y0)+g(f(x0))(yy0)+Eg(y,y0)

Newton's approximation says that |Eg(y,y0)|ϵ|yy0| as long as |yy0|δ.

Since f is differentiable at x0, we know that it must be continuous at x0. This means we can keep |f(x)y0|δ as long as we keep |xx0|δ.

Since f(x)Y and |f(x)y0|δ, this means we can substitute y=f(x) and get

g(f(x))=g(y0)+g(f(x0))(f(x)y0)+Eg(f(x),y0)

Now we use the differentiability of f. We can write

f(x)=f(x0)+f(x0)(xx0)+[f(x)(f(x0)+f(x0)(xx0))]

Again, we can define Ef(x,x0):=f(x)(f(x0)+f(x0)(xx0)) and write this as

f(x)=f(x0)+f(x0)(xx0)+Ef(x,x0)

Now we can substitute this into the expression for g(f(x)) to get

g(f(x))=g(y0)+g(f(x0))(f(x0)(xx0)+Ef(x,x0))+Eg(f(x),f(x0))

where we have canceled out two terms using f(x0)=y0.

Thus we have

g(f(x))=g(y0)+g(f(x0))f(x0)(xx0)+[g(f(x0))Ef(x,x0)+Eg(f(x),f(x0))]

We can write this as

(gf)(x)((gf)(x0)+L(xx0))=[g(f(x0))Ef(x,x0)+Eg(f(x),f(x0))]

where L:=g(f(x0))f(x0). Now the left hand side looks like the expression in Newton's approximation. This means to show gf is differentiable at x0, we just need to show that |g(f(x0))Ef(x,x0)+Eg(f(x),f(x0))|ϵ|xx0|.

The stuff in square brackets is our "error term" for gf. Now we just need to make sure it is small, even after dividing by |xx0|.

But f is differentiable at x0, so by Newton's approximation,

|g(f(x0))Ef(x,x0)||g(f(x0))|ϵ1|xx0|

we also have

|Eg(f(x),f(x0))|ϵ2|f(x)f(x0)|=ϵ2|f(x0)(xx0)+Ef(x,x0)|

We can bound this from above using the triangle inequality:

|Eg(f(x),f(x0))|ϵ2|f'(x0)(xx0)|+ϵ2|Ef(x,x0)|ϵ2|f'(x0)||xx0|+ϵ2ϵ1|xx0|

Now we can just choose ϵ1,ϵ2 small enough.

Limits of sequences