User:IssaRice/Chain rule proofs: Difference between revisions

From Machinelearning
Revision as of 02:10, 28 November 2018

Using Newton's approximation

Main idea

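The main idea of using Newton's approximation to prove the chain rule is that since <math>f</math> is differentiable at <math>x_0</math> we have the approximation <math>f(x) \approx f(x_0) + f'(x_0)(x-x_0)</math> when <math>x</math> is near <math>x_0</math>. Similarly, since <math>g</math> is differentiable at <math>f(x_0)</math> we have the approximation <math>g(y) \approx g(f(x_0)) + g'(f(x_0))(y - f(x_0))</math> when <math>y</math> is near <math>f(x_0)</math>. Since <math>f</math> is differentiable at <math>x_0</math>, it is also continuous there, so we know that <math>f(x)</math> is near <math>f(x_0)</math> whenever <math>x</math> is near <math>x_0</math>. This allows us to substitute <math>f(x)</math> for <math>y</math> whenever <math>x</math> is near <math>x_0</math>. So we get <math>g(f(x)) \approx g(f(x_0)) + g'(f(x_0))(f(x) - f(x_0)) \approx g(f(x_0)) + g'(f(x_0))(f'(x_0)(x-x_0))</math>.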
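The heuristic above can be checked numerically. The following is only a sketch: the choices <math>f = \sin</math>, <math>g = \exp</math>, and <math>x_0 = 0.5</math> are assumptions made for illustration, and the helper names are hypothetical. It compares <math>g(f(x))</math> with the linearization <math>g(f(x_0)) + g'(f(x_0))f'(x_0)(x-x_0)</math> and reports the error divided by <math>|x-x_0|</math>, which should shrink as <math>x</math> approaches <math>x_0</math>.

```python
import math

# Illustrative choices (assumptions, not part of the proof): f = sin, g = exp.
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(y):
    return math.exp(y)

def g_prime(y):
    return math.exp(y)

def chain_linearization(x, x0):
    """Linear approximation g(f(x0)) + g'(f(x0)) * f'(x0) * (x - x0)."""
    return g(f(x0)) + g_prime(f(x0)) * f_prime(x0) * (x - x0)

def relative_errors(x0, steps):
    """Return |g(f(x)) - linearization| / |x - x0| for x = x0 + h, h in steps."""
    return [
        abs(g(f(x0 + h)) - chain_linearization(x0 + h, x0)) / abs(h)
        for h in steps
    ]

steps = [1e-1, 1e-2, 1e-3]
ratios = relative_errors(0.5, steps)
for h, r in zip(steps, ratios):
    print(f"h = {h:g}: error / |h| = {r:.3e}")
```

A shrinking ratio is exactly what differentiability of <math>g \circ f</math> at <math>x_0</math> with derivative <math>g'(f(x_0))f'(x_0)</math> requires. A numerical check of this kind illustrates the claim for one pair of functions but does not prove it.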

Proof

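Let <math>y_0 := f(x_0)</math>. Since <math>g</math> is differentiable at <math>y_0</math>, we know <math>g'(y_0)</math> is a real number, and we can write

<math>g(y) = g(y_0) + g'(y_0)(y-y_0) + [g(y) - (g(y_0) + g'(y_0)(y-y_0))]</math>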

(there is no magic: the terms just cancel out)

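If we define <math>E_g(\Delta y) := g(y) - (g(y_0) + g'(y_0)(y-y_0))</math>, where <math>\Delta y := y - y_0</math>, we can write

<math>g(y) = g(y_0) + g'(f(x_0))(y-y_0) + E_g(\Delta y)</math>

Newton's approximation says that for every <math>\epsilon > 0</math> there is a <math>\delta > 0</math> such that <math>|E_g(\Delta y)| \le \epsilon|y-y_0|</math> as long as <math>|y-y_0| \le \delta</math>.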

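Since <math>f</math> is differentiable at <math>x_0</math>, we know that it must be continuous at <math>x_0</math>. This means we can keep <math>|f(x) - y_0| \le \delta</math> as long as we keep <math>|x-x_0| \le \delta'</math> for a suitable <math>\delta' > 0</math>.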

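Since <math>f(x) \in Y</math> (where <math>Y</math> denotes the domain of <math>g</math>) and <math>|f(x) - y_0| \le \delta</math>, this means we can substitute <math>y = f(x)</math> and get

<math>g(f(x)) = g(y_0) + g'(f(x_0))(f(x) - y_0) + E_g(\Delta f)</math>

where <math>\Delta f := f(x) - f(x_0)</math>.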

Now we use the differentiability of f. We can write

<math>f(x) = f(x_0) + f'(x_0)(x-x_0) + [f(x) - (f(x_0) + f'(x_0)(x-x_0))]</math>

Again, we can define <math>E_f(\Delta x) := f(x) - (f(x_0) + f'(x_0)(x-x_0))</math> and write this as

<math>f(x) = f(x_0) + f'(x_0)(x-x_0) + E_f(\Delta x)</math>

Now we can substitute this into the expression for <math>g(f(x))</math> to get

<math>g(f(x)) = g(y_0) + g'(f(x_0))(f'(x_0)(x-x_0) + E_f(\Delta x)) + E_g(\Delta f)</math>

where we have canceled out two terms using <math>f(x_0) = y_0</math>.

Thus we have

<math>g(f(x)) = g(y_0) + g'(f(x_0))f'(x_0)(x-x_0) + [g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)]</math>

We can write this as

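<math>(g \circ f)(x) - ((g \circ f)(x_0) + L(x-x_0)) = [g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)]</math>

where <math>L := g'(f(x_0))f'(x_0)</math>. Now the left hand side looks like the expression in Newton's approximation. This means that to show <math>g \circ f</math> is differentiable at <math>x_0</math> with derivative <math>L</math>, we just need to show that for every <math>\epsilon > 0</math> we have <math>|g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)| \le \epsilon|x-x_0|</math> whenever <math>x</math> is close enough to <math>x_0</math>.

The stuff in square brackets is our "error term" for <math>g \circ f</math>. Now we just need to make sure it is small, even after dividing by <math>|x-x_0|</math>.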
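The decomposition can be sanity-checked numerically. This is a sketch under assumed example functions, <math>f(x) = x^2</math> and <math>g(y) = \sin y</math> with <math>x_0 = 1</math>, none of which come from the proof itself; the function names are hypothetical. It computes <math>E_f</math> and <math>E_g</math> directly from their definitions, confirms that the bracketed error term equals the left hand side up to rounding, and shows the error term divided by <math>|x-x_0|</math> shrinking.

```python
import math

# Illustrative choices (assumptions): f(x) = x**2, g(y) = sin(y).
def f(x):
    return x * x

def f_prime(x):
    return 2 * x

def g(y):
    return math.sin(y)

def g_prime(y):
    return math.cos(y)

def error_terms(x, x0):
    """Return (lhs, rhs) of the identity
    (g o f)(x) - ((g o f)(x0) + L (x - x0)) = g'(f(x0)) E_f + E_g."""
    y0 = f(x0)
    # E_f and E_g exactly as defined in the proof.
    E_f = f(x) - (f(x0) + f_prime(x0) * (x - x0))
    E_g = g(f(x)) - (g(y0) + g_prime(y0) * (f(x) - y0))
    L = g_prime(y0) * f_prime(x0)
    lhs = g(f(x)) - (g(f(x0)) + L * (x - x0))
    rhs = g_prime(y0) * E_f + E_g
    return lhs, rhs

def bracket_ratio(x, x0):
    """Bracketed error term divided by |x - x0|."""
    _, rhs = error_terms(x, x0)
    return abs(rhs) / abs(x - x0)

x0 = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    lhs, rhs = error_terms(x0 + h, x0)
    print(f"h = {h:g}: lhs = {lhs:.6e}, rhs = {rhs:.6e}, "
          f"ratio = {bracket_ratio(x0 + h, x0):.3e}")
```

The two sides agree because the identity is purely algebraic. Only the claim that the ratio tends to zero uses differentiability.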

But <math>f</math> is differentiable at <math>x_0</math>, so by Newton's approximation,

<math>|g'(f(x_0))E_f(\Delta x)| \le |g'(f(x_0))|\epsilon_1|x-x_0|</math>

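We also have

<math>|E_g(\Delta f)| \le \epsilon_2|f(x) - f(x_0)| = \epsilon_2|f'(x_0)(x-x_0) + E_f(\Delta x)|</math>

We can bound this from above using the triangle inequality:

<math>|E_g(\Delta f)| \le \epsilon_2|f'(x_0)(x-x_0)| + \epsilon_2|E_f(\Delta x)| \le \epsilon_2|f'(x_0)||x-x_0| + \epsilon_2\epsilon_1|x-x_0|</math>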

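Now we can just choose <math>\epsilon_1, \epsilon_2</math> small enough that the sum of these two bounds is at most <math>\epsilon|x-x_0|</math>.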
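One concrete way to make this choice is sketched here. The particular constants are an assumption for illustration; any choice that makes the combined coefficient at most <math>\epsilon</math> works equally well. Adding the two bounds gives, for <math>x</math> close enough to <math>x_0</math> that both hold,

<math>|g'(f(x_0))E_f(\Delta x) + E_g(\Delta f)| \le \left(|g'(f(x_0))|\epsilon_1 + \epsilon_2|f'(x_0)| + \epsilon_2\epsilon_1\right)|x-x_0|</math>

Given <math>\epsilon > 0</math>, take

<math>\epsilon_1 := \min\left\{1, \frac{\epsilon}{2(|g'(f(x_0))| + 1)}\right\}, \qquad \epsilon_2 := \frac{\epsilon}{2(|f'(x_0)| + 1)}</math>

Then <math>|g'(f(x_0))|\epsilon_1 \le \epsilon/2</math>, and since <math>\epsilon_1 \le 1</math> we have <math>\epsilon_2|f'(x_0)| + \epsilon_2\epsilon_1 \le \epsilon_2(|f'(x_0)| + 1) = \epsilon/2</math>. The coefficient in parentheses is therefore at most <math>\epsilon</math>, which is the bound required for <math>g \circ f</math>.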

Limits of sequences