Backpropagation derivation using Leibniz notation

From Machinelearning

Revision as of 22:32, 8 November 2018

The cost function <math>C</math> depends on <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

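<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}</math>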

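We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j) a^{l-1}_k</math> because <math>a^l_j = \sigma(z^l_j) = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)</math>. We have used the chain rule again here.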
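The identity above can be checked numerically. The following is a minimal sketch, assuming <math>\sigma</math> is the logistic sigmoid (the text does not fix a particular activation) and using small, arbitrary layer sizes; the names `W`, `b`, and `a_prev` are illustrative stand-ins for <math>w^l_{jk}</math>, <math>b^l_j</math>, and <math>a^{l-1}_k</math>.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation; the derivation only requires sigma to be differentiable.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
a_prev = rng.normal(size=3)     # a^{l-1}: activations of layer l-1
W = rng.normal(size=(2, 3))     # W[j, k] stands for w^l_{jk}
b = rng.normal(size=2)          # b[j] stands for b^l_j

z = W @ a_prev + b              # z^l_j = sum_k w^l_{jk} a^{l-1}_k + b^l_j

# Analytic derivative: da^l_j / dw^l_{jk} = sigma'(z^l_j) * a^{l-1}_k
analytic = np.outer(sigmoid_prime(z), a_prev)

# Central finite differences, perturbing one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.shape[0]):
    for k in range(W.shape[1]):
        W_plus = W.copy()
        W_plus[j, k] += eps
        W_minus = W.copy()
        W_minus[j, k] -= eps
        a_plus = sigmoid(W_plus @ a_prev + b)
        a_minus = sigmoid(W_minus @ a_prev + b)
        numeric[j, k] = (a_plus[j] - a_minus[j]) / (2 * eps)

max_error = np.max(np.abs(analytic - numeric))
```

If the formula holds, `max_error` should be close to floating-point noise.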

In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer. Thus we can write:

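<math>\frac{\partial C}{\partial a^l_j} = \sum_{i \in \{1,\ldots,n(l+1)\}} \frac{\partial C}{\partial a^{l+1}_i} \frac{\partial a^{l+1}_i}{\partial a^l_j}</math>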

where <math>n(l+1)</math> is the number of neurons in the <math>(l+1)</math>th layer.
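
The second factor in each term of the sum is not expanded in the text. As a sketch of that step, applying the same definition of the activation to layer <math>l+1</math>, namely <math>a^{l+1}_i = \sigma\left(\sum_j w^{l+1}_{ij} a^l_j + b^{l+1}_i\right)</math>, the chain rule gives:

<math>\frac{\partial a^{l+1}_i}{\partial a^l_j} = \sigma'(z^{l+1}_i) w^{l+1}_{ij}</math>

so that each term of the sum becomes <math>\frac{\partial C}{\partial a^{l+1}_i} \sigma'(z^{l+1}_i) w^{l+1}_{ij}</math>.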

Backpropagation works recursively starting at the later layers. Since we are trying to compute <math>\frac{\partial C}{\partial a^l_j}</math> for the <math>l</math>th layer, we can assume inductively that we have already computed <math>\frac{\partial C}{\partial a^{l+1}_i}</math>.
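
The recursion can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a definitive implementation: <math>\sigma</math> is taken to be the logistic sigmoid, the cost is taken to be the quadratic cost <math>C = \tfrac{1}{2}\|a^L - y\|^2</math> (which supplies the base case at the last layer), and the function names `forward`, `cost`, and `backprop` are hypothetical. Layers are indexed from 0 in the code, so `weights[l]` maps `activations[l]` to `activations[l + 1]`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Return the weighted inputs z and activations a; activations[0] is the input."""
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))
    return zs, activations

def cost(weights, biases, x, y):
    # Assumed quadratic cost: C = 1/2 * ||a^L - y||^2
    _, activations = forward(weights, biases, x)
    return 0.5 * np.sum((activations[-1] - y) ** 2)

def backprop(weights, biases, x, y):
    """Return dC/dW for every layer, working from the last layer backwards."""
    zs, activations = forward(weights, biases, x)
    dC_da = activations[-1] - y          # base case for the assumed quadratic cost
    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        s = sigmoid(zs[l])
        sp = s * (1.0 - s)               # sigma'(z) for the sigmoid
        # dC/dw_{jk} = dC/da_j * sigma'(z_j) * a_k (a_k from the previous layer)
        grads[l] = np.outer(dC_da * sp, activations[l])
        # Inductive step: dC/da_k (previous layer) = sum_j dC/da_j * sigma'(z_j) * w_{jk}
        dC_da = weights[l].T @ (dC_da * sp)
    return grads

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                        # arbitrary small network
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
x, y = rng.normal(size=3), rng.normal(size=2)

grads = backprop(weights, biases, x, y)

# Compare one entry against a central finite difference of the cost
eps = 1e-6
bumped = [W.copy() for W in weights]
bumped[0][1, 2] += eps
c_plus = cost(bumped, biases, x, y)
bumped[0][1, 2] -= 2 * eps
c_minus = cost(bumped, biases, x, y)
numeric = (c_plus - c_minus) / (2 * eps)
```

The loop keeps only the current layer's <math>\frac{\partial C}{\partial a}</math> vector, which is exactly the quantity the inductive assumption says is already available when moving one layer earlier.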