Backpropagation derivation using Leibniz notation

Throughout this page, let <math>n_l</math> be the number of neurons in the <math>l</math>th layer of the neural network. We write <math>w^l_{jk}</math> for the weight from the <math>k</math>th neuron in layer <math>l-1</math> to the <math>j</math>th neuron in layer <math>l</math>, <math>b^l_j</math> for the bias of that neuron, <math>z^l_j = \sum_{k=1}^{n_{l-1}} w^l_{jk} a^{l-1}_k + b^l_j</math> for its weighted input, and <math>a^l_j = \sigma(z^l_j)</math> for its activation, where <math>\sigma</math> is the activation function. <math>C</math> denotes the cost function.

The cost function <math>C</math> depends on the weight <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. only through the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

:<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \cdot \frac{\partial a^l_j}{\partial w^l_{jk}}</math>

We know that

:<math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)\, a^{l-1}_k</math>

because <math>a^l_j = \sigma\left(\sum_{m=1}^{n_{l-1}} w^l_{jm} a^{l-1}_m + b^l_j\right)</math>. We have used the chain rule again here: the outer function is <math>\sigma</math> and the inner function is the weighted input <math>z^l_j</math>.
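
This identity can be sanity-checked numerically. The following is a minimal sketch (not part of the derivation itself) that assumes a sigmoid activation and small random layer sizes, and compares <math>\sigma'(z^l_j)\, a^{l-1}_k</math> against a finite-difference estimate of <math>\frac{\partial a^l_j}{\partial w^l_{jk}}</math>:

<syntaxhighlight lang="python">
# Minimal numerical sanity check (illustrative, not part of the derivation):
# compares sigma'(z^l_j) * a^{l-1}_k against a finite-difference estimate of
# da^l_j / dw^l_{jk}. The sigmoid activation, layer sizes, and random values
# below are assumptions made only for this example.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))       # sigmoid activation (assumed)

def sigma_prime(z):
    return sigma(z) * (1.0 - sigma(z))     # derivative of the sigmoid

rng = np.random.default_rng(0)
n_prev, n_cur = 4, 3                       # sizes of layers l-1 and l (assumed)
a_prev = rng.normal(size=n_prev)           # activations a^{l-1} of the previous layer
W = rng.normal(size=(n_cur, n_prev))       # weights w^l_{jk}
b = rng.normal(size=n_cur)                 # biases b^l_j

j, k = 1, 2                                # pick one particular weight w^l_{jk}
z_j = W[j] @ a_prev + b[j]                 # weighted input z^l_j

analytic = sigma_prime(z_j) * a_prev[k]    # the formula derived above

eps = 1e-6                                 # finite-difference step
W_pert = W.copy()
W_pert[j, k] += eps
numeric = (sigma(W_pert[j] @ a_prev + b[j]) - sigma(z_j)) / eps

print(analytic, numeric)                   # the two values should agree closely
</syntaxhighlight>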

In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer. Thus we can write (using the chain rule once again):

:<math>\frac{\partial C}{\partial a^l_j} = \sum_{i=1}^{n_{l+1}} \frac{\partial C}{\partial a^{l+1}_i} \cdot \frac{\partial a^{l+1}_i}{\partial a^l_j}</math>

Backpropagation works recursively, starting at the later layers. Since we are trying to compute <math>\frac{\partial C}{\partial a^l_j}</math> for the <math>l</math>th layer, we can assume inductively that we have already computed <math>\frac{\partial C}{\partial a^{l+1}_i}</math> for each <math>i</math>. (The base case is the output layer <math>L</math>, where <math>\frac{\partial C}{\partial a^L_j}</math> can be computed directly from the formula for the cost function.)

It remains to find <math>\frac{\partial a^{l+1}_i}{\partial a^l_j}</math>. But <math>a^{l+1}_i = \sigma\left(\sum_{m=1}^{n_l} w^{l+1}_{im} a^l_m + b^{l+1}_i\right)</math>, so we have

:<math>\frac{\partial a^{l+1}_i}{\partial a^l_j} = \sigma'(z^{l+1}_i)\, w^{l+1}_{ij}</math>

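Substituting this into the sum above gives the recursion <math>\frac{\partial C}{\partial a^l_j} = \sum_{i=1}^{n_{l+1}} \frac{\partial C}{\partial a^{l+1}_i}\, \sigma'(z^{l+1}_i)\, w^{l+1}_{ij}</math>, which can be written as a short function. The sketch below is illustrative only: it assumes a fully connected sigmoid network with a quadratic cost <math>C = \tfrac{1}{2}\lVert a^L - y \rVert^2</math> (so that the base case is explicit), and the function name and list layout are made up for this example.

<syntaxhighlight lang="python">
# Sketch of the backward recursion for dC/da^l (illustrative assumptions: sigmoid
# activation, quadratic cost C = 0.5 * ||a^L - y||^2, and the list layout below).
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    return sigma(z) * (1.0 - sigma(z))

def dC_da_all_layers(weights, zs, activations, y):
    """weights[l] is the matrix (w^l_{jk}) mapping layer l-1 to layer l, for l = 1..L
    (weights[0] is unused padding); zs[l] and activations[l] are the vectors z^l and
    a^l stored during the forward pass. Returns dC_da with dC_da[l][j] = dC/da^l_j."""
    L = len(weights) - 1
    dC_da = [None] * (L + 1)
    dC_da[L] = activations[L] - y              # base case: output layer of the quadratic cost
    for l in range(L - 1, 0, -1):              # recurse backwards from layer L-1 down to 1
        # dC/da^l_j = sum_i dC/da^{l+1}_i * sigma'(z^{l+1}_i) * w^{l+1}_{ij},
        # written as a matrix-vector product with the transposed weight matrix
        dC_da[l] = weights[l + 1].T @ (dC_da[l + 1] * sigma_prime(zs[l + 1]))
    return dC_da
</syntaxhighlight>
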
Putting all this together, we obtain

:<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \cdot \sigma'(z^l_j)\, a^{l-1}_k = \left( \sum_{i=1}^{n_{l+1}} \frac{\partial C}{\partial a^{l+1}_i}\, \sigma'(z^{l+1}_i)\, w^{l+1}_{ij} \right) \sigma'(z^l_j)\, a^{l-1}_k</math>

Let us verify that we can calculate the right-hand side. By the induction hypothesis, we can calculate <math>\frac{\partial C}{\partial a^{l+1}_i}</math> for each <math>i</math>. We calculate <math>\sigma'(z^{l+1}_i)</math>, <math>\sigma'(z^l_j)</math>, and <math>a^{l-1}_k</math> during the forward pass through the network, since the weighted inputs and activations of every layer are computed there. Finally, <math>w^{l+1}_{ij}</math> is just a weight in the network, so we already know its value.
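
As a final check, the whole derivation can be assembled into a short script and compared against a finite-difference approximation of <math>\frac{\partial C}{\partial w^l_{jk}}</math>. This is only a sketch under the same assumptions as the examples above (sigmoid activations, quadratic cost, a small random network); all names and sizes are illustrative.

<syntaxhighlight lang="python">
# End-to-end sanity check of dC/dw^l_{jk} = dC/da^l_j * sigma'(z^l_j) * a^{l-1}_k
# (illustrative assumptions: sigmoid activations, quadratic cost, random 4-5-3 network).
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    return sigma(z) * (1.0 - sigma(z))

def forward(weights, biases, x):
    """Forward pass that stores every weighted input z^l and activation a^l."""
    zs, activations = [None], [x]              # index 0 is the input layer
    for W, b in zip(weights[1:], biases[1:]):
        z = W @ activations[-1] + b
        zs.append(z)
        activations.append(sigma(z))
    return zs, activations

def cost(weights, biases, x, y):
    """Quadratic cost C = 0.5 * ||a^L - y||^2 of the network output."""
    _, activations = forward(weights, biases, x)
    return 0.5 * np.sum((activations[-1] - y) ** 2)

rng = np.random.default_rng(1)
sizes = [4, 5, 3]                              # layers 0, 1, 2 (assumed sizes)
weights = [None] + [rng.normal(size=(sizes[l], sizes[l - 1])) for l in range(1, len(sizes))]
biases = [None] + [rng.normal(size=sizes[l]) for l in range(1, len(sizes))]
x = rng.normal(size=sizes[0])                  # network input a^0
y = rng.normal(size=sizes[-1])                 # target output

L = len(sizes) - 1
zs, activations = forward(weights, biases, x)

# Backward recursion for dC/da^l, exactly as in the derivation above.
dC_da = [None] * (L + 1)
dC_da[L] = activations[L] - y                  # base case for the quadratic cost
for l in range(L - 1, 0, -1):
    dC_da[l] = weights[l + 1].T @ (dC_da[l + 1] * sigma_prime(zs[l + 1]))

# The final formula for one particular weight w^l_{jk}.
l, j, k = 1, 2, 3
analytic = dC_da[l][j] * sigma_prime(zs[l][j]) * activations[l - 1][k]

# Finite-difference approximation of the same derivative.
C0 = cost(weights, biases, x, y)
eps = 1e-6
weights[l][j, k] += eps
numeric = (cost(weights, biases, x, y) - C0) / eps

print(analytic, numeric)                       # the two values should agree closely
</syntaxhighlight>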