Backpropagation derivation using Leibniz notation

From Machinelearning
Revision as of 22:28, 8 November 2018 by IssaRice (talk | contribs)

The cost function $C$ depends on $z^l_j$ only through the activation of the $j$th neuron in the $l$th layer, i.e. on the value of $a^l_j = \sigma(z^l_j)$. Thus we can use the chain rule to expand:

$$\frac{\partial C}{\partial z^l_j} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial z^l_j} = \frac{\partial C}{\partial a^l_j}\, \sigma'(z^l_j)$$

We know that $\frac{\partial a^l_j}{\partial z^l_j} = \sigma'(z^l_j)$ because $a^l_j = \sigma(z^l_j)$; this step is just differentiating the activation function.
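A quick numerical sanity check of this step: the analytic derivative $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ should agree with a finite-difference approximation. The logistic sigmoid is assumed here for concreteness; the article does not fix a particular $\sigma$, and the identity $\partial a / \partial z = \sigma'(z)$ holds for any differentiable activation.

```python
import math

def sigma(z):
    # Logistic sigmoid (assumed choice of activation; the derivation
    # itself works for any differentiable sigma)
    return 1.0 / (1.0 + math.exp(-z))

def sigma_prime(z):
    # Analytic derivative of the logistic sigmoid: sigma(z) * (1 - sigma(z))
    s = sigma(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, h=1e-6):
    # Central finite-difference approximation of df/dz, error O(h^2)
    return (f(z + h) - f(z - h)) / (2.0 * h)

z = 0.3
print(abs(sigma_prime(z) - numeric_derivative(sigma, z)))  # close to 0
```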

In turn, $C$ depends on $a^l_j$ only through the activations of the $(l+1)$th layer, which themselves depend on $a^l_j$ through the weighted inputs $z^{l+1}_1, \ldots, z^{l+1}_n$. Thus we can write:

$$\frac{\partial C}{\partial a^l_j} = \sum_{k=1}^{n} \frac{\partial C}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial a^l_j}$$

where $n$ is the number of neurons in the $(l+1)$th layer.
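The sum over the next layer's neurons can also be checked numerically. The sketch below assumes a standard affine layer $z^{l+1}_k = \sum_j w_{kj}\, a^l_j + b_k$ followed by a sigmoid, and an illustrative cost (half the sum of squared activations of layer $l+1$); the weights, biases, and cost are all made-up values, not anything specified by the article. With that setup, $\partial z^{l+1}_k / \partial a^l_j = w_{kj}$, so the chain-rule sum should match a finite-difference gradient of the cost with respect to $a^l$.

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny illustrative example: layer l has 2 neurons, layer l+1 has 3,
# with z^{l+1}_k = sum_j w[k][j] * a[j] + b[k]. Values are arbitrary.
w = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b = [0.05, -0.1, 0.2]

def cost(a):
    # C depends on a^l only through layer l+1; here C is half the sum of
    # squared activations of layer l+1 (an assumed cost for illustration)
    z_next = [sum(w[k][j] * a[j] for j in range(2)) + b[k] for k in range(3)]
    a_next = [sigma(z) for z in z_next]
    return 0.5 * sum(x * x for x in a_next)

def grad_via_chain_rule(a):
    # dC/da^l_j = sum_k dC/dz^{l+1}_k * dz^{l+1}_k/da^l_j,
    # where dz^{l+1}_k/da^l_j = w[k][j] for this affine layer
    z_next = [sum(w[k][j] * a[j] for j in range(2)) + b[k] for k in range(3)]
    a_next = [sigma(z) for z in z_next]
    # dC/dz^{l+1}_k = dC/da^{l+1}_k * sigma'(z^{l+1}_k)
    #              = a_next[k] * a_next[k] * (1 - a_next[k])
    dC_dz = [a_next[k] * a_next[k] * (1.0 - a_next[k]) for k in range(3)]
    return [sum(dC_dz[k] * w[k][j] for k in range(3)) for j in range(2)]

def grad_numeric(a, h=1e-6):
    # Central finite differences on the cost, one coordinate at a time
    g = []
    for j in range(2):
        ap, am = a[:], a[:]
        ap[j] += h
        am[j] -= h
        g.append((cost(ap) - cost(am)) / (2.0 * h))
    return g

a = [0.6, -0.3]
analytic = grad_via_chain_rule(a)
numeric = grad_numeric(a)
print(max(abs(x - y) for x, y in zip(analytic, numeric)))  # close to 0
```

The agreement confirms that summing $\frac{\partial C}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial a^l_j}$ over all $k$ neurons of layer $l+1$ recovers the full derivative $\frac{\partial C}{\partial a^l_j}$.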