Backpropagation derivation using Leibniz notation

Throughout this page, let <math>n(l)</math> be the number of neurons in the <math>l</math>th layer of the neural network.

The cost function <math>C</math> depends on <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

<math display="block">\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}</math>

We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)a^{l-1}_k</math> because <math>a^l_j = \sigma(z^l_j) = \sigma\left(\sum_{k=1}^{n(l-1)} w^l_{jk}a^{l-1}_k + b^l_j\right)</math>. We have used the chain rule again here.
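
To make the intermediate step explicit: differentiating <math>a^l_j = \sigma(z^l_j)</math> through <math>z^l_j</math>, and noting that only the <math>k</math>th term of the sum defining <math>z^l_j</math> contains <math>w^l_{jk}</math>, gives

<math display="block">\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)\frac{\partial z^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)a^{l-1}_k</math>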

In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer. Thus we can write (using the chain rule once again):

<math display="block">\frac{\partial C}{\partial a^l_j} = \sum_{i=1}^{n(l+1)} \frac{\partial C}{\partial a^{l+1}_i} \frac{\partial a^{l+1}_i}{\partial a^l_j}</math>

Backpropagation works recursively, starting at the final layer of the network and working backwards. Since we are trying to compute <math>\frac{\partial C}{\partial a^l_j}</math> for the <math>l</math>th layer, we can assume inductively that we have already computed <math>\frac{\partial C}{\partial a^{l+1}_i}</math>.
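
The recursion needs a base case at the output layer, where the derivative can be read off from the cost function directly. For example, writing <math>L</math> for the index of the final layer and assuming (purely for illustration; the derivation on this page does not depend on the choice of cost) the quadratic cost <math>C = \frac{1}{2}\sum_{j=1}^{n(L)} (a^L_j - y_j)^2</math> with target values <math>y_j</math>, we have

<math display="block">\frac{\partial C}{\partial a^L_j} = a^L_j - y_j</math>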

It remains to find <math>\frac{\partial a^{l+1}_i}{\partial a^l_j}</math>. But <math>a^{l+1}_i = \sigma(z^{l+1}_i) = \sigma\left(\sum_j w^{l+1}_{ij}a^l_j + b^{l+1}_i\right)</math>, so we have

<math display="block">\frac{\partial a^{l+1}_i}{\partial a^l_j} = \sigma'(z^{l+1}_i)w^{l+1}_{ij}</math>

Putting all this together, we obtain

<math display="block">\begin{align}\frac{\partial C}{\partial w^l_{jk}} &= \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}} \\ &= \left(\sum_{i=1}^{n(l+1)} \frac{\partial C}{\partial a^{l+1}_i} \frac{\partial a^{l+1}_i}{\partial a^l_j}\right) \sigma'(z^l_j)a^{l-1}_k \\ &= \left(\sum_{i=1}^{n(l+1)} \frac{\partial C}{\partial a^{l+1}_i} \sigma'(z^{l+1}_i)w^{l+1}_{ij}\right) \sigma'(z^l_j)a^{l-1}_k\end{align}</math>
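
As an aside, the same result is often written in a vectorized form (using matrix notation not otherwise used on this page). If we collect the weights <math>w^l_{jk}</math> into a matrix <math>W^l</math>, define <math>\delta^l_j = \frac{\partial C}{\partial a^l_j}\sigma'(z^l_j)</math>, and write <math>\odot</math> for the elementwise product, the formula above becomes

<math display="block">\delta^l = \left((W^{l+1})^\mathsf{T} \delta^{l+1}\right) \odot \sigma'(z^l), \qquad \frac{\partial C}{\partial w^l_{jk}} = \delta^l_j a^{l-1}_k</math>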

Let us verify that we can calculate the right-hand side of the formula for <math>\frac{\partial C}{\partial w^l_{jk}}</math>. By the induction hypothesis, we can calculate <math>\frac{\partial C}{\partial a^{l+1}_i}</math>. We calculate <math>z^{l+1}_i</math>, <math>z^l_j</math>, and <math>a^{l-1}_k</math> during the forward pass through the network. Finally, <math>w^{l+1}_{ij}</math> is just a weight in the network, so we already know its value.
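
As a sanity check, the formula can also be verified numerically. The following is a minimal sketch in Python with NumPy; it assumes sigmoid activations and the quadratic cost <math>C = \frac{1}{2}\sum_j (a^L_j - y_j)^2</math>, and the layer sizes, weights, and data are made up for illustration (none of them come from the derivation itself). It computes <math>\frac{\partial C}{\partial w^l_{jk}}</math> with the formula above and compares one entry against a finite-difference estimate.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Made-up two-layer network: sizes are n(0), n(1), n(2); layer 2 is the output layer.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
W = {l: rng.normal(size=(sizes[l], sizes[l - 1])) for l in (1, 2)}  # W[l][j, k] = w^l_{jk}
b = {l: rng.normal(size=sizes[l]) for l in (1, 2)}                  # b[l][j]    = b^l_j
x = rng.normal(size=sizes[0])   # input, i.e. the activations a^0_k
y = rng.normal(size=sizes[2])   # target values for the quadratic cost

def forward(W, b, x):
    """Forward pass: returns activations a[l] and weighted inputs z[l]."""
    a, z = {0: x}, {}
    for l in (1, 2):
        z[l] = W[l] @ a[l - 1] + b[l]
        a[l] = sigmoid(z[l])
    return a, z

def cost(W, b, x, y):
    a, _ = forward(W, b, x)
    return 0.5 * np.sum((a[2] - y) ** 2)

# Backward pass using the formulas derived above.
a, z = forward(W, b, x)
dC_da = {2: a[2] - y}                                   # base case at the output layer
dC_da[1] = W[2].T @ (dC_da[2] * sigmoid_prime(z[2]))    # dC/da^l_j = sum_i dC/da^{l+1}_i sigma'(z^{l+1}_i) w^{l+1}_{ij}
dC_dW = {l: np.outer(dC_da[l] * sigmoid_prime(z[l]), a[l - 1])  # dC/dw^l_{jk} = dC/da^l_j sigma'(z^l_j) a^{l-1}_k
         for l in (1, 2)}

# Compare one weight derivative against a finite-difference estimate.
l, j, k, eps = 1, 2, 0, 1e-6
W_plus = {m: W[m].copy() for m in W}
W_plus[l][j, k] += eps
numeric = (cost(W_plus, b, x, y) - cost(W, b, x, y)) / eps
print(dC_dW[l][j, k], numeric)  # the two values should agree to several decimal places
</syntaxhighlight>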