The cost function <math>C</math> depends on the weight <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

<math display="block">\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}</math>

We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)\, a^{l-1}_k</math> because <math>a^l_j = \sigma(z^l_j)</math>, where <math>z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j</math> is the weighted input to the neuron and <math>\sigma</math> is the activation function. We have used the chain rule again here.
In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer. Thus we can write (using the chain rule once again):

<math display="block">\frac{\partial C}{\partial a^l_j} = \sum_{i \in \{1,\ldots,n(l+1)\}} \frac{\partial C}{\partial a^{l+1}_i} \frac{\partial a^{l+1}_i}{\partial a^l_j}</math>
where <math>n(l+1)</math> is the number of neurons in the <math>(l+1)</math>th layer.
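For example, if the <math>(l+1)</math>th layer contains just two neurons, so that <math>n(l+1) = 2</math>, the sum expands to

<math display="block">\frac{\partial C}{\partial a^l_j} = \frac{\partial C}{\partial a^{l+1}_1} \frac{\partial a^{l+1}_1}{\partial a^l_j} + \frac{\partial C}{\partial a^{l+1}_2} \frac{\partial a^{l+1}_2}{\partial a^l_j}</math>

since <math>a^l_j</math> feeds into both neurons of the next layer, and each contributes to the cost.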
Backpropagation works recursively, starting at the later layers. Since we are trying to compute <math>\frac{\partial C}{\partial a^l_j}</math> for the <math>l</math>th layer, we can assume inductively that we have already computed <math>\frac{\partial C}{\partial a^{l+1}_i}</math> for every neuron <math>i</math> of the <math>(l+1)</math>th layer.
It remains to find <math>\frac{\partial a^{l+1}_i}{\partial a^l_j}</math>. But <math>a^{l+1}_i = \sigma(z^{l+1}_i)</math> with <math>z^{l+1}_i = \sum_j w^{l+1}_{ij} a^l_j + b^{l+1}_i</math>, so we have

<math display="block">\frac{\partial a^{l+1}_i}{\partial a^l_j} = \sigma'(z^{l+1}_i)\, w^{l+1}_{ij}</math>
Putting all this together, we obtain

<math display="block">\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\, \sigma'(z^l_j) \sum_{i \in \{1,\ldots,n(l+1)\}} \frac{\partial C}{\partial a^{l+1}_i}\, \sigma'(z^{l+1}_i)\, w^{l+1}_{ij}</math>
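The recursion above translates directly into a short program. The following NumPy sketch is only an illustration, not part of the derivation: it assumes a sigmoid activation <math>\sigma</math>, a quadratic cost <math>C = \tfrac{1}{2}\lVert a^L - y\rVert^2</math> (so that the base case is <math>\frac{\partial C}{\partial a^L_j} = a^L_j - y_j</math>), and the weight convention <math>w^{l+1}_{ij}</math> used in this section; the function and variable names are hypothetical. It computes <math>\frac{\partial C}{\partial w^l_{jk}}</math> for every layer by running the recursion on <math>\frac{\partial C}{\partial a^l_j}</math> from the last layer backwards, and checks one entry against a finite-difference estimate.

<syntaxhighlight lang="python">
import numpy as np

def sigma(z):
    """Sigmoid activation (an assumed choice; the derivation works for any sigma)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

def backprop_weight_grads(x, y, weights, biases):
    """Return dC/dW for every layer, for the quadratic cost C = 0.5*||a^L - y||^2.

    weights[l] is the matrix (w^{l+1}_{ij}) sending layer-l activations to
    layer (l+1); the input x is treated as the layer-0 activation.
    """
    # Forward pass: record z^l and a^l for every layer.
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)   # z^{l+1}_i = sum_j w^{l+1}_{ij} a^l_j + b^{l+1}_i
        activations.append(sigma(zs[-1]))    # a^{l+1}_i = sigma(z^{l+1}_i)

    # Backward pass: recursion on dC/da^l, starting from the last layer.
    grad_a = activations[-1] - y             # base case dC/da^L_j for the quadratic cost
    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        # dC/dw^{l+1}_{jk} = (dC/da^{l+1}_j) * sigma'(z^{l+1}_j) * a^l_k
        grads[l] = np.outer(grad_a * sigma_prime(zs[l]), activations[l])
        # recursion: dC/da^l_j = sum_i (dC/da^{l+1}_i) * sigma'(z^{l+1}_i) * w^{l+1}_{ij}
        grad_a = weights[l].T @ (grad_a * sigma_prime(zs[l]))
    return grads

# Quick finite-difference check on a tiny random network.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
biases = [rng.standard_normal(sizes[i + 1]) for i in range(len(sizes) - 1)]
x, y = rng.standard_normal(sizes[0]), rng.standard_normal(sizes[-1])

def cost(ws):
    a = x
    for W, b in zip(ws, biases):
        a = sigma(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

grads = backprop_weight_grads(x, y, weights, biases)
eps = 1e-6
W_pert = [W.copy() for W in weights]
W_pert[0][1, 2] += eps
numeric = (cost(W_pert) - cost(weights)) / eps
print(grads[0][1, 2], numeric)   # the two values should agree closely
</syntaxhighlight>

The finite-difference comparison at the end is a standard sanity test for a backpropagation implementation: the analytic gradient produced by the recursion should match the numerical estimate to several decimal places.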