Backpropagation derivation using Leibniz notation
The cost function <math>C</math> depends on the weight <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}</math>
We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)\, a^{l-1}_k</math> because <math>a^l_j = \sigma(z^l_j)</math> with <math>z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j</math>. We have used the chain rule again here.
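The identity above can be checked numerically. The following sketch is illustrative only; the sigmoid activation, the layer size, and all concrete values are assumptions, not part of the derivation. It compares a finite-difference estimate of <math>\frac{\partial a^l_j}{\partial w^l_{jk}}</math> against the analytic value <math>\sigma'(z^l_j)\, a^{l-1}_k</math>:

<syntaxhighlight lang="python">
import numpy as np

# Sigmoid activation and its derivative (an assumed choice of activation).
def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
a_prev = rng.normal(size=4)        # activations a^{l-1} of the previous layer
w = rng.normal(size=4)             # weights w^l_{j,:} into neuron j
b = rng.normal()                   # bias b^l_j

k = 2                              # which weight w^l_{jk} to perturb
eps = 1e-6

def a_j(w):
    return sigma(w @ a_prev + b)   # a^l_j = sigma(z^l_j)

# Finite-difference estimate of da^l_j / dw^l_{jk}
w_plus = w.copy(); w_plus[k] += eps
w_minus = w.copy(); w_minus[k] -= eps
numeric = (a_j(w_plus) - a_j(w_minus)) / (2 * eps)

# Analytic value from the chain rule: sigma'(z^l_j) * a^{l-1}_k
analytic = sigma_prime(w @ a_prev + b) * a_prev[k]

print(numeric, analytic)           # the two values should agree closely
</syntaxhighlight>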
In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer. Thus we can write:

<math>\frac{\partial C}{\partial a^l_j} = \sum_{i \in \{1,\ldots,n(l+1)\}} \frac{\partial C}{\partial a^{l+1}_i} \frac{\partial a^{l+1}_i}{\partial a^l_j}</math>

where <math>n(l+1)</math> is the number of neurons in the <math>(l+1)</math>th layer.
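As a similar sanity check, the following sketch verifies the sum over layer <math>(l+1)</math> against a finite-difference estimate of <math>\frac{\partial C}{\partial a^l_j}</math>. The sigmoid layer, the layer sizes, and the quadratic cost <math>C = \tfrac{1}{2}\lVert a^{l+1} - y\rVert^2</math> are assumptions chosen so that <math>\frac{\partial C}{\partial a^{l+1}_i}</math> has a simple closed form:

<syntaxhighlight lang="python">
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
n_l, n_next = 3, 5                 # neurons in layers l and l+1 (assumed sizes)
W = rng.normal(size=(n_next, n_l)) # weights w^{l+1}_{ij}
b = rng.normal(size=n_next)        # biases b^{l+1}_i
a_l = rng.normal(size=n_l)         # activations of layer l
y = rng.normal(size=n_next)        # target for the illustrative quadratic cost

def cost(a_l):
    a_next = sigma(W @ a_l + b)    # activations of layer l+1
    return 0.5 * np.sum((a_next - y) ** 2)

j, eps = 0, 1e-6

# Finite-difference estimate of dC / da^l_j
a_plus = a_l.copy(); a_plus[j] += eps
a_minus = a_l.copy(); a_minus[j] -= eps
numeric = (cost(a_plus) - cost(a_minus)) / (2 * eps)

# Sum over i of dC/da^{l+1}_i * da^{l+1}_i/da^l_j, where
# dC/da^{l+1}_i = (a^{l+1}_i - y_i) for the quadratic cost and
# da^{l+1}_i/da^l_j = sigma'(z^{l+1}_i) * w^{l+1}_{ij}.
z_next = W @ a_l + b
dC_da_next = sigma(z_next) - y
analytic = np.sum(dC_da_next * sigma_prime(z_next) * W[:, j])

print(numeric, analytic)           # the two values should agree closely
</syntaxhighlight>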