Backpropagation derivation using Leibniz notation
This page presents a derivation/proof of backpropagation using Leibniz notation. Leibniz notation is the most common notation for presenting backpropagation, but it is somewhat complicated due to its blurring of the function/value distinction and its reliance on functional relationships being implicit. Those who prefer function notation may wish to refer to backpropagation derivation using function notation instead of (or in addition to) this page.

Most of the notation on this page is borrowed from Michael Nielsen's book.[1]

<div style="border: 1px solid black; padding: 5px;">'''Theorem.''' Let <math>N</math> be a neural network with <math>L</math> layers and <math>n(l)</math> be the number of neurons in layer <math>l</math> for <math>l \in \{1, \ldots, L\}</math>. Write <math>w^l_{jk}</math> for the weight from the <math>k</math>th neuron in layer <math>l-1</math> to the <math>j</math>th neuron in layer <math>l</math>, <math>b^l_j</math> for the bias of the <math>j</math>th neuron in layer <math>l</math>, <math>z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j</math> for the weighted input to that neuron, and <math>a^l_j = \sigma(z^l_j)</math> for its activation, where <math>\sigma</math> is the activation function. Given a cost function <math>C = \frac12 \sum_{j=1}^{n(L)} (a^L_j - y_j)^2</math>, where <math>y</math> is the desired output, we have for any layer <math>l < L</math>

<math>\frac{\partial C}{\partial w^l_{jk}} = \left( \sum_{m=1}^{n(l+1)} \frac{\partial C}{\partial a^{l+1}_m} \sigma'(z^{l+1}_m) w^{l+1}_{mj} \right) \sigma'(z^l_j) a^{l-1}_k.</math></div>

''Proof.'' The cost function <math>C</math> depends on <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}.</math>
We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j) \frac{\partial z^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j) a^{l-1}_k</math> because <math>a^l_j = \sigma(z^l_j)</math> and <math>z^l_j = \sum_{k'} w^l_{jk'} a^{l-1}_{k'} + b^l_j</math>. We have used the chain rule again here.
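As an aside (not part of the original derivation), this identity can be checked numerically. The sketch below builds a small, hypothetical network in NumPy, assuming for concreteness the sigmoid activation <math>\sigma(z) = 1/(1 + e^{-z})</math>, and compares <math>\sigma'(z^l_j) a^{l-1}_k</math> against a finite-difference estimate of <math>\partial a^l_j / \partial w^l_{jk}</math>. All layer sizes, weights, and index choices are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                         # n(1), n(2), n(3); so L = 3
L = len(sizes)
# weights[l][j, k] plays the role of w^l_{jk}; biases[l][j] is b^l_j.
# (NumPy indices are zero-based, unlike the one-based math notation.)
weights = {l: rng.normal(size=(sizes[l - 1], sizes[l - 2])) for l in range(2, L + 1)}
biases = {l: rng.normal(size=sizes[l - 1]) for l in range(2, L + 1)}

def forward(x):
    """Return the weighted inputs z^l and activations a^l of every layer."""
    a, z = {1: x}, {}
    for l in range(2, L + 1):
        z[l] = weights[l] @ a[l - 1] + biases[l]   # z^l_j = sum_k w^l_{jk} a^{l-1}_k + b^l_j
        a[l] = sigmoid(z[l])                       # a^l_j = sigma(z^l_j)
    return z, a

x = rng.normal(size=sizes[0])
y = rng.normal(size=sizes[-1])
z, a = forward(x)

# Check d a^l_j / d w^l_{jk} = sigma'(z^l_j) * a^{l-1}_k for one choice of l, j, k.
l, j, k, eps = 2, 1, 0, 1e-6
analytic = sigmoid_prime(z[l][j]) * a[l - 1][k]

weights[l][j, k] += eps
_, a_plus = forward(x)
weights[l][j, k] -= 2 * eps
_, a_minus = forward(x)
weights[l][j, k] += eps                   # restore the original weight

numeric = (a_plus[l][j] - a_minus[l][j]) / (2 * eps)
print(analytic, numeric)                  # the two values should agree closely
</syntaxhighlight>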
In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer. Thus we can write (using the chain rule once again):

<math>\frac{\partial C}{\partial a^l_j} = \sum_{m=1}^{n(l+1)} \frac{\partial C}{\partial a^{l+1}_m} \frac{\partial a^{l+1}_m}{\partial a^l_j}.</math>
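For concreteness, suppose layer <math>l+1</math> has just two neurons, so that <math>n(l+1) = 2</math>. Then this expansion reads

<math>\frac{\partial C}{\partial a^l_j} = \frac{\partial C}{\partial a^{l+1}_1} \frac{\partial a^{l+1}_1}{\partial a^l_j} + \frac{\partial C}{\partial a^{l+1}_2} \frac{\partial a^{l+1}_2}{\partial a^l_j},</math>

reflecting the fact that <math>a^l_j</math> influences the cost along one path through each neuron of layer <math>l+1</math>.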
Backpropagation works recursively, starting at the later layers. Since we are trying to compute <math>\frac{\partial C}{\partial w^l_{jk}}</math> for the <math>l</math>th layer, we can assume inductively that we have already computed <math>\frac{\partial C}{\partial a^{l+1}_m}</math> for each <math>m \in \{1, \ldots, n(l+1)\}</math>.
It remains to find <math>\frac{\partial a^{l+1}_m}{\partial a^l_j}</math>. But <math>a^{l+1}_m = \sigma(z^{l+1}_m) = \sigma\left(\sum_{j'} w^{l+1}_{mj'} a^l_{j'} + b^{l+1}_m\right)</math>, so we have

<math>\frac{\partial a^{l+1}_m}{\partial a^l_j} = \sigma'(z^{l+1}_m) w^{l+1}_{mj}.</math>
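This partial derivative can likewise be checked numerically (continuing the hypothetical sigmoid network from the earlier sketch) by perturbing the activation vector <math>a^l</math> directly:

<syntaxhighlight lang="python">
# Check d a^{l+1}_m / d a^l_j = sigma'(z^{l+1}_m) * w^{l+1}_{mj} by
# perturbing a^l_j and recomputing the activations of layer l+1.
m = 0
analytic = sigmoid_prime(z[l + 1][m]) * weights[l + 1][m, j]

a_l = a[l].copy()
a_l[j] += eps
a_next_plus = sigmoid(weights[l + 1] @ a_l + biases[l + 1])
a_l[j] -= 2 * eps
a_next_minus = sigmoid(weights[l + 1] @ a_l + biases[l + 1])

numeric = (a_next_plus[m] - a_next_minus[m]) / (2 * eps)
print(analytic, numeric)                  # should agree closely
</syntaxhighlight>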
Putting all this together, we obtain

<math>\frac{\partial C}{\partial w^l_{jk}} = \left( \sum_{m=1}^{n(l+1)} \frac{\partial C}{\partial a^{l+1}_m} \sigma'(z^{l+1}_m) w^{l+1}_{mj} \right) \sigma'(z^l_j) a^{l-1}_k.</math>
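Continuing the same hypothetical network (where <math>l + 1 = L</math>, so <math>\partial C / \partial a^{l+1}_m = a^L_m - y_m</math> follows directly from the quadratic cost), this expression can be compared against a finite-difference estimate of <math>\partial C / \partial w^l_{jk}</math>:

<syntaxhighlight lang="python">
def cost(x, y):
    _, a_all = forward(x)
    return 0.5 * np.sum((a_all[L] - y) ** 2)   # C = (1/2) sum_j (a^L_j - y_j)^2

# dC/da^{l+1}_m = a^L_m - y_m here because layer l+1 is the output layer.
dC_da_next = a[L] - y
analytic = sum(dC_da_next[m] * sigmoid_prime(z[l + 1][m]) * weights[l + 1][m, j]
               for m in range(sizes[l])) * sigmoid_prime(z[l][j]) * a[l - 1][k]

weights[l][j, k] += eps
C_plus = cost(x, y)
weights[l][j, k] -= 2 * eps
C_minus = cost(x, y)
weights[l][j, k] += eps                   # restore the original weight

numeric = (C_plus - C_minus) / (2 * eps)
print(analytic, numeric)                  # should agree to roughly 1e-8
</syntaxhighlight>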
Let us verify that we can calculate the right-hand side. By the induction hypothesis, we can calculate <math>\frac{\partial C}{\partial a^{l+1}_m}</math>. We calculate <math>z^{l+1}_m</math>, <math>z^l_j</math>, and <math>a^{l-1}_k</math> during the forward pass through the network. Finally, <math>w^{l+1}_{mj}</math> is just a weight in the network, so we already know its value.
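Finally, here is a sketch (under the same illustrative assumptions as the earlier snippets) of how the recursion in this proof can be organized into a procedure: compute <math>\frac{\partial C}{\partial a^l_j}</math> for <math>l = L, L-1, \ldots, 2</math> by working backwards, then read off every <math>\frac{\partial C}{\partial w^l_{jk}}</math>.

<syntaxhighlight lang="python">
def gradients(x, y):
    """Return dC/dw^l_{jk} for every layer, via the recursion in the proof."""
    z_all, a_all = forward(x)
    dC_da = {L: a_all[L] - y}              # base case: dC/da^L_j = a^L_j - y_j
    for l in range(L - 1, 1, -1):          # later layers first
        # dC/da^l_j = sum_m dC/da^{l+1}_m * sigma'(z^{l+1}_m) * w^{l+1}_{mj}
        dC_da[l] = (dC_da[l + 1] * sigmoid_prime(z_all[l + 1])) @ weights[l + 1]
    grads = {}
    for l in range(2, L + 1):
        # dC/dw^l_{jk} = dC/da^l_j * sigma'(z^l_j) * a^{l-1}_k
        grads[l] = np.outer(dC_da[l] * sigmoid_prime(z_all[l]), a_all[l - 1])
    return grads

grads = gradients(x, y)
print(grads[l][j, k])   # matches the analytic and finite-difference values above
</syntaxhighlight>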
References
1. "Chapter 2: How the backpropagation algorithm works" in Neural Networks and Deep Learning. Michael A. Nielsen. Determination Press. 2015. Retrieved November 8, 2018.