Backpropagation derivation using Leibniz notation

The cost function <math>C</math> depends on <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}</math>

We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)a^{l-1}_k</math> because <math>a^l_j = \sigma(z^l_j) = \sigma\left(\sum_k w^l_{jk}a^{l-1}_k + b^l_j\right)</math>.
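
Spelling this step out: differentiating first through <math>\sigma</math> and then through the weighted sum <math>z^l_j</math> gives

<math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)\,\frac{\partial z^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j)\,a^{l-1}_k,</math>

since <math>w^l_{jk}</math> appears in <math>z^l_j = \sum_k w^l_{jk}a^{l-1}_k + b^l_j</math> only in the single term <math>w^l_{jk}a^{l-1}_k</math>.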

In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer.
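
Writing that dependence out with the chain rule (the step this observation sets up; indexing the neurons of layer <math>l+1</math> by <math>m</math>) gives

<math>\frac{\partial C}{\partial a^l_j} = \sum_m \frac{\partial C}{\partial a^{l+1}_m}\,\frac{\partial a^{l+1}_m}{\partial a^l_j} = \sum_m \frac{\partial C}{\partial a^{l+1}_m}\,\sigma'(z^{l+1}_m)\,w^{l+1}_{mj},</math>

since <math>a^{l+1}_m = \sigma\left(\sum_j w^{l+1}_{mj}a^l_j + b^{l+1}_m\right)</math>.

The following is a minimal numerical sketch of the result, not part of the derivation above: it assumes a sigmoid activation, a quadratic cost <math>C = \tfrac{1}{2}\lVert a^{l+1} - y\rVert^2</math>, and small illustrative layer sizes (all variable names are chosen here for illustration), and checks the chain-rule expression for <math>\frac{\partial C}{\partial w^l_{jk}}</math> against a finite-difference estimate.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)

# Tiny example: layer l-1 has 3 activations, layer l has 2 neurons,
# layer l+1 has 2 neurons and is treated as the output layer.
W_l = rng.normal(size=(2, 3))      # w^l_{jk}
b_l = rng.normal(size=2)           # b^l_j
W_lp1 = rng.normal(size=(2, 2))    # w^{l+1}_{mj}
b_lp1 = rng.normal(size=2)         # b^{l+1}_m
a_prev = rng.normal(size=3)        # a^{l-1}_k
y = rng.normal(size=2)             # target used by the cost

def forward(W):
    """Forward pass from a^{l-1} through layers l and l+1, with W as the layer-l weights."""
    z_l = W @ a_prev + b_l
    a_l = sigmoid(z_l)
    z_lp1 = W_lp1 @ a_l + b_lp1
    a_lp1 = sigmoid(z_lp1)
    return z_l, a_l, z_lp1, a_lp1

def cost(a_lp1):
    return 0.5 * np.sum((a_lp1 - y) ** 2)  # illustrative quadratic cost

j, k = 1, 2  # which weight w^l_{jk} to check

# Analytic derivative assembled from the two chain-rule factors derived above.
z_l, a_l, z_lp1, a_lp1 = forward(W_l)
dC_da_lp1 = a_lp1 - y                                       # dC/da^{l+1}_m
dC_da_l = (dC_da_lp1 * sigmoid_prime(z_lp1)) @ W_lp1        # dC/da^l_j
analytic = dC_da_l[j] * sigmoid_prime(z_l[j]) * a_prev[k]   # dC/dw^l_{jk}

# Finite-difference estimate of the same derivative.
eps = 1e-6
W_pert = W_l.copy()
W_pert[j, k] += eps
numeric = (cost(forward(W_pert)[3]) - cost(a_lp1)) / eps

print(analytic, numeric)  # the two values should agree to several decimal places
</syntaxhighlight>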