Backpropagation derivation using Leibniz notation

The cost function $C$ depends on $w_{jk}^l$ only through the activation of the $j$th neuron in the $l$th layer, i.e. on the value of $a_j^l$. Thus we can use the chain rule to expand:

$$\frac{\partial C}{\partial w_{jk}^l} = \frac{\partial C}{\partial a_j^l}\,\frac{\partial a_j^l}{\partial w_{jk}^l}$$
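To make the factorization concrete, here is a minimal numerical sketch (not from the original article) that checks it by finite differences. The setup is an assumed toy one: sigmoid activation, a quadratic cost $C = \frac{1}{2}\|a^l - y\|^2$ with layer $l$ taken as the output layer, and random weights; all names in the snippet (`W`, `b`, `a_prev`, `y`, `cost_of_W`) are illustrative.

```python
import numpy as np

# Numerical check of dC/dw^l_{jk} = (dC/da^l_j) * (da^l_j/dw^l_{jk}).
# Assumed toy setup: sigmoid activation, quadratic cost, layer l = output layer.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_prev, n_l = 4, 3                    # neurons in layers l-1 and l
W = rng.normal(size=(n_l, n_prev))    # W[j, k] = w^l_{jk}
b = rng.normal(size=n_l)
a_prev = rng.normal(size=n_prev)      # activations a^{l-1}
y = rng.normal(size=n_l)              # target, so C depends on a^l

def cost_of_W(W):
    a = sigmoid(W @ a_prev + b)
    return 0.5 * np.sum((a - y) ** 2)

j, k, eps = 1, 2, 1e-6

# Left-hand side: dC/dw^l_{jk} by central finite differences.
Wp, Wm = W.copy(), W.copy()
Wp[j, k] += eps; Wm[j, k] -= eps
lhs = (cost_of_W(Wp) - cost_of_W(Wm)) / (2 * eps)

# Right-hand side: (dC/da^l_j) * (da^l_j/dw^l_{jk}).
a = sigmoid(W @ a_prev + b)
dC_da = a[j] - y[j]                   # analytic, for the quadratic cost
da_dw = (sigmoid(Wp @ a_prev + b)[j] - sigmoid(Wm @ a_prev + b)[j]) / (2 * eps)
rhs = dC_da * da_dw

print(lhs, rhs)   # the two values should agree to roughly 1e-8
```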

We know that $\frac{\partial a_j^l}{\partial w_{jk}^l} = \sigma'(z_j^l)\, a_k^{l-1}$ because $a_j^l = \sigma(z_j^l) = \sigma\!\left(\sum_k w_{jk}^l a_k^{l-1} + b_j^l\right)$. We have used the chain rule again here: $\frac{\partial a_j^l}{\partial w_{jk}^l} = \sigma'(z_j^l)\,\frac{\partial z_j^l}{\partial w_{jk}^l}$, and $\frac{\partial z_j^l}{\partial w_{jk}^l} = a_k^{l-1}$ since $z_j^l$ is linear in $w_{jk}^l$ with coefficient $a_k^{l-1}$.
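As a sanity check on this factor in isolation, the following sketch (same assumed toy setup as above: sigmoid activation, one random small layer) compares the analytic expression $\sigma'(z_j^l)\,a_k^{l-1}$ against a central finite difference.

```python
import numpy as np

# Numerical check of da^l_j/dw^l_{jk} = sigma'(z^l_j) * a^{l-1}_k.
# Assumed toy setup: sigmoid activation, random small layer.

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

n_prev, n_l = 4, 3
W = rng.normal(size=(n_l, n_prev))    # W[j, k] = w^l_{jk}
b = rng.normal(size=n_l)
a_prev = rng.normal(size=n_prev)      # activations a^{l-1}

j, k, eps = 0, 3, 1e-6

# Numerical derivative of a^l_j with respect to w^l_{jk}.
Wp, Wm = W.copy(), W.copy()
Wp[j, k] += eps; Wm[j, k] -= eps
numeric = (sigmoid(Wp @ a_prev + b)[j] - sigmoid(Wm @ a_prev + b)[j]) / (2 * eps)

# Analytic expression from the derivation above.
z = W @ a_prev + b
analytic = sigmoid_prime(z[j]) * a_prev[k]

print(numeric, analytic)   # should agree to roughly 1e-9
```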

In turn, $C$ depends on $a_j^l$ only through the activations of the $(l+1)$th layer. Thus we can write:

$$\frac{\partial C}{\partial a_j^l} = \sum_{i \in \{1, \ldots, n^{(l+1)}\}} \frac{\partial C}{\partial a_i^{l+1}}\,\frac{\partial a_i^{l+1}}{\partial a_j^l}$$

where $n^{(l+1)}$ is the number of neurons in the $(l+1)$th layer.
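This sum can likewise be checked numerically. In the sketch below, layer $l+1$ is assumed to be the output layer with quadratic cost (again an illustrative choice, not from the article), so that $\partial C/\partial a_i^{l+1} = a_i^{l+1} - y_i$; the factors $\partial a_i^{l+1}/\partial a_j^l$ are taken by finite differences rather than derived, since their closed form comes later in the derivation.

```python
import numpy as np

# Numerical check of dC/da^l_j = sum_i (dC/da^{l+1}_i) * (da^{l+1}_i/da^l_j).
# Assumed toy setup: layer l+1 is the output layer, sigmoid activation,
# quadratic cost C = 0.5 * ||a^{l+1} - y||^2.

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_l, n_next = 3, 5                    # n^{(l+1)} = 5 neurons in layer l+1
V = rng.normal(size=(n_next, n_l))    # V[i, j] = w^{l+1}_{ij}
c = rng.normal(size=n_next)
a_l = rng.normal(size=n_l)            # activations a^l
y = rng.normal(size=n_next)

def cost_of_a(a_l):
    a_next = sigmoid(V @ a_l + c)
    return 0.5 * np.sum((a_next - y) ** 2)

j, eps = 1, 1e-6

# Left-hand side: dC/da^l_j by central finite differences.
ap, am = a_l.copy(), a_l.copy()
ap[j] += eps; am[j] -= eps
lhs = (cost_of_a(ap) - cost_of_a(am)) / (2 * eps)

# Right-hand side: sum over i of (dC/da^{l+1}_i) * (da^{l+1}_i/da^l_j),
# with the second factors also taken numerically.
a_next = sigmoid(V @ a_l + c)
dC_da_next = a_next - y               # analytic, for the quadratic cost
da_next_da_j = (sigmoid(V @ ap + c) - sigmoid(V @ am + c)) / (2 * eps)
rhs = np.sum(dC_da_next * da_next_da_j)

print(lhs, rhs)   # should agree to roughly 1e-8
```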