Backpropagation derivation using Leibniz notation

From Machinelearning
Revision as of 22:21, 8 November 2018 by IssaRice (talk | contribs)

The cost function <math>C</math> depends on <math>w^l_{jk}</math> only through the activation of the <math>j</math>th neuron in the <math>l</math>th layer, i.e. on the value of <math>a^l_j</math>. Thus we can use the chain rule to expand:

:<math>\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial w^l_{jk}}</math>

We know that <math>\frac{\partial a^l_j}{\partial w^l_{jk}} = \sigma'(z^l_j) a^{l-1}_k</math> because <math>a^l_j = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)</math>.
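The partial derivative of the activation with respect to a single weight can be checked numerically. The sketch below (a hypothetical two-neuron layer with three inputs; the names <code>sigmoid</code>, <code>a_prev</code>, <code>w</code>, <code>b</code> are illustrative, not from the original) compares the analytic expression against a central-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the logistic sigmoid
    return sigmoid(z) * (1.0 - sigmoid(z))

rng = np.random.default_rng(0)
a_prev = rng.standard_normal(3)   # activations a^{l-1} of the previous layer
w = rng.standard_normal((2, 3))   # weights w^l_{jk}
b = rng.standard_normal(2)        # biases b^l_j

j, k = 0, 1
z = w @ a_prev + b                # z^l_j = sum_k w^l_{jk} a^{l-1}_k + b^l_j

# analytic: da^l_j / dw^l_{jk} = sigma'(z^l_j) * a^{l-1}_k
analytic = sigmoid_prime(z[j]) * a_prev[k]

# numeric: central difference in the single weight w[j, k]
eps = 1e-6
w_plus = w.copy();  w_plus[j, k] += eps
w_minus = w.copy(); w_minus[j, k] -= eps
numeric = (sigmoid((w_plus @ a_prev + b)[j])
           - sigmoid((w_minus @ a_prev + b)[j])) / (2 * eps)

assert abs(analytic - numeric) < 1e-8
```

The agreement of the two values is just a sanity check on the formula; it does not depend on the particular random inputs chosen.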

In turn, <math>C</math> depends on <math>a^l_j</math> only through the activations of the <math>(l+1)</math>th layer.
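This revision stops here, but the observation sets up the next application of the chain rule; a sketch consistent with the notation above (summing over the neurons <math>m</math> of layer <math>l+1</math>, with <math>a^{l+1}_m = \sigma\left(\sum_j w^{l+1}_{mj} a^l_j + b^{l+1}_m\right)</math>):

```latex
\frac{\partial C}{\partial a^l_j}
  = \sum_m \frac{\partial C}{\partial a^{l+1}_m}
           \frac{\partial a^{l+1}_m}{\partial a^l_j}
  = \sum_m \frac{\partial C}{\partial a^{l+1}_m}\,
           \sigma'(z^{l+1}_m)\, w^{l+1}_{mj}
```

Applying this recursively from the output layer back to layer <math>l</math> is what makes the computation "backpropagation".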