Derivative of a quadratic form: Difference between revisions

From Machinelearning
Revision as of 23:02, 13 July 2018

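Let <math>A \in M_{n,n}(\mathbf R)</math> be an <math>n</math> by <math>n</math> real-valued matrix, and let <math>f \colon \mathbf R^n \to \mathbf R</math> be defined by <math>f(x) = x^{\mathrm T}Ax</math>. On this page, we calculate the derivative of <math>f</math>.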

Understanding the problem

Straightforward method

Using the definition of the derivative

The derivative of <math>f</math> at a point <math>x_0</math> is the linear transformation <math>L</math> such that:

:<math>\lim_{x \to x_0;\ x \ne x_0} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0</math>

Using our function, this is:

:<math>\lim_{x \to x_0;\ x \ne x_0} \frac{\|x^{\mathrm T}Ax - x_0^{\mathrm T}Ax_0 - L(x - x_0)\|}{\|x - x_0\|} = 0</math>

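Defining <math>h = x - x_0</math>, we have <math>x = x_0 + h</math>, and the fraction becomes

:<math>\frac{\|(x_0 + h)^{\mathrm T}A(x_0 + h) - x_0^{\mathrm T}Ax_0 - L(h)\|}{\|h\|}</math>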

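Focusing on the subexpression <math>(x_0 + h)^{\mathrm T}A(x_0 + h)</math>, since <math>A</math> is a matrix, it is a linear transformation, so we obtain <math>(x_0 + h)^{\mathrm T}(Ax_0 + Ah)</math>. Since the transpose of a sum is the sum of the transposes, we have <math>(x_0^{\mathrm T} + h^{\mathrm T})(Ax_0 + Ah)</math>. Now using linearity we have <math>x_0^{\mathrm T}Ax_0 + h^{\mathrm T} Ax_0 + x_0^{\mathrm T} Ah + h^{\mathrm T}Ah</math>.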
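The four-term expansion can be checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation; the helper names <code>matvec</code>, <code>dot</code>, <code>quad_form</code> and <code>expansion</code> are hypothetical and chosen only for illustration.

```python
import random

def matvec(A, v):
    """Matrix-vector product A v, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Inner product u^T v of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def quad_form(A, x):
    """The quadratic form f(x) = x^T A x."""
    return dot(x, matvec(A, x))

def expansion(A, x0, h):
    """x0^T A x0 + h^T A x0 + x0^T A h + h^T A h."""
    return (quad_form(A, x0)
            + dot(h, matvec(A, x0))
            + dot(x0, matvec(A, h))
            + quad_form(A, h))

# Random test data (illustrative only); A is deliberately not symmetric.
rng = random.Random(0)
n = 4
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x0 = [rng.uniform(-1, 1) for _ in range(n)]
h = [rng.uniform(-1, 1) for _ in range(n)]

lhs = quad_form(A, [a + b for a, b in zip(x0, h)])  # (x0 + h)^T A (x0 + h)
rhs = expansion(A, x0, h)
gap = abs(lhs - rhs)  # should be zero up to floating-point rounding
```

The two sides agree up to rounding error for any choice of <code>A</code>, <code>x0</code> and <code>h</code>, since the expansion uses only linearity and no assumption that <code>A</code> is symmetric.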

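Now the fraction is

:<math>\frac{\|x_0^{\mathrm T}Ax_0 + h^{\mathrm T} Ax_0 + x_0^{\mathrm T} Ah + h^{\mathrm T}Ah - x_0^{\mathrm T}Ax_0 - L(h)\|}{\|h\|} = \frac{\|h^{\mathrm T} Ax_0 + x_0^{\mathrm T} Ah + h^{\mathrm T}Ah - L(h)\|}{\|h\|}</math>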

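Focusing on <math>h^{\mathrm T} Ax_0</math>, it is a real number so <math>h^{\mathrm T} Ax_0 = (h^{\mathrm T} Ax_0)^{\mathrm T} = x_0^{\mathrm T}A^{\mathrm T}h</math>.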
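The transpose identity for the scalar <math>h^{\mathrm T} Ax_0</math> can be illustrated with a small concrete case. This is a sketch in plain Python under the assumption that matrices are stored as lists of rows; the helper names and the example values are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product A v, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Inner product u^T v of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

# A small non-symmetric example, so that A and its transpose really differ.
A = [[1, 2],
     [3, 4]]
x0 = [1, 0]
h = [0, 1]

left = dot(h, matvec(A, x0))              # h^T A x0
right = dot(x0, matvec(transpose(A), h))  # x0^T A^T h
```

Here <code>left</code> and <code>right</code> are both 3. Note that <math>x_0^{\mathrm T} A h</math> is 2 for the same data, so the transpose on <math>A</math> cannot be dropped unless <math>A</math> is symmetric.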

Using the chain rule