Derivative of a quadratic form

From Machinelearning

Revision as of 23:09, 13 July 2018

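Let <math>A \in M_{n,n}(\mathbf R)</math> be an <math>n</math> by <math>n</math> real-valued matrix, and let <math>f : \mathbf R^n \to \mathbf R</math> be defined by <math>f(x) = x^{\mathrm T}Ax</math>. On this page, we calculate the derivative of <math>f</math>.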

Understanding the problem

Straightforward method

Using the definition of the derivative

The derivative of <math>f</math> at a point <math>x_0</math> is the linear transformation <math>L : \mathbf R^n \to \mathbf R</math> such that:

:<math>\lim_{x \to x_0;\, x \ne x_0} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0</math>

Using our function, this is:

:<math>\lim_{x \to x_0;\, x \ne x_0} \frac{\|x^{\mathrm T}Ax - x_0^{\mathrm T}Ax_0 - L(x - x_0)\|}{\|x - x_0\|} = 0</math>

Defining <math>h = x - x_0</math>, we have <math>x = x_0 + h</math> and the fraction becomes

:<math>\frac{\|(x_0 + h)^{\mathrm T}A(x_0 + h) - x_0^{\mathrm T}Ax_0 - L(h)\|}{\|h\|}</math>

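Focusing on the subexpression <math>(x_0 + h)^{\mathrm T}A(x_0 + h)</math>, since <math>A</math> is a matrix, it is a linear transformation, so we obtain <math>(x_0 + h)^{\mathrm T}(Ax_0 + Ah)</math>. Since the transpose of a sum is the sum of the transposes, we have <math>(x_0^{\mathrm T} + h^{\mathrm T})(Ax_0 + Ah)</math>. Now using linearity we have <math>x_0^{\mathrm T}Ax_0 + h^{\mathrm T}Ax_0 + x_0^{\mathrm T}Ah + h^{\mathrm T}Ah</math>.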
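The four-term expansion can be checked numerically. The following is a minimal sketch in plain Python; the helper names (<code>mat_vec</code>, <code>dot</code>, <code>quad</code>) and the 2 by 2 values are illustrative assumptions, not part of the derivation. The matrix is deliberately non-symmetric, since the expansion does not rely on symmetry.

```python
# Check (x0 + h)^T A (x0 + h) = x0^T A x0 + h^T A x0 + x0^T A h + h^T A h
# on hypothetical 2x2 values.

def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Return the scalar u^T v."""
    return sum(a * b for a, b in zip(u, v))

def quad(A, u, v):
    """Return the scalar u^T A v."""
    return dot(u, mat_vec(A, v))

A = [[1.0, 2.0], [3.0, 4.0]]   # illustrative, deliberately not symmetric
x0 = [5.0, 6.0]                # illustrative base point
h = [0.5, -0.25]               # illustrative displacement

x = [a + b for a, b in zip(x0, h)]
lhs = quad(A, x, x)
rhs = quad(A, x0, x0) + quad(A, h, x0) + quad(A, x0, h) + quad(A, h, h)

# The two sides should agree up to floating-point rounding.
expansion_gap = abs(lhs - rhs)
print(expansion_gap)
```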

Now the fraction is

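:<math>\frac{\|x_0^{\mathrm T}Ax_0 + h^{\mathrm T}Ax_0 + x_0^{\mathrm T}Ah + h^{\mathrm T}Ah - x_0^{\mathrm T}Ax_0 - L(h)\|}{\|h\|} = \frac{\|h^{\mathrm T}Ax_0 + x_0^{\mathrm T}Ah + h^{\mathrm T}Ah - L(h)\|}{\|h\|}</math>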

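Focusing on <math>h^{\mathrm T}Ax_0</math>, it is a real number so taking the transpose leaves it unchanged: <math>h^{\mathrm T}Ax_0 = (h^{\mathrm T}Ax_0)^{\mathrm T} = x_0^{\mathrm T}A^{\mathrm T}h</math>.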
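As a quick sanity check of this transposition step, take the illustrative values (chosen here only as an example) <math>A = \begin{pmatrix}1 & 2 \\ 3 & 4\end{pmatrix}</math>, <math>x_0 = \begin{pmatrix}5 \\ 6\end{pmatrix}</math> and <math>h = \begin{pmatrix}1 \\ 2\end{pmatrix}</math>. Then <math>Ax_0 = \begin{pmatrix}17 \\ 39\end{pmatrix}</math> and <math>A^{\mathrm T}h = \begin{pmatrix}7 \\ 10\end{pmatrix}</math>, so both sides give the same number:

:<math>h^{\mathrm T}Ax_0 = \begin{pmatrix}1 & 2\end{pmatrix}\begin{pmatrix}17 \\ 39\end{pmatrix} = 95, \qquad x_0^{\mathrm T}A^{\mathrm T}h = \begin{pmatrix}5 & 6\end{pmatrix}\begin{pmatrix}7 \\ 10\end{pmatrix} = 95</math>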

Now the fraction is

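:<math>\frac{\|x_0^{\mathrm T}A^{\mathrm T}h + x_0^{\mathrm T} Ah + h^{\mathrm T}Ah - L(h)\|}{\|h\|} = \frac{\|x_0^{\mathrm T}(A^{\mathrm T} + A)h + h^{\mathrm T}Ah - L(h)\|}{\|h\|}</math>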

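In the numerator, <math>h^{\mathrm T}Ah</math> is a higher order term that will disappear when taking the limit: <math>|h^{\mathrm T}Ah| \le \|A\|\,\|h\|^2</math>, where <math>\|A\|</math> is the operator norm of <math>A</math>, so after dividing by <math>\|h\|</math> it is bounded by <math>\|A\|\,\|h\|</math>, which tends to 0 as <math>h \to 0</math>. The linear transformation we are looking for must therefore be <math>L(h) = x_0^{\mathrm T}(A^{\mathrm T} + A)h</math>. If <math>A</math> is symmetric, we have <math>A^{\mathrm T} + A = 2A</math> and <math>L(h) = 2x_0^{\mathrm T}Ah</math>.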
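The limit in the definition can also be observed numerically. The sketch below, in plain Python, evaluates the quotient from the definition with the candidate <math>L(h) = x_0^{\mathrm T}(A^{\mathrm T} + A)h</math> for a displacement that shrinks along a fixed direction. The function names, the matrix, the base point and the direction are all assumptions made for illustration. If the candidate is right, the quotient should shrink roughly in proportion to <math>\|h\|</math>.

```python
import math

def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Return the scalar u^T v."""
    return sum(a * b for a, b in zip(u, v))

def f(A, x):
    """The quadratic form f(x) = x^T A x."""
    return dot(x, mat_vec(A, x))

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def L(A, x0, h):
    """Candidate derivative at x0 applied to h: x0^T (A^T + A) h."""
    S = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(transpose(A), A)]
    return dot(x0, mat_vec(S, h))

def remainder_ratio(A, x0, h):
    """Quotient |f(x0 + h) - f(x0) - L(h)| / ||h|| from the definition."""
    x = [a + b for a, b in zip(x0, h)]
    numerator = abs(f(A, x) - f(A, x0) - L(A, x0, h))
    return numerator / math.sqrt(dot(h, h))

A = [[1.0, 2.0], [3.0, 4.0]]   # illustrative, not symmetric
x0 = [5.0, 6.0]                # illustrative base point
d = [1.0, -2.0]                # illustrative fixed direction
scales = [1e-1, 1e-2, 1e-3]    # h = t * d for shrinking t

ratios = [remainder_ratio(A, x0, [t * c for c in d]) for t in scales]
for t, r in zip(scales, ratios):
    print(t, r)
```

Because the only term left in the numerator is <math>h^{\mathrm T}Ah</math>, each tenfold reduction of <math>h</math> should reduce the printed quotient by about a factor of ten.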

Using the chain rule