Derivative of a quadratic form

From Machinelearning
Revision as of 00:16, 14 July 2018 by IssaRice

Let $A \in M_{n,n}(\mathbf{R})$ be an $n \times n$ symmetric real-valued matrix, and let $f : \mathbf{R}^n \to \mathbf{R}$ be defined by $f(x) = x^T A x$. On this page, we calculate the derivative of $f$.

Understanding the problem

Straightforward method

This method is the most straightforward, and involves breaking apart the matrix and vector into components and performing the differentiation. While straightforward, it appears messy due to the indices involved.

Let $A = (a_{ki})$ and $x = (x_1, \ldots, x_n)$.

We expand

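$$x^T A x = x^T \begin{pmatrix} \sum_{i=1}^n a_{1i} x_i \\ \vdots \\ \sum_{i=1}^n a_{ni} x_i \end{pmatrix} = \sum_{k=1}^n x_k \sum_{i=1}^n a_{ki} x_i$$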

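Now we find the partial derivative of the above with respect to $x_j$. To distinguish the constants from the variable, it makes sense to split the sum:

$$\sum_{k=1}^n x_k \sum_{i=1}^n a_{ki} x_i = x_j \sum_{i=1}^n a_{ji} x_i + \sum_{k \ne j} x_k \sum_{i=1}^n a_{ki} x_i = x_j \left( a_{jj} x_j + \sum_{i \ne j} a_{ji} x_i \right) + \sum_{k \ne j} x_k \left( a_{kj} x_j + \sum_{i \ne j} a_{ki} x_i \right)$$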

The first equality comes from splitting the outer summation, and the second comes from splitting the two inner summations.

Now distributing we have

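$$a_{jj} x_j^2 + \left( \sum_{i \ne j} a_{ji} x_i \right) x_j + \sum_{k \ne j} \left( a_{kj} x_k x_j + x_k \sum_{i \ne j} a_{ki} x_i \right) = a_{jj} x_j^2 + \left( \sum_{i \ne j} a_{ji} x_i \right) x_j + \left( \sum_{k \ne j} a_{kj} x_k \right) x_j + \sum_{k \ne j} x_k \sum_{i \ne j} a_{ki} x_i$$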

It is now easy to do the differentiation. We obtain

$$2 a_{jj} x_j + \sum_{i \ne j} a_{ji} x_i + \sum_{k \ne j} a_{kj} x_k$$
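As a sanity check on the index bookkeeping, here is the smallest nontrivial case worked out by hand. The choice $n = 2$, $j = 1$ is only an illustration and is not part of the original derivation; symmetry of $A$ is not yet used.

$$\begin{aligned}
x^T A x &= a_{11} x_1^2 + a_{12} x_2 x_1 + a_{21} x_2 x_1 + a_{22} x_2^2, \\
\frac{\partial}{\partial x_1} \left( x^T A x \right) &= 2 a_{11} x_1 + a_{12} x_2 + a_{21} x_2.
\end{aligned}$$

The three terms on the right are exactly $2 a_{jj} x_j$, $\sum_{i \ne j} a_{ji} x_i$ and $\sum_{k \ne j} a_{kj} x_k$ for $j = 1$.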

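Since the matrix is symmetric, $a_{kj} = a_{jk}$, so $\sum_{k \ne j} a_{kj} x_k = \sum_{k \ne j} a_{jk} x_k = \sum_{i \ne j} a_{ji} x_i$. The final equality follows because $k$ is just an indexing variable and we are free to rename it. But now the derivative becomes

$$2 a_{jj} x_j + 2 \sum_{i \ne j} a_{ji} x_i = 2 \sum_{i=1}^n a_{ji} x_i$$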

But this is just the $j$th component of $2Ax$. It follows that the full derivative is just $2Ax$ (or its transpose, depending on whether we want to view it as a row or column vector).
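The result can be checked numerically. The following is a minimal sketch, not part of the original derivation: it compares $2Ax$ against central finite differences of $f$ for a randomly generated symmetric matrix. The function names, the step size `eps`, the dimension `n = 4` and the random seed are all assumptions chosen for illustration.

```python
import random


def quadratic_form(A, x):
    """Return x^T A x, computed as the double sum over k and i."""
    n = len(x)
    return sum(x[k] * A[k][i] * x[i] for k in range(n) for i in range(n))


def gradient_formula(A, x):
    """Return 2 A x, the derivative obtained above for symmetric A."""
    n = len(x)
    return [2 * sum(A[j][i] * x[i] for i in range(n)) for j in range(n)]


def gradient_numeric(A, x, eps=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        diff = quadratic_form(A, x_plus) - quadratic_form(A, x_minus)
        grad.append(diff / (2 * eps))
    return grad


random.seed(0)  # hypothetical seed, only for reproducibility
n = 4
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
# Symmetrize B so that A satisfies the assumption a_kj = a_jk.
A = [[(B[k][i] + B[i][k]) / 2 for i in range(n)] for k in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]

exact = gradient_formula(A, x)
approx = gradient_numeric(A, x)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
print(max_error)  # small; limited only by floating-point rounding
```

Because $f$ is quadratic, the central difference has no truncation error, so any discrepancy comes from rounding alone.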

Using the definition of the derivative

This is an expanded version of the answer at [1].

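The derivative is the linear transformation $L$ such that:

$$\lim_{x \to x_0;\ x \ne x_0} \frac{|f(x) - (f(x_0) + L(x - x_0))|}{|x - x_0|} = 0$$

Using our function, this is:

$$\lim_{x \to x_0;\ x \ne x_0} \frac{|x^T A x - x_0^T A x_0 - L(x - x_0)|}{|x - x_0|} = 0$$

Defining $h = x - x_0$, we have $x = x_0 + h$ and

$$\frac{|(x_0 + h)^T A (x_0 + h) - x_0^T A x_0 - L(h)|}{|h|}$$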

Focusing on the subexpression $(x_0 + h)^T A (x_0 + h)$, since $A$ is a matrix, it is a linear transformation, so we obtain $(x_0 + h)^T (A x_0 + A h)$. Since the transpose of a sum is the sum of the transposes, we have $(x_0^T + h^T)(A x_0 + A h)$. Now using linearity we have $x_0^T A x_0 + h^T A x_0 + x_0^T A h + h^T A h$.

Now the fraction is

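$$\frac{|x_0^T A x_0 + h^T A x_0 + x_0^T A h + h^T A h - x_0^T A x_0 - L(h)|}{|h|} = \frac{|h^T A x_0 + x_0^T A h + h^T A h - L(h)|}{|h|}$$

Focusing on $h^T A x_0$, it is a real number so taking the transpose leaves it unchanged: $h^T A x_0 = (h^T A x_0)^T = x_0^T A^T h$.

Now the fraction is

$$\frac{|x_0^T A^T h + x_0^T A h + h^T A h - L(h)|}{|h|} = \frac{|x_0^T (A^T + A) h + h^T A h - L(h)|}{|h|}$$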

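In the numerator, $h^T A h$ is a higher-order term: $|h^T A h| \le \|A\| \, |h|^2$, where $\|A\|$ is the operator norm of $A$, so after dividing by $|h|$ it tends to $0$ as $h \to 0$. The linear transformation we are looking for must therefore be $L(h) = x_0^T (A^T + A) h$. Since $A$ is symmetric, we have $A^T + A = 2A$ and $L(h) = 2 x_0^T A h$.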
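The limit can also be observed numerically. The sketch below is an illustration under assumed values, not part of the original answer: the matrix `A`, the point `x0`, the unit vector `direction` and the step sizes are all hypothetical. It evaluates the fraction $|f(x_0 + h) - f(x_0) - L(h)| / |h|$ with $L(h) = 2 x_0^T A h$ for shrinking $h$.

```python
import math


def quadratic_form(A, x):
    """Return x^T A x, computed as the double sum over k and i."""
    n = len(x)
    return sum(x[k] * A[k][i] * x[i] for k in range(n) for i in range(n))


def linear_part(A, x0, h):
    """Return L(h) = 2 x0^T A h, the candidate derivative applied to h."""
    n = len(h)
    return 2 * sum(x0[k] * A[k][i] * h[i] for k in range(n) for i in range(n))


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical example data: a symmetric 2x2 matrix, a base point,
# and a fixed unit direction along which h shrinks.
A = [[2.0, 1.0], [1.0, 3.0]]
x0 = [1.0, -2.0]
direction = [0.6, 0.8]

ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = [t * d for d in direction]
    x = [a + b for a, b in zip(x0, h)]
    remainder = quadratic_form(A, x) - quadratic_form(A, x0) - linear_part(A, x0, h)
    ratios.append(abs(remainder) / norm(h))

for ratio in ratios:
    print(ratio)  # each value is about ten times smaller than the previous one
```

Here the remainder equals $h^T A h$, which scales like $|h|^2$, so the ratio shrinks in proportion to $|h|$, consistent with the limit being $0$.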

Using the chain rule