Derivative of a quadratic form
Let $A$ be an $n$ by $n$ symmetric real-valued matrix, and let $f\colon \mathbf{R}^n \to \mathbf{R}$ be defined by $f(x) = x^T A x$. On this page, we calculate the derivative of $f$ using three methods.
Understanding the problem
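Since $f$ is a real-valued function on $\mathbf{R}^n$, the derivative and the gradient coincide: at each point the derivative is a $1 \times n$ matrix whose entries are the partial derivatives $\partial f / \partial x_k$, which is the gradient written as a row vector.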
Straightforward method
This method is the most straightforward, and involves breaking apart the matrix and vector into components and performing the differentiation. While straightforward, it appears messy due to the indices involved.
Let $x = (x_1, \ldots, x_n)$ and $A = (a_{ij})$.
We expand
$$x^T A x = \sum_{i=1}^n x_i \sum_{j=1}^n a_{ij} x_j$$
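Now we find the partial derivative of the above with respect to $x_k$. To distinguish the constants from the variable, it makes sense to split the sum:
$$\sum_{i} x_i \sum_{j} a_{ij} x_j = x_k \sum_{j} a_{kj} x_j + \sum_{i \ne k} x_i \sum_{j} a_{ij} x_j = x_k \Big( a_{kk} x_k + \sum_{j \ne k} a_{kj} x_j \Big) + \sum_{i \ne k} x_i \Big( a_{ik} x_k + \sum_{j \ne k} a_{ij} x_j \Big)$$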
The first equality comes from splitting the outer summation, and the second comes from splitting the two inner summations.
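Now distributing we have
$$a_{kk} x_k^2 + x_k \sum_{j \ne k} a_{kj} x_j + x_k \sum_{i \ne k} a_{ik} x_i + \sum_{i \ne k} \sum_{j \ne k} a_{ij} x_i x_j$$
It is now easy to do the differentiation, since the last double sum does not involve $x_k$ at all. We obtain
$$\frac{\partial}{\partial x_k} \left( x^T A x \right) = 2 a_{kk} x_k + \sum_{j \ne k} a_{kj} x_j + \sum_{i \ne k} a_{ik} x_i$$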
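Since the matrix is symmetric, $a_{ik} = a_{ki}$, so
$$\sum_{i \ne k} a_{ik} x_i = \sum_{i \ne k} a_{ki} x_i = \sum_{j \ne k} a_{kj} x_j$$
The final equality follows because $i$ is just an indexing variable and we are free to rename it. But now the derivative becomes
$$2 a_{kk} x_k + 2 \sum_{j \ne k} a_{kj} x_j = 2 \sum_{j=1}^n a_{kj} x_j$$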
But this is just the $k$th component of $2Ax$. It follows that the full derivative is just $2Ax$ (or its transpose $2x^T A$, depending on whether we want to view it as a row or column vector).
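The result of the component-wise calculation can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the function names (`quad_form`, `grad_formula`, `grad_numeric`) and the example matrix and vector are made up for illustration. It compares $2Ax$ against central-difference approximations of the partial derivatives.

```python
def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def grad_formula(A, x):
    """Return 2 A x, the gradient derived above for a symmetric A."""
    n = len(x)
    return [2 * sum(A[k][j] * x[j] for j in range(n)) for k in range(n)]


def grad_numeric(A, x, eps=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += eps   # perturb only the k-th coordinate
        x_minus[k] -= eps
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * eps))
    return grad


# Hypothetical symmetric example matrix and point (chosen for illustration).
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

exact = grad_formula(A, x)
approx = grad_numeric(A, x)
max_error = max(abs(e - a) for e, a in zip(exact, approx))
print(exact, approx, max_error)
```

Because $f$ is quadratic, the central difference has no truncation error, so the two vectors should agree up to floating-point rounding.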
Using the definition of the derivative
This is an expanded version of the answer at [1].
Using the definition, we can compute the derivative from first principles without exposing the components.
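The derivative of $f$ at $x_0$ is the linear transformation $L$ such that:
$$\lim_{x \to x_0} \frac{|f(x) - f(x_0) - L(x - x_0)|}{\|x - x_0\|} = 0$$
Using our function, this is:
$$\lim_{x \to x_0} \frac{|x^T A x - x_0^T A x_0 - L(x - x_0)|}{\|x - x_0\|} = 0$$
Defining $h = x - x_0$, we have $x = x_0 + h$ and
$$\lim_{h \to 0} \frac{|(x_0 + h)^T A (x_0 + h) - x_0^T A x_0 - L(h)|}{\|h\|} = 0$$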
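Focusing on the subexpression $(x_0 + h)^T A (x_0 + h)$, since $A$ is a matrix, it is a linear transformation, so we obtain $(x_0 + h)^T (A x_0 + A h)$. Since the transpose of a sum is the sum of the transposes, we have $(x_0^T + h^T)(A x_0 + A h)$. Now using linearity we have $x_0^T A x_0 + x_0^T A h + h^T A x_0 + h^T A h$.
After the $x_0^T A x_0$ terms cancel, the fraction is
$$\frac{|x_0^T A h + h^T A x_0 + h^T A h - L(h)|}{\|h\|}$$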
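Focusing on $h^T A x_0$, it is a real number so taking the transpose leaves it unchanged:
$$h^T A x_0 = (h^T A x_0)^T = x_0^T A^T h$$
Now the fraction is
$$\frac{|x_0^T A h + x_0^T A^T h + h^T A h - L(h)|}{\|h\|}$$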
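In the numerator, $h^T A h$ is a higher order term (its absolute value is at most a constant times $\|h\|^2$) that will disappear when taking the limit, so the linear transformation we are looking for must be $L(h) = x_0^T A h + x_0^T A^T h = x_0^T (A + A^T) h$. Since $A$ is symmetric, we have $A = A^T$ and $L(h) = 2 x_0^T A h$. In other words, the derivative at $x_0$ is the row vector $2 x_0^T A$.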
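The limit in the definition can also be observed numerically. The sketch below is an illustration only, with a hypothetical helper `remainder_ratio` and made-up values for the matrix, the base point and the direction. It evaluates $|f(x_0 + h) - f(x_0) - 2x_0^T A h| / \|h\|$ for shrinking $h$; by the algebra above this ratio equals $|h^T A h| / \|h\|$, so it should shrink in proportion to $\|h\|$.

```python
import math


def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def remainder_ratio(A, x0, h):
    """Return |f(x0 + h) - f(x0) - L(h)| / ||h|| with L(h) = 2 x0^T A h."""
    n = len(x0)
    x = [x0[i] + h[i] for i in range(n)]
    linear = 2 * sum(x0[i] * A[i][j] * h[j] for i in range(n) for j in range(n))
    norm_h = math.sqrt(sum(v * v for v in h))
    return abs(quad_form(A, x) - quad_form(A, x0) - linear) / norm_h


# Hypothetical symmetric matrix, base point and unit direction.
A = [[2.0, 1.0],
     [1.0, 3.0]]
x0 = [1.0, -1.0]
direction = [0.6, 0.8]

# Shrink h along the fixed direction and record the ratio each time.
ratios = [remainder_ratio(A, x0, [t * d for d in direction])
          for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Each tenfold reduction in the step size should reduce the ratio roughly tenfold, which is what "higher order term" means in practice.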
Using the chain rule
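In this approach, we think of $f$ as a composition of $x \mapsto (x, Ax)$ and $(x, y) \mapsto x^T y$ and use the multivariable chain rule.
Define:
$$g(x, y) = x^T y = \sum_{i=1}^n x_i y_i, \qquad y = Ax, \quad \text{so that } y_i = \sum_{j=1}^n a_{ij} x_j$$
What is tricky is that $f$ is not the composition of $g$ with $x \mapsto Ax$ alone; to make the composition work, we must stick on $x$ to $Ax$ to form $(x, Ax)$ before passing to $g$, so that $f(x) = g(x, Ax)$.
Now the multivariable chain rule says:
$$\frac{\partial f}{\partial x_k} = \sum_{i=1}^n \frac{\partial g}{\partial x_i} \frac{\partial x_i}{\partial x_k} + \sum_{i=1}^n \frac{\partial g}{\partial y_i} \frac{\partial y_i}{\partial x_k}$$
The notation is confusing because $x$ means different things on each side of the equation (since $x$ is both the input variable and an intermediate variable).
Looking only at the first half of the terms, $\partial x_i / \partial x_k$ is $1$ if $i = k$ and $0$ otherwise, so we keep only the $k$th term, where we see $\partial g / \partial x_k = y_k = \sum_{j} a_{kj} x_j$.
Now looking at the second half of the terms, $\partial g / \partial y_i = x_i$ and $\partial y_i / \partial x_k = a_{ik}$.
Putting all the above together, we obtain
$$\frac{\partial f}{\partial x_k} = \sum_{j=1}^n a_{kj} x_j + \sum_{i=1}^n x_i a_{ik} = 2 \sum_{j=1}^n a_{kj} x_j$$
In the last equality we used the fact that $A$ is symmetric.
We now have the $k$th component of the derivative, so the full derivative is $2Ax$.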
See [2] for something similar.
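The two halves of the chain-rule sum can be mirrored directly in code. This is a minimal sketch, assuming the intermediate variable is $y = Ax$ and the outer function is $(x, y) \mapsto x^T y$; the function name `chain_rule_gradient` and the example matrices are hypothetical. It also shows where symmetry enters: without it, the two halves add up to $(A + A^T)x$ rather than $2Ax$, matching the expression $x_0^T (A + A^T) h$ found with the definition of the derivative.

```python
def chain_rule_gradient(A, x):
    """Gradient of x^T A x assembled from the two halves of the chain rule."""
    n = len(x)
    # Intermediate variable y = A x.
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    grad = []
    for k in range(n):
        # First half: only the i = k term survives, contributing y_k.
        first_half = y[k]
        # Second half: sum over i of x_i * a_ik.
        second_half = sum(x[i] * A[i][k] for i in range(n))
        grad.append(first_half + second_half)
    return grad


x = [1.0, 2.0]

# Symmetric example: the two halves are equal, giving 2 A x.
A_sym = [[2.0, 1.0],
         [1.0, 3.0]]
grad_sym = chain_rule_gradient(A_sym, x)

# Non-symmetric example: the halves differ, giving (A + A^T) x.
A_gen = [[2.0, 5.0],
         [1.0, 3.0]]
grad_gen = chain_rule_gradient(A_gen, x)

print(grad_sym, grad_gen)
```

Keeping the two halves as separate variables makes it easy to see that the symmetry assumption is used only in the final simplification, not in the chain rule itself.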