Derivative of a quadratic form

Revision as of 00:04, 14 July 2018

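Let $A \in M_{n,n}(\mathbb{R})$ be an $n \times n$ real matrix, and let $f : \mathbb{R}^{n} \to \mathbb{R}$ be defined by $f(x) = x^{T} A x$. On this page, we calculate the derivative of $f$. Both methods below work for a general $A$ until their final step, where we additionally assume that $A$ is symmetric.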

Understanding the problem

Straightforward method

This method breaks the matrix and the vector into components and differentiates term by term. It is conceptually the simplest, but it looks messy because of the indices involved.

Let $A = (a_{ki})$ and $x = (x_{1}, \dots, x_{n})$.

We expand

$$x^{T} A x = x^{T} \begin{pmatrix} \sum_{i=1}^{n} a_{1i} x_{i} \\ \vdots \\ \sum_{i=1}^{n} a_{ni} x_{i} \end{pmatrix} = \sum_{k=1}^{n} x_{k} \sum_{i=1}^{n} a_{ki} x_{i}$$
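As an illustrative check of this expansion, here is a minimal Python sketch using plain lists (no external libraries). The function names and the sample matrix are hypothetical choices for illustration; the matrix is deliberately not symmetric, since the expansion holds for any $A$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def mat_vec(A: Matrix, x: Vector) -> Vector:
    """Return the column vector A x; entry k is sum_i a_{ki} x_i."""
    return [sum(a_ki * x_i for a_ki, x_i in zip(row, x)) for row in A]

def quad_form_matrix(A: Matrix, x: Vector) -> float:
    """Evaluate x^T (A x) as the dot product of x with A x."""
    return sum(x_k * y_k for x_k, y_k in zip(x, mat_vec(A, x)))

def quad_form_double_sum(A: Matrix, x: Vector) -> float:
    """Evaluate sum_k x_k sum_i a_{ki} x_i with explicit indices."""
    n = len(x)
    return sum(x[k] * sum(A[k][i] * x[i] for i in range(n)) for k in range(n))

A = [[1, 2], [3, 4]]  # not symmetric
x = [5, 6]
print(quad_form_matrix(A, x), quad_form_double_sum(A, x))  # prints 319 319
```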

Now we find the partial derivative of the above with respect to $x_{j}$. To separate the terms that depend on $x_{j}$ from those that are constant with respect to it, we split the sum:

$$\begin{aligned}
\sum_{k=1}^{n} x_{k} \sum_{i=1}^{n} a_{ki} x_{i} &= x_{j} \sum_{i=1}^{n} a_{ji} x_{i} + \sum_{k \neq j} x_{k} \sum_{i=1}^{n} a_{ki} x_{i} \\
&= x_{j} \left( a_{jj} x_{j} + \sum_{i \neq j} a_{ji} x_{i} \right) + \sum_{k \neq j} x_{k} \left( a_{kj} x_{j} + \sum_{i \neq j} a_{ki} x_{i} \right)
\end{aligned}$$

Distributing, we have

$$\begin{aligned}
&a_{jj} x_{j}^{2} + \left( \sum_{i \neq j} a_{ji} x_{i} \right) x_{j} + \sum_{k \neq j} \left( a_{kj} x_{k} x_{j} + x_{k} \sum_{i \neq j} a_{ki} x_{i} \right) \\
&\quad = a_{jj} x_{j}^{2} + \left( \sum_{i \neq j} a_{ji} x_{i} \right) x_{j} + \left( \sum_{k \neq j} a_{kj} x_{k} \right) x_{j} + \sum_{k \neq j} x_{k} \sum_{i \neq j} a_{ki} x_{i}
\end{aligned}$$

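The differentiation is now easy. The last term does not involve $x_{j}$, so it vanishes, and the remaining terms give

$$\frac{\partial f}{\partial x_{j}} = 2 a_{jj} x_{j} + \sum_{i \neq j} a_{ji} x_{i} + \sum_{k \neq j} a_{kj} x_{k}$$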
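The formula above can be checked numerically. The sketch below (plain Python; function names and sample values are hypothetical) compares it with a central difference quotient. Because $f$ is quadratic, the central difference $(f(x + t e_{j}) - f(x - t e_{j}))/(2t)$ equals the partial derivative exactly, so exact `Fraction` arithmetic lets us test for equality rather than approximate agreement.

```python
from fractions import Fraction

def quad_form(A, x):
    """f(x) = sum_k x_k sum_i a_{ki} x_i, the expansion derived above."""
    n = len(x)
    return sum(x[k] * sum(A[k][i] * x[i] for i in range(n)) for k in range(n))

def partial_formula(A, x, j):
    """2 a_jj x_j + sum_{i != j} a_ji x_i + sum_{k != j} a_kj x_k."""
    n = len(x)
    row_part = sum(A[j][i] * x[i] for i in range(n) if i != j)
    col_part = sum(A[k][j] * x[k] for k in range(n) if k != j)
    return 2 * A[j][j] * x[j] + row_part + col_part

def partial_central_difference(A, x, j, t=Fraction(1, 1000)):
    """(f(x + t e_j) - f(x - t e_j)) / (2 t), computed exactly with Fractions.

    For a quadratic f the t**2 terms cancel, so this is the exact partial.
    """
    plus = [xi + t if i == j else xi for i, xi in enumerate(x)]
    minus = [xi - t if i == j else xi for i, xi in enumerate(x)]
    return (quad_form(A, plus) - quad_form(A, minus)) / (2 * t)

A = [[1, 2], [3, 4]]  # not symmetric, so the row and column sums differ
x = [5, 6]
print([partial_formula(A, x, j) for j in range(len(x))])  # prints [40, 73]
print(all(partial_formula(A, x, j) == partial_central_difference(A, x, j)
          for j in range(len(x))))  # prints True
```

Using a non-symmetric matrix here is deliberate: it exercises the general formula before the symmetry assumption is introduced.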

For a general $A$, this is $(Ax)_{j} + (A^{T} x)_{j}$. If the matrix is symmetric, then $a_{kj} = a_{jk}$, so $\sum_{k \neq j} a_{kj} x_{k} = \sum_{k \neq j} a_{jk} x_{k} = \sum_{i \neq j} a_{ji} x_{i}$. The final equality holds because $k$ is only a dummy index, which we are free to rename $i$. The partial derivative then becomes

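$$2 a_{jj} x_{j} + 2 \sum_{i \neq j} a_{ji} x_{i} = 2 \sum_{i=1}^{n} a_{ji} x_{i},$$

which is the $j$th entry of $2 A x$. Hence the gradient of $f$ is $2 A x$, and the derivative of $f$ at $x$ is the linear map $h \mapsto 2 x^{T} A h$.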
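For a concrete check of the symmetric case, take $n = 2$ and a general symmetric matrix with entries $a$, $b$, $c$ (symbols introduced here only for illustration):

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \qquad f(x) = x^{T} A x = a x_{1}^{2} + 2 b x_{1} x_{2} + c x_{2}^{2},$$

$$\frac{\partial f}{\partial x_{1}} = 2 a x_{1} + 2 b x_{2} = 2 (a_{11} x_{1} + a_{12} x_{2}), \qquad \frac{\partial f}{\partial x_{2}} = 2 b x_{1} + 2 c x_{2} = 2 (a_{21} x_{1} + a_{22} x_{2}),$$

in agreement with $2 \sum_{i} a_{ji} x_{i}$ for $j = 1, 2$.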

Using the definition of the derivative

This is an expanded version of the answer at [1].

The derivative of $f$ at $x_{0}$ is the linear transformation $L$ such that

$$\lim_{x \to x_{0},\, x \neq x_{0}} \frac{\lvert f(x) - (f(x_{0}) + L(x - x_{0})) \rvert}{\lvert x - x_{0} \rvert} = 0$$

Using our function, this is:

$$\lim_{x \to x_{0},\, x \neq x_{0}} \frac{\lvert x^{T} A x - x_{0}^{T} A x_{0} - L(x - x_{0}) \rvert}{\lvert x - x_{0} \rvert} = 0$$

Defining $h = x - x_{0}$, we have $x = x_{0} + h$, so $x \to x_{0}$ exactly when $h \to 0$, and the quotient becomes

$$\frac{\lvert (x_{0} + h)^{T} A (x_{0} + h) - x_{0}^{T} A x_{0} - L(h) \rvert}{\lvert h \rvert}$$

Consider the subexpression $(x_{0} + h)^{T} A (x_{0} + h)$. Multiplication by the matrix $A$ is a linear transformation, so this equals $(x_{0} + h)^{T} (A x_{0} + A h)$. Because the transpose of a sum is the sum of the transposes, this is $(x_{0}^{T} + h^{T}) (A x_{0} + A h)$, and distributing the product gives $x_{0}^{T} A x_{0} + h^{T} A x_{0} + x_{0}^{T} A h + h^{T} A h$.
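A quick numerical sanity check of this four-term expansion, sketched in plain Python with hypothetical helper names and sample values. The expansion uses only linearity, not symmetry, so a non-symmetric $A$ is used:

```python
def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    """Return the scalar u^T v."""
    return sum(a * b for a, b in zip(u, v))

def bilinear(u, A, v):
    """Return the scalar u^T A v."""
    return dot(u, mat_vec(A, v))

A = [[1, 2], [3, 4]]  # deliberately not symmetric
x0 = [5, 6]
h = [1, -2]
x = [a + b for a, b in zip(x0, h)]  # x = x0 + h

lhs = bilinear(x, A, x)  # (x0 + h)^T A (x0 + h)
rhs = (bilinear(x0, A, x0) + bilinear(h, A, x0)
       + bilinear(x0, A, h) + bilinear(h, A, h))
print(lhs, rhs)  # prints 220 220
```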

Now the fraction is

$$\frac{\lvert x_{0}^{T} A x_{0} + h^{T} A x_{0} + x_{0}^{T} A h + h^{T} A h - x_{0}^{T} A x_{0} - L(h) \rvert}{\lvert h \rvert} = \frac{\lvert h^{T} A x_{0} + x_{0}^{T} A h + h^{T} A h - L(h) \rvert}{\lvert h \rvert}$$

The term $h^{T} A x_{0}$ is a $1 \times 1$ matrix, that is, a real number, so it equals its own transpose: $h^{T} A x_{0} = (h^{T} A x_{0})^{T} = x_{0}^{T} A^{T} h$.
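The identity can also be checked on numbers. In this minimal sketch (helper names and values are illustrative assumptions), $h^{T} A x_{0}$ and $x_{0}^{T} A^{T} h$ agree, while $x_{0}^{T} A h$ generally differs when $A$ is not symmetric, which is why $A^{T}$ rather than $A$ appears after transposing:

```python
def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    """Return the scalar u^T v."""
    return sum(a * b for a, b in zip(u, v))

def transpose(A):
    """Return A^T as a list of rows."""
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]  # not symmetric
x0 = [5, 6]
h = [1, -2]

left = dot(h, mat_vec(A, x0))              # h^T A x0
right = dot(x0, mat_vec(transpose(A), h))  # x0^T A^T h
other = dot(x0, mat_vec(A, h))             # x0^T A h, without the transpose
print(left, right, other)  # prints -61 -61 -45
```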

Now the fraction is

$$\frac{\lvert x_{0}^{T} A^{T} h + x_{0}^{T} A h + h^{T} A h - L(h) \rvert}{\lvert h \rvert} = \frac{\lvert x_{0}^{T} (A^{T} + A) h + h^{T} A h - L(h) \rvert}{\lvert h \rvert}$$

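In the numerator, $h^{T} A h$ is of second order in $h$: by the Cauchy-Schwarz inequality, $\lvert h^{T} A h \rvert \le \lVert A \rVert \, \lvert h \rvert^{2}$, where $\lVert A \rVert$ is the operator norm of $A$, so $\lvert h^{T} A h \rvert / \lvert h \rvert \to 0$ as $h \to 0$. The term $x_{0}^{T} (A^{T} + A) h$, on the other hand, is linear in $h$. Choosing $L(h) = x_{0}^{T} (A^{T} + A) h$ therefore makes the quotient tend to zero, and since the derivative is unique, this is the derivative. If $A$ is symmetric, then $A^{T} + A = 2 A$ and $L(h) = 2 x_{0}^{T} A h$.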
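To see the limit numerically, the sketch below (plain Python; names and sample values are hypothetical) takes $h = s\,d$ for a fixed direction $d$ and shrinking $s$, and evaluates the quotient with $L(h) = x_{0}^{T}(A^{T} + A)h$. The numerator is then exactly $\lvert h^{T} A h \rvert$, so the quotient is proportional to $s$ and tends to zero:

```python
import math
from fractions import Fraction

def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    """Return the scalar u^T v."""
    return sum(a * b for a, b in zip(u, v))

def transpose(A):
    """Return A^T as a list of rows."""
    return [list(col) for col in zip(*A)]

def f(A, x):
    """The quadratic form f(x) = x^T A x."""
    return dot(x, mat_vec(A, x))

def derivative_at(A, x0):
    """Return the candidate derivative L(h) = x0^T (A^T + A) h as a function."""
    # S = A^T + A, built entry by entry: S_ij = a_ji + a_ij.
    S = [[a + b for a, b in zip(row_t, row)] for row_t, row in zip(transpose(A), A)]
    return lambda h: dot(x0, mat_vec(S, h))

def error_ratio(A, x0, h):
    """|f(x0 + h) - f(x0) - L(h)| / |h|, the quotient in the limit definition."""
    L = derivative_at(A, x0)
    x = [a + b for a, b in zip(x0, h)]
    return abs(f(A, x) - f(A, x0) - L(h)) / math.sqrt(dot(h, h))

A = [[1, 2], [3, 4]]  # not symmetric, so L uses A^T + A rather than 2A
x0 = [5, 6]
d = [1, -2]           # fixed direction; h = s * d
for s in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    h = [s * di for di in d]
    # The ratio shrinks by a factor of 10 each time s does.
    print(float(s), error_ratio(A, x0, h))
```

Exact `Fraction` arithmetic is used for $h$ so that the remainder is computed without rounding; only the final division by $\lvert h \rvert$ is done in floating point.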


Using the chain rule