Just as elementary differentiation rules are helpful for optimizing single-variable functions, matrix differentiation rules are helpful for optimizing expressions written in matrix form. This technique is used often in statistics.

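Suppose \mathbf{f} is a function from \mathbb{R}^n to \mathbb{R}^m. Writing \mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})) with \mathbf{x} = (x_1, \ldots, x_n), we define the Jacobian matrix (or derivative matrix) to be

\begin{align*}\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}\end{align*}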

Note that if m=1, then differentiating f with respect to \mathbf{x} is the same as taking the gradient of f.
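To make the definition concrete, the following sketch approximates a Jacobian by central differences in plain Python and compares it with partial derivatives computed by hand. The helper `jacobian`, the example maps `f` and `g`, and the evaluation point are illustrative assumptions, not part of the text above.

```python
import math

def jacobian(f, x, h=1e-6):
    """Approximate the m-by-n Jacobian of f at x by central differences.

    Entry (i, j) approximates the partial derivative of f_i with respect to x_j.
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

# Hypothetical example: f maps R^2 to R^3, so its Jacobian is 3-by-2.
def f(x):
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x1 + x2 ** 2]

def exact_jacobian(x):
    """Partial derivatives of f worked out by hand."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [1.0, 2 * x2]]

point = [0.5, -1.5]
J_numeric = jacobian(f, point)
J_exact = exact_jacobian(point)
max_error = max(abs(J_numeric[i][j] - J_exact[i][j])
                for i in range(3) for j in range(2))

# With m = 1 the Jacobian is a single row: the gradient of g.
def g(x):
    return [x[0] ** 2 + 3 * x[1]]

grad_row = jacobian(g, point)
```

Rows of the result correspond to output components and columns to input variables, which matches the layout of the derivative matrix.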

With this definition, we obtain the following analogues to some basic single-variable differentiation results: if A is a constant matrix, then

\begin{align*}\frac{\partial}{\partial \mathbf{x}} (A \mathbf{x}) &= A \\ \frac{\partial}{\partial \mathbf{x}} (\mathbf{x}' A) &= A' \\ \frac{\partial}{\partial \mathbf{x}} (\mathbf{u}' \mathbf{v}) &= \mathbf{u}'\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}'\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\end{align*}

The third of these equations is the product rule.
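The three rules can be checked numerically. The sketch below is a sanity check rather than a proof: it uses a finite-difference Jacobian in plain Python, and the matrix `A`, the maps `u` and `v`, and the point `x0` are arbitrary choices made for illustration.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference approximation of the Jacobian of f at x."""
    n, m = len(x), len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def vecmat(u, M):
    """Return the entries of the row vector u' M."""
    return matvec(transpose(M), u)

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [2.0, 0.0, 1.0]]          # deliberately not symmetric
x0 = [0.3, -0.7, 1.1]

# Rule 1: the derivative of A x is A.
err_rule1 = max_abs_diff(jacobian(lambda x: matvec(A, x), x0), A)

# Rule 2: the derivative of x' A is A'.
err_rule2 = max_abs_diff(jacobian(lambda x: vecmat(x, A), x0), transpose(A))

# Rule 3 (product rule) for two hypothetical maps u, v from R^3 to R^2.
def u(x):
    return [x[0] * x[1], x[2]]

def v(x):
    return [x[0] + x[2], x[1] ** 2]

def u_dot_v(x):
    return [sum(a * b for a, b in zip(u(x), v(x)))]

lhs = jacobian(u_dot_v, x0)                    # 1-by-3 row
Ju, Jv = jacobian(u, x0), jacobian(v, x0)      # each 2-by-3
rhs = [[a + b for a, b in zip(vecmat(u(x0), Jv), vecmat(v(x0), Ju))]]
err_rule3 = max_abs_diff(lhs, rhs)
```

All three error values should be close to zero, limited only by the accuracy of the finite-difference approximation.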

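The Hessian of a function f:\mathbb{R}^n \to \mathbb{R} may be written in terms of the matrix differentiation operator as follows:

\begin{align*}\mathbf{H}(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}} \left(\frac{\partial f}{\partial \mathbf{x}}\right)'\end{align*}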

Some authors define \frac{\partial f}{\partial \mathbf{x}'} to be \left(\frac{\partial f}{\partial \mathbf{x}}\right)', in which case the Hessian operator can be written as \frac{\partial^2}{\partial \mathbf{x} \partial \mathbf{x}'}.
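Reading the Hessian as "the Jacobian of the transposed gradient" translates directly into code. The sketch below nests a finite-difference Jacobian inside itself. The function `f`, the evaluation point, and the step size are assumptions chosen for illustration.

```python
import math

def jacobian(f, x, h=1e-4):
    """Central-difference approximation of the Jacobian of f at x."""
    n, m = len(x), len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

# Hypothetical scalar function on R^2, returned as a length-1 list.
def f(x):
    x1, x2 = x
    return [x1 ** 2 * x2 + math.sin(x2)]

def gradient_transposed(x):
    """The 1-by-n Jacobian row of f, read as a length-n column vector."""
    return jacobian(f, x)[0]

point = [1.0, 2.0]

# Hessian = Jacobian of the transposed gradient.
H_numeric = jacobian(gradient_transposed, point)

# Second partial derivatives worked out by hand for comparison.
x1, x2 = point
H_exact = [[2 * x2, 2 * x1],
           [2 * x1, -math.sin(x2)]]

max_error = max(abs(H_numeric[i][j] - H_exact[i][j])
                for i in range(2) for j in range(2))
```

Nested finite differences lose accuracy quickly, which is why the step here is larger than one would use for a first derivative. An automatic differentiation library would avoid that trade-off.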

Exercise Let f: \mathbb{R}^n \to \mathbb{R} be defined by f(\mathbf{x}) = \mathbf{x}' A \mathbf{x} where A is a symmetric matrix. Find \frac{\partial f}{\partial \mathbf{x}}.

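Solution. We can apply the product rule with \mathbf{u} = \mathbf{x} and \mathbf{v} = A\mathbf{x} to find that

\begin{align*}\frac{\partial}{\partial \mathbf{x}} (\mathbf{x}' A \mathbf{x}) &= \mathbf{x}' \frac{\partial}{\partial \mathbf{x}}(A \mathbf{x}) + (A \mathbf{x})' \frac{\partial \mathbf{x}}{\partial \mathbf{x}} \\ &= \mathbf{x}' A + \mathbf{x}' A' \\ &= 2\mathbf{x}' A,\end{align*}

where the last step uses the symmetry of A.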

Exercise Suppose A is an m\times n matrix and \mathbf{b} \in \mathbb{R}^m. Use matrix differentiation to find the vector \mathbf{x} which minimizes |A \mathbf{x} - \mathbf{b}|^2. Hint: begin by writing |A \mathbf{x} - \mathbf{b}|^2 as (A \mathbf{x} - \mathbf{b})' (A \mathbf{x} - \mathbf{b}). You may assume that the rank of A is n.

Solution. We write

\begin{align*}|A \mathbf{x} - \mathbf{b}|^2 &= (A \mathbf{x} - \mathbf{b})' (A \mathbf{x} - \mathbf{b}) \\ &= \mathbf{x}' A' A \mathbf{x} - \mathbf{b}' A \mathbf{x} - \mathbf{x}' A' \mathbf{b} + |\mathbf{b}|^2.\end{align*}

To minimize this function, we find its gradient. Since A'A is symmetric, the result of the previous exercise applies to the first term:

\begin{align*}\frac{\partial}{\partial \mathbf{x}}|A \mathbf{x} - \mathbf{b}|^2 = 2\,\mathbf{x}' A' A - \mathbf{b}' A - (A'\mathbf{b})' = 2\mathbf{x}' A' A- 2\mathbf{b}' A\end{align*}
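As a sanity check on the signs, the gradient formula 2\mathbf{x}' A' A - 2\mathbf{b}' A can be compared with a finite-difference gradient of |A \mathbf{x} - \mathbf{b}|^2. The sketch below does this in plain Python. The particular `A`, `b`, and test point `x0` are made-up values for illustration.

```python
def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def vecmat(u, M):
    """Return the entries of the row vector u' M."""
    return [sum(u[i] * M[i][j] for i in range(len(M)))
            for j in range(len(M[0]))]

# Hypothetical data: A is 3-by-2 with rank 2, b is in R^3.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0, 4.0]

def objective(x):
    """The squared norm |Ax - b|^2."""
    residual = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]
    return sum(r * r for r in residual)

def gradient_formula(x):
    """The row vector 2 x'A'A - 2 b'A, using x'A'A = (Ax)'A."""
    first = vecmat(matvec(A, x), A)
    second = vecmat(b, A)
    return [2 * p - 2 * q for p, q in zip(first, second)]

def gradient_numeric(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g at x."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((g(x_plus) - g(x_minus)) / (2 * h))
    return grad

x0 = [0.2, -0.4]
max_error = max(abs(p - q) for p, q in
                zip(gradient_formula(x0), gradient_numeric(objective, x0)))
```

Because the objective is quadratic, the central difference is exact up to rounding, so any sign error in the formula would show up as a large discrepancy.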