One way of describing the chain rule is to say that derivatives of compositions of differentiable functions may be obtained by linearizing. If linear functions (functions of the form $x \mapsto mx + b$) are composed, then the slope of the composition is the product of the slopes of the functions being composed. Since differentiable functions are practically linear if you zoom in far enough, they behave the same way under composition.
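Here is a small numerical sketch of the linearization idea. The functions are illustrative assumptions, not taken from the text: a linear pair $2y + 1$ and $3t - 4$, whose composition $6t - 7$ has slope $2 \cdot 3 = 6$, and a nonlinear pair $\sin y$ and $x^3$, for which a finite-difference slope of the composition is compared with the product of the individual slopes. The helper `slope` is hypothetical, a plain central-difference approximation.

```python
import math

def slope(h, x, dx=1e-6):
    """Hypothetical helper: central-difference approximation of h'(x)."""
    return (h(x + dx) - h(x - dx)) / (2 * dx)

# Linear case (illustrative choice): 2(3t - 4) + 1 = 6t - 7
def lin_outer(y):
    return 2 * y + 1

def lin_inner(t):
    return 3 * t - 4

linear_slope = slope(lambda t: lin_outer(lin_inner(t)), 5.0)
print(linear_slope)  # close to 6 = 2 * 3

# Nonlinear case (illustrative choice): f(y) = sin(y), g(x) = x**3
def f(y):
    return math.sin(y)

def g(x):
    return x**3

x0 = 0.7
composite_slope = slope(lambda x: f(g(x)), x0)
# Chain rule: slope of f at g(x0) times slope of g at x0
product_of_slopes = slope(f, g(x0)) * slope(g, x0)
print(math.isclose(composite_slope, product_of_slopes, rel_tol=1e-6))  # True
```

Near $x_0$ each function behaves like its tangent line, so the slopes multiply just as in the linear case.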

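The chain rule in multivariable calculus works similarly. If we compose a differentiable function $\mathbf{r}: \mathbb{R} \to \mathbb{R}^n$ with a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, we get a function $f \circ \mathbf{r}: \mathbb{R} \to \mathbb{R}$ whose derivative is

$$\frac{d}{dt}(f \circ \mathbf{r})(t) = \frac{\partial f}{\partial \mathbf{x}}(\mathbf{r}(t))\,\frac{d\mathbf{r}}{dt}(t).$$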

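Note that the right-hand side can also be written as $(\nabla f)(\mathbf{r}(t)) \cdot \mathbf{r}'(t)$, since $\frac{\partial f}{\partial \mathbf{x}}$ is a row vector, and the product of a row vector and a column vector is the same as the dot product of the transpose of the row vector and the column vector. We can explain this formula geometrically: the change in $f$ that results from making a small move from $\mathbf{r}(t)$ to $\mathbf{r}(t + \Delta t)$ is approximately the dot product of the gradient of $f$ and the small step $\Delta\mathbf{r} = \mathbf{r}(t + \Delta t) - \mathbf{r}(t) \approx \mathbf{r}'(t)\,\Delta t$.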
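The following sketch checks the multivariable chain rule numerically. The curve $\mathbf{r}(t) = (\cos t, \sin t)$ and the function $f(x, y) = x^2 y + y$ are illustrative assumptions; their derivatives are worked out by hand in the comments. It compares a finite-difference estimate of $(f \circ \mathbf{r})'(t_0)$ with the gradient of $f$ dotted with $\mathbf{r}'(t_0)$, and also checks the small-step approximation $f(\mathbf{r}(t_0 + \Delta t)) - f(\mathbf{r}(t_0)) \approx \nabla f \cdot \Delta\mathbf{r}$.

```python
import numpy as np

def r(t):
    # Hypothetical curve r: R -> R^2, chosen for illustration
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    # Derivative of r, computed by hand: (-sin t, cos t)
    return np.array([-np.sin(t), np.cos(t)])

def f(p):
    # Hypothetical scalar function f: R^2 -> R, f(x, y) = x**2 * y + y
    x, y = p
    return x**2 * y + y

def grad_f(p):
    # Gradient of f, computed by hand: (2xy, x**2 + 1)
    x, y = p
    return np.array([2 * x * y, x**2 + 1])

t0 = 0.4
dt = 1e-6

# Central-difference estimate of (f o r)'(t0)
numeric = (f(r(t0 + dt)) - f(r(t0 - dt))) / (2 * dt)
# Chain rule: gradient of f at r(t0), dotted with r'(t0)
chain_rule = np.dot(grad_f(r(t0)), r_prime(t0))
print(numeric, chain_rule, np.isclose(numeric, chain_rule))

# Geometric reading: change in f is about the gradient dotted with the small step
small_step = r(t0 + dt) - r(t0)
print(np.isclose(f(r(t0 + dt)) - f(r(t0)), grad_f(r(t0)) @ small_step))
```

Here $\nabla f$ is stored as a 1-D NumPy array, so the row-vector/column-vector distinction collapses into `np.dot`.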

Exercise Suppose that , that , and that and . Find the derivative of the function at the point .

Solution. The chain rule implies that the derivative of is

Exercise Find the derivative with respect to of the function by writing the function as where and and .

Solution. Let where and . We have that and . Since both derivatives of and with respect to are 1, the chain rule implies that

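Exercise Suppose that $g(\mathbf{x}) = A\mathbf{x}$ for some $m \times n$ matrix $A$, and suppose that $f: \mathbb{R}^m \to \mathbb{R}^m$ is the componentwise squaring function (in other words, $f(\mathbf{y}) = [y_1^2, y_2^2, \ldots, y_m^2]$). Find the derivative of $f \circ g$.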

Note: you might find it convenient to express your answer using the function diag which maps a vector to a matrix with that vector along the diagonal.

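Solution. The derivative matrix of $f$ is diagonal, since the derivative of $f_i(\mathbf{y}) = y_i^2$ with respect to $y_j$ is zero unless $i = j$. The diagonal entries are $2y_1, 2y_2, \ldots, 2y_m$, so $\frac{\partial f}{\partial \mathbf{y}} = 2\operatorname{diag}(\mathbf{y})$. The derivative of $g$ is $A$, as we saw in the section on matrix differentiation. Therefore, the derivative of the composition is

$$\frac{\partial (f \circ g)}{\partial \mathbf{x}} = \frac{\partial f}{\partial \mathbf{y}}(g(\mathbf{x}))\,\frac{\partial g}{\partial \mathbf{x}} = 2\operatorname{diag}(A\mathbf{x})\,A.$$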

We can check this exercise numerically:

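```python
import numpy as np

A = np.random.random_sample((5, 5))
x = np.random.random_sample(5)
Δx = 1e-6 * np.random.random_sample(5)  # small random step

def f(y):
    "Componentwise square y"
    return y**2

def g(x):
    "Multiply A by x"
    return A @ x

# Chain rule prediction: the derivative of f ∘ g is 2 diag(Ax) A
derivative = 2 * np.diag(A @ x) @ A
# The actual change in f(g(x)) should match the linear prediction
print(np.allclose(f(g(x + Δx)) - f(g(x)), derivative @ Δx))
```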
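The check above tests a single random direction. As a further sketch, one can compare every entry of the derivative matrix, here with a non-square matrix. The $3 \times 5$ shape, the fixed seed, and the helper `numerical_jacobian` are illustrative assumptions, not part of the exercise; the helper estimates one column of the Jacobian per coordinate direction using central differences.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 5))  # non-square (illustrative), so f ∘ g maps R^5 to R^3
x = rng.random(5)

def composite(x):
    # f(g(x)): multiply by A, then square componentwise
    return (A @ x) ** 2

def numerical_jacobian(h, x, eps=1e-6):
    """Hypothetical helper: central-difference Jacobian, one column per coordinate."""
    columns = []
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = eps
        columns.append((h(x + e) - h(x - e)) / (2 * eps))
    return np.column_stack(columns)

J_numeric = numerical_jacobian(composite, x)
J_formula = 2 * np.diag(A @ x) @ A
print(J_numeric.shape)  # (3, 5)
print(np.allclose(J_numeric, J_formula))
```

Because $f \circ g$ is quadratic in $\mathbf{x}$, central differences are exact apart from rounding error, so the two matrices agree closely.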