Multivariable Calculus: The chain rule
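One way of describing the single-variable chain rule is that the derivative of a composition is the product of the derivatives of the functions being composed, each evaluated at the appropriate point: $(f \circ g)'(t) = f'(g(t))\,g'(t)$.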
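The chain rule in multivariable calculus works similarly. If we compose a differentiable function $\mathbf{r}: \mathbb{R} \to \mathbb{R}^2$ with a differentiable function $f: \mathbb{R}^2 \to \mathbb{R}$, we get a function $f \circ \mathbf{r}: \mathbb{R} \to \mathbb{R}$ whose derivative is
$$(f \circ \mathbf{r})'(t) = (\nabla f)(\mathbf{r}(t)) \cdot \mathbf{r}'(t).$$
Note that the right-hand side can also be written as $(\nabla f)(\mathbf{r}(t))^{\mathsf{T}}\,\mathbf{r}'(t)$, since $(\nabla f)(\mathbf{r}(t))^{\mathsf{T}}$ is a row vector, and the product of a row vector and a column vector is the same as the dot product of the two vectors.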
We visualize $\mathbf{r}$ by drawing the points $\mathbf{r}(t)$, which trace out a curve in the plane. We visualize $f$ only by showing the direction of its gradient at the point $\mathbf{r}(t)$. To first order, the change in $f$ from one point on the curve to a nearby one is the dot product of the change in position and the gradient.
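The chain rule formula can be checked numerically. The following is a minimal sketch, assuming a hypothetical function $f(x, y) = x^2 y$ and a hypothetical curve $\mathbf{r}(t) = (\cos t, \sin t)$, neither of which comes from the text above. It compares a central finite-difference estimate of $(f \circ \mathbf{r})'(t)$ with the dot product of the gradient and the velocity.

```python
import numpy as np

# Hypothetical example functions, chosen only for illustration.
def f(p):
    "f(x, y) = x^2 * y"
    x, y = p
    return x**2 * y

def grad_f(p):
    "Gradient of f: [2xy, x^2]"
    x, y = p
    return np.array([2 * x * y, x**2])

def r(t):
    "Curve r(t) = (cos t, sin t)"
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    "Velocity r'(t) = (-sin t, cos t)"
    return np.array([-np.sin(t), np.cos(t)])

t = 0.7   # arbitrary parameter value
h = 1e-6  # finite-difference step

# Central-difference estimate of (f ∘ r)'(t)
numeric = (f(r(t + h)) - f(r(t - h))) / (2 * h)

# Chain rule: gradient of f at r(t), dotted with r'(t)
chain_rule = grad_f(r(t)) @ r_prime(t)

agrees = np.isclose(numeric, chain_rule, rtol=1e-6)
print(agrees)
```

For this particular choice, $f(\mathbf{r}(t)) = \cos^2 t \sin t$, so the result can also be compared against the single-variable derivative $\cos^3 t - 2\cos t \sin^2 t$.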
Exercise
Suppose that , that , and that and . Find the derivative of the function at the point .
Solution. The chain rule implies that the derivative of is
Exercise
Find the derivative with respect to of the function by writing the function as where and and .
Solution. Let where and . We have that and . Since both derivatives of and with respect to are 1, the chain rule implies that
Exercise
Suppose that $g(\mathbf{x}) = A\mathbf{x}$ for some matrix $A$, and suppose that $f$ is the componentwise squaring function (in other words, $f(\mathbf{y}) = [y_1^2, y_2^2, \ldots, y_n^2]$). Find the derivative of $f \circ g$.
Note: you might find it convenient to express your answer using the function diag which maps a vector to a matrix with that vector along the diagonal.
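Solution. The derivative matrix of $f$ is diagonal, since the derivative of $y_i^2$ with respect to $y_j$ is zero unless $i = j$. The diagonal entries are $2y_i$, so the derivative of $f$ at $\mathbf{y}$ is $2\operatorname{diag}(\mathbf{y})$. The derivative of $g$ is $A$, as we saw in the section on matrix differentiation. Therefore, by the chain rule, the derivative of the composition $f \circ g$ at $\mathbf{x}$ is the derivative of $f$ evaluated at $g(\mathbf{x}) = A\mathbf{x}$, multiplied by the derivative of $g$:
$$2\operatorname{diag}(A\mathbf{x})\,A.$$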
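The same answer can be read off entry by entry. The derivation below is a sketch that takes $g(\mathbf{x}) = A\mathbf{x}$ and $f$ the componentwise square, writes $A_{ij}$ for the entries of $A$, and uses the name $h = f \circ g$, which is introduced here only for convenience.

```latex
\begin{align*}
h_i(\mathbf{x}) &= \Bigl(\sum_{k} A_{ik} x_k\Bigr)^{2}, \\
\frac{\partial h_i}{\partial x_j}
  &= 2\Bigl(\sum_{k} A_{ik} x_k\Bigr) A_{ij}
   = 2\,(A\mathbf{x})_i\, A_{ij}.
\end{align*}
```

The final expression is the $(i, j)$ entry of $2\operatorname{diag}(A\mathbf{x})\,A$, since multiplying $A$ on the left by a diagonal matrix scales row $i$ of $A$ by the $i$th diagonal entry.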
We can check this exercise numerically:
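import numpy as np

A = np.random.random_sample((5, 5))
x = np.random.random_sample(5)
Δx = 1e-6 * np.random.random_sample(5)  # small random perturbation of x

def f(y):
    "Square each component of y"
    return y**2

def g(x):
    "Multiply A by x"
    return A @ x

# Derivative of f ∘ g at x, from the exercise
derivative = 2 * np.diag(A @ x) @ A

# The change in f(g(x)) should match the linear approximation derivative @ Δx
np.allclose(f(g(x + Δx)) - f(g(x)), derivative @ Δx)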
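A single random perturbation tests the derivative in only one direction. As a complementary sketch, the block below builds the whole derivative matrix by central differences, one column per coordinate, and compares it with $2\operatorname{diag}(A\mathbf{x})\,A$. The names `h`, `eps`, and `rng`, the fixed seed, and the tolerances are choices made for this illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5
A = rng.random((n, n))
x = rng.random(n)

def h(x):
    "The composition f(g(x)): square each component of A @ x"
    return (A @ x) ** 2

# Closed-form derivative from the exercise
analytic = 2 * np.diag(A @ x) @ A

# Finite-difference derivative matrix:
# column j is the change in h per unit change in x[j]
eps = 1e-6
numeric = np.empty((n, n))
for j in range(n):
    e_j = np.zeros(n)
    e_j[j] = eps
    numeric[:, j] = (h(x + e_j) - h(x - e_j)) / (2 * eps)

jacobians_match = np.allclose(numeric, analytic, rtol=1e-6, atol=1e-8)
print(jacobians_match)
```

Because each component of `h` is quadratic in `x`, the central difference has no truncation error here, so any mismatch comes from floating-point roundoff alone.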