Demystifying the Chain Rule in Calculus
Introduction
There are a number of posts on PF involving a general confusion over the multi-variable chain rule. The problem is often caused by a lack of clarity about the roles of functions and variables and what precisely each derivative means. This insight is an attempt to clarify things.
Part A: The Single-Variable Chain Rule
1) Functional Notation
We begin with a review of the chain rule in one dimension. For this we need two functions ##f## and ##g## and a third function defined by composition of ##f## and ##g##:
$$h = f \circ g$$
If we use ##x## as our variable, then this means that:
$$h(x) = f(g(x))$$
With the appropriate assumptions about ##f## and ##g## the chain rule says that:
$$h'(x) = f'(g(x))g'(x) \ \ (1)$$
Note that this means: the derivative of ##h##, evaluated at a point ##x##, equals the derivative of ##f##, evaluated at the point ##g(x)##, times the derivative of ##g##, evaluated at the point ##x##.
Although this notation is more complicated than the differential notation below, it has the advantage that it explicitly shows what each derivative means.
Another important point is that the notation ##f', g', h'## is independent of the variable ##x##. In other words, we can use the notation ##f'## to identify unambiguously the function that is the derivative of ##f##, without having to specify a variable.
Dissociating a function (and its derivative) from the dummy variable that is used to define it is an important step in understanding how derivatives and the chain rule really work. Unfortunately, as we will see, standard mathematical notation makes this harder to do in the multi-variable case.
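As a concrete illustration of equation (1), with functions chosen purely for the example, take ##f(u) = \sin u## and ##g(x) = x^2##. Then:
$$f'(u) = \cos u, \ \ g'(x) = 2x$$
$$h(x) = f(g(x)) = \sin(x^2)$$
$$h'(x) = f'(g(x))g'(x) = \cos(x^2) \cdot 2x$$
Here ##f'## is simply the cosine function. It would be the same function had we defined ##f## by writing ##f(t) = \sin t## or ##f(x) = \sin x##; the letter used in the definition plays no role once the function is defined.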
2) Differential Notation
The alternative notation for the chain rule is, of course:
$$\frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \ \ (2)$$
This notation is relatively free and easy and allows calculus to be done quickly, but notice how much precision has been lost:
- ##f## now stands for two different functions: the original ##f## and the composition ##f \circ g##.
- The points at which these functions are to be evaluated have been lost.
- And, what exactly is ##\frac{df}{dg}##?
This is why you have to be careful using the differential notation not to lose track of how things are defined and what is a function of what.
Note that equation (2) means exactly the same as equation (1). If you have any doubt about what (2) means, then go back to equation (1).
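To see how equation (2) unpacks in a simple case, here is a short sketch with functions chosen purely for illustration: ##f(u) = u^3## and ##g(x) = x^2 + 1##. Reading ##\frac{df}{dg}## as ##f'## evaluated at ##g(x)##, we get:
$$\frac{df}{dg} = f'(g(x)) = 3(x^2 + 1)^2, \ \ \frac{dg}{dx} = g'(x) = 2x$$
$$\frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} = 6x(x^2 + 1)^2$$
The left-hand side here is the derivative of the composition ##(f \circ g)(x) = (x^2 + 1)^3##, not of the original function ##f(u) = u^3##, even though both are written as ##f##.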
Part B: Multi-Variable Chain Rule
In multi-variable calculus, we start with a function ##f## of several independent variables: ##x, y, z##, say. Assuming ##f## is differentiable, we can then define three new functions, the partial derivatives of ##f##:
$$f_x \equiv \frac{\partial f}{\partial x}, \ f_y \equiv \frac{\partial f}{\partial y}, f_z \equiv \frac{\partial f}{\partial z} \ \ (3)$$
Notice that the notation for partial derivatives is tied to a particular set of dummy variables – ##x, y, z## in this case. This, as we shall see, may lead to ambiguity.
To make sure we understand what is meant by these derivatives, here is an example. Let
$$f(x, y, z) = xy + x^2z + xyz \ \ (4)$$
Then we have:
$$f_x(x, y, z) \equiv \frac{\partial f}{\partial x} = y + 2xz + yz \ \ (4a)$$
$$f_y(x, y, z) \equiv \frac{\partial f}{\partial y} = x + xz \ \ (4b)$$
$$f_z(x, y, z) \equiv \frac{\partial f}{\partial z} = x^2 + xy \ \ (4c)$$
Note that these partial derivatives are themselves functions of the three variables. In general, once we have defined ##f## we have also defined ##f_x, f_y, f_z## and these are just three functions that can be applied to any variables we like. For example, from equation ##(4a)## we get:
$$f_x(u, v, w) = v + 2uw + vw \ \ (5)$$
Note that technically ##f_x## is really “the function obtained by differentiating ##f## with respect to its first argument – which we just happen to call ##x##.” There is no equivalent of the single-variable notation ##f'##, which would allow us to avoid using the variable ##x## here. For good or bad, we are stuck with the ##f_x## and ##\frac{\partial f}{\partial x}## notation.
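The idea that ##f_x## is tied to an argument position rather than to the letter ##x## can be sketched in code, where function parameters are explicitly just placeholders. The following is a minimal sketch using the function from equation (4); the names `f_1` and `numerical_partial` and the sample values are assumptions made for this illustration only.

```python
# f from equation (4). The parameter names a, b, c are placeholders;
# only the position of each argument matters.
def f(a, b, c):
    return a * b + a**2 * c + a * b * c


def f_1(a, b, c):
    """Partial derivative of f with respect to its FIRST argument (f_x in the text)."""
    return b + 2 * a * c + b * c


def numerical_partial(func, index, point, step=1e-6):
    """Central-difference estimate of the partial derivative in slot `index`."""
    up = list(point)
    down = list(point)
    up[index] += step
    down[index] -= step
    return (func(*up) - func(*down)) / (2 * step)


# Evaluate at a point labelled (u, v, w), as in equation (5).
u, v, w = 1.5, -2.0, 0.5
exact = f_1(u, v, w)                          # v + 2uw + vw
approx = numerical_partial(f, 0, (u, v, w))   # differentiate slot 0 numerically
print(exact, approx)
```

The two printed values should agree to within rounding error, whatever names are given to the point at which `f_1` is evaluated.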
Ambiguity may now arise, however, if ##u, v, w## are themselves functions of ##x, y, z##. For example, if we define:
$$h(x) = f(u(x), v(x), w(x)) \ \ (6)$$
Then we have defined a new function, ##h##, of a single variable ##x##. The chain rule says that:
$$h'(x) = f_x(u(x), v(x), w(x))u'(x) + f_y(u(x), v(x), w(x))v'(x) + f_z(u(x), v(x), w(x))w'(x) \ \ (7)$$
But, what exactly is ##f_x## here? Well, it is the function formed by taking the partial derivative of ##f## with respect to its first argument. Note that the symbol ##x## is now overloaded. But, again, we are stuck with the notation and we have to juggle using ##x## in these two roles. In our example, ##f_x## is the function defined in equation ##(5)##, regardless of what ##u, v, w## are.
In fact, things get worse. Suppose simply ##u(x) = x## and, as is often the case, we use ##y(x), z(x)## instead of ##v(x), w(x)##. Now we have overloaded all three symbols ##x, y, z##. We have:
$$h(x) = f(x, y(x), z(x)) \ \ (8)$$
And the chain rule in this case gives:
$$h'(x) = f_x(x, y(x), z(x)) + f_y(x, y(x), z(x))y'(x) + f_z(x, y(x), z(x))z'(x) \ \ (9)$$
Technically, ##y, z## are now used both as the dummy variables with which ##f## was defined (and which are used to denote the partial derivatives of ##f##) and as functions of ##x##. To illustrate this we could, for example, capitalise the letters where they are dummy variables denoting which partial derivative we mean, and leave them in lower case where they represent specific variables and functions. This would give:
$$h'(x) = f_X(x, y(x), z(x)) + f_Y(x, y(x), z(x))y'(x) + f_Z(x, y(x), z(x))z'(x) \ \ (9b)$$
This highlights that the subscripts ##X, Y, Z## denote the partial derivatives of ##f## with respect to its first, second and third arguments and are, in fact, unrelated to our variables ##x, y, z##. The usual notation in equation ##(9)## simply does not make this distinction.
In the differential notation equation (9) becomes:
$$\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx} + \frac{\partial f}{\partial z} \frac{dz}{dx} \ \ (10)$$
Here the left-hand side is often called the “total” derivative of ##f##. Note, however, that it is not really a derivative of ##f## at all, but the derivative of the composite function ##h##, which we defined in equation ##(8)## above.
This analysis, if nothing else, may at least take some of the ambiguity out of the “total” derivative ##\frac{df}{dx}## and the partial derivative ##\frac{\partial f}{\partial x}##. Equation ##(9)## hopefully makes clear what equation ##(10)## really means.
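A worked case may make the difference concrete. Take ##f## from equation ##(4)## and, purely as an illustrative choice, let ##y(x) = x^2## and ##z(x) = x^3##. Evaluating the partial derivatives ##(4a)##, ##(4b)##, ##(4c)## at the point ##(x, x^2, x^3)## gives:
$$f_x(x, x^2, x^3) = x^2 + 2x^4 + x^5$$
$$f_y(x, x^2, x^3) \, y'(x) = (x + x^4)(2x) = 2x^2 + 2x^5$$
$$f_z(x, x^2, x^3) \, z'(x) = (x^2 + x^3)(3x^2) = 3x^4 + 3x^5$$
Adding these, as equation ##(9)## instructs:
$$h'(x) = 3x^2 + 5x^4 + 6x^5$$
As a check, substituting directly gives ##h(x) = f(x, x^2, x^3) = x^3 + x^5 + x^6##, whose derivative is indeed ##3x^2 + 5x^4 + 6x^5##. Note that the “partial” term ##x^2 + 2x^4 + x^5## and the “total” derivative ##3x^2 + 5x^4 + 6x^5## are different functions of ##x##.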
Although equation ##(10)## may be the form of the chain rule with which most people are familiar, the notation hides a multitude of sins. If you get confused over the chain rule it is worth being able to deconstruct it back to the functional format (equation ##(9)##) to see what is really going on. The key step in that process is to recognise that we have used the symbols ##x, y, z##, and indeed the function ##f##, in two different roles.
It’s actually quite rare in standard mathematical notation to have to juggle the same symbol in two different roles. But, the multi-variable chain rule in equations ##(9)## and ##(10)## is such a case.
Part C: Example
To illustrate how the multi-variable chain rule is used in a physical context, consider a particle under a time-dependent potential, which is defined as a function of four variables: ##V(x, y, z, t)##.
##V## is a multi-variable function with four partial derivatives (which themselves are multi-variable functions): ##V_x(x, y, z, t), V_y(x, y, z, t), V_z(x, y, z, t), V_t(x, y, z, t)##. Note that these functions are defined in general for ##V##, independent of what the particle is doing.
Now, if the particle takes a specific trajectory through space, we can define a new function which is the potential of the particle along its trajectory:
$$V_p(t) = V(x(t), y(t), z(t), t) \ \ (11)$$
Note that texts will often use the same symbol for both these functions and simply write:
$$V(t) = V(x(t), y(t), z(t), t) \ \ (12)$$
This overloads the symbol ##V## and trusts that the student doesn’t get confused by how the derivatives are calculated. I’ll stay with this overloaded convention in what follows, although I personally like to distinguish the functions by writing ##V_p## on the left-hand side, as in equation ##(11)##.
In any case, we can calculate how the potential changes with time (finally, for simplicity, I’ll drop the variables and write ##V_x \equiv V_x(x, y, z, t)## etc.):
$$V'(t) = V_x x'(t) + V_y y'(t) + V_z z'(t) + V_t \ \ (13)$$
Or, in the differential notation:
$$\frac{dV}{dt} = \frac{\partial V}{\partial x} \frac{dx}{dt}+ \frac{\partial V}{\partial y} \frac{dy}{dt} + \frac{\partial V}{\partial z} \frac{dz}{dt} + \frac{\partial V}{\partial t} \ \ (14)$$
This is standard notation, but you can see from equations ##(11)## and ##(12)## that the ##V## on the left-hand side is actually a different function from the ##V## on the right-hand side.
Note also that ##V_x = \frac{\partial V}{\partial x}## is the partial derivative of the function ##V## with respect to its ##x## coordinate. This ##x## is not the same as the function ##x(t)## representing the particle’s x-coordinate over time. In other words ##V_x = \frac{\partial V}{\partial x}## is the same function, regardless of the path of the particle. The path of the particle ##(x(t), y(t), z(t))## is the set of points at which this function ##V_x## is evaluated for that particular path.
One particular area of confusion is where the student thinks they need to differentiate the function ##V## along the particular particle trajectory to get ##V_x## (or ##\frac{\partial V}{\partial x}##). This is not the case. ##V_x## (or ##\frac{\partial V}{\partial x}##) is calculated by a general, spatial derivative before any particular trajectory is considered. Then, this function is evaluated along a particular trajectory.
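The order of operations described above (differentiate ##V## in general first, then evaluate along the path) can be sketched numerically. The potential and the trajectory below are hypothetical, chosen only so that the derivatives are easy to write down; they are not taken from any physical problem.

```python
import math


# Hypothetical potential V(x, y, z, t), for illustration only.
def V(x, y, z, t):
    return x * y + z * math.sin(t)


# Partial derivatives of V, worked out once, before any trajectory is chosen.
def V_x(x, y, z, t):
    return y


def V_y(x, y, z, t):
    return x


def V_z(x, y, z, t):
    return math.sin(t)


def V_t(x, y, z, t):
    return z * math.cos(t)


# Hypothetical trajectory (x(t), y(t), z(t)) and its velocity.
def position(t):
    return (math.cos(t), math.sin(t), t)


def velocity(t):
    return (-math.sin(t), math.cos(t), 1.0)


def V_p(t):
    """Potential along the trajectory, as in equation (11)."""
    return V(*position(t), t)


def dV_p_dt(t):
    """Chain rule of equation (13): partials evaluated at the particle's position."""
    x, y, z = position(t)
    dx, dy, dz = velocity(t)
    return (V_x(x, y, z, t) * dx + V_y(x, y, z, t) * dy
            + V_z(x, y, z, t) * dz + V_t(x, y, z, t))


t0 = 0.7
h = 1e-6
finite_difference = (V_p(t0 + h) - V_p(t0 - h)) / (2 * h)  # derivative of V_p directly
chain_rule = dV_p_dt(t0)
print(finite_difference, chain_rule)
```

The two numbers should agree to within the accuracy of the finite difference. Changing `position` and `velocity` changes both results, but `V_x`, `V_y`, `V_z` and `V_t` stay exactly as they are, which is the point made in the text.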
Conclusion
A thorough understanding of the single-variable chain rule is an important prerequisite for multi-variable calculus.
The multi-variable chain rule involves a certain overload of the symbols ##x, y, z## and, in the usual differential notation, an overload of the symbol representing the function. These ambiguities must be faced and understood. In addition, the differential notation misses out much of the detail about how things are defined and the points at which they are evaluated. Being able to go back to the functional notation can often clarify what is really going on.
In a physical context, it is important to distinguish where a function is defined and has been differentiated with respect to general spatial and time coordinates – yielding its partial derivatives ##V_x## etc. – and where these partial derivatives are being evaluated at specific points along, for example, the trajectory of a particle.
This insight hopefully provides a useful supplement to anyone learning multi-variable calculus.
BSc in pure mathematics (1984). Retired from a career in Information Technology in 2014. I divide my time between studying physics when I’m home in London and mountaineering.
Favourite area of physics is Quantum Mechanics.
Nice article.
I would only comment that in multivariate calculus one inevitably gets into directional derivatives. To understand these I think it is helpful to think of the derivative (or differential) of a function as a linear map on direction vectors. The Chain Rule then says that if you compose two functions, the derivative of the composition is the composition of the derivatives. In classical multivariable calculus this means you matrix multiply the Jacobian matrices.
Also thinking of the derivative in this way gives a conceptual framework for the Chain Rule rather than only a rule.
Thank you PeroK, I've found myself lost with things like this a couple of times and I agree when you say that the differential notation lacks many things. The last issue that I found confusing recently was the one you pointed out in equation (8):
Having a function f(g(x,t),x) write the partial derivative of f wrt x without having written the same term in both sides of the equation. :woot:
One option would be, with ##f_i## being the partial derivative of ##f## wrt its i-th argument, to write ##f_x = f_1(x,t) g_x + f_2##,
this way you could avoid writing ##f_2## again as ##f_x## and it would be the same as you suggested there (1, 2 instead of X, Y).
Another option: Using differential notation you would have to use parentheses and write explicitly that
##f_1 = (\partial f / \partial g)## keeping the 2nd argument of ##f## fixed, but that would bring notational clutter so better stick with the first option. :P
Fortunately, some people will read this article and they won't have to question all their knowledge again as I did in that moment.
The light type used, 33% saturation, makes it difficult to read. Any chance of increasing the amount of "ink" used? This thread, as all others, uses 50-75% saturation. And the reply box I'm typing in uses 98% saturation.
Thanks.
This is a very insightful insight. I have just begun studying calculus and I was extremely worried about this magical chain rule. This article was helpful in demystifying whatever I could understand from the insight :bow:.
Great insight, it addresses the main issues an average student (myself included) might stumble into when first coming into contact with the chain rule.
Nice article, @PeroK!
Congrats on your first Insight @PeroK!