Why Multiply by the Derivative of the Inner Function in the Chain Rule?

In summary: The quote by mathwonk sums it up nicely, and the "proof" in quotation marks is an intuitive explanation rather than a formal proof. The chain rule states that the derivative of a composite function is the product of the derivatives of its component functions, with the outer derivative evaluated at the value of the inner function. Multiplying by Δu/Δu is a convenient way to bring the intermediate variable into the equation. The underlying idea is that the best linear approximation to a composite function is obtained by composing the best linear approximations to its component functions, and composing linear functions amounts to multiplying their slopes.
  • #1
Quincy
Chain Rule - intuitive "Proof"

Suppose y = f(u), and u = g(x), then dy/dx = dy/du * du/dx.

An intuitive "proof" of the chain rule has this step: [tex]\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \left( \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x} \right)[/tex]

My question is, why multiply by [tex]\frac {\Delta u}{\Delta u}[/tex]? I know mathematically, it's because [tex]\frac {\Delta u}{\Delta u}[/tex] = 1, and multiplying by 1 doesn't change the function, but I'm looking for a philosophical reason. I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...
 
  • #2


A change Δx in x causes a change Δu in u which causes a change Δy in y. The derivatives are limits of difference quotients so you would expect Δy/Δu [itex]\rightarrow[/itex] dy/du and Δu/Δx [itex]\rightarrow[/itex] du/dx. Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable. That's about as philosophical as I get. And I gather that you do know that method isn't really a proof.
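The difference quotients described above can be watched converging numerically. This is only a sketch: the functions f(u) = sin(u) and g(x) = x**2 and the point x = 1 are illustrative choices, not taken from the thread. Note that the code divides by Δu, which would fail if Δu happened to be zero; that is the gap that keeps this argument from being a proof.

```python
import math

# Illustrative choice (not from the thread): y = f(u) = sin(u), u = g(x) = x**2.
def g(x):
    return x * x

def f(u):
    return math.sin(u)

def quotients(x, dx):
    """Return the difference quotients (dy/du, du/dx, dy/dx) for a step dx at x."""
    du = g(x + dx) - g(x)          # change in u caused by the change in x
    dy = f(g(x + dx)) - f(g(x))    # change in y caused by the change in u
    return dy / du, du / dx, dy / dx

x = 1.0
exact = math.cos(x * x) * 2 * x    # f'(g(x)) * g'(x), known in closed form here

for dx in (1e-1, 1e-3, 1e-5):
    dy_du, du_dx, dy_dx = quotients(x, dx)
    print(f"dx={dx:g}  dy/du={dy_du:.6f}  du/dx={du_dx:.6f}  "
          f"product={dy_du * du_dx:.6f}  dy/dx={dy_dx:.6f}  exact={exact:.6f}")
```

For every step size the product of the two quotients equals Δy/Δx up to rounding, and both approach the exact derivative as Δx shrinks.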
 
  • #3


LCKurtz said:
Multiplying the numerator and denominator by Δu is a convenient and seemingly appropriate way to involve the intermediate variable.
But why is it appropriate?


LCKurtz said:
And I gather that you do know that method isn't really a proof.
Yes, that's why I put proof in quotation marks.
 
  • #4


Quincy said:
But why is it appropriate?

Because I need a Δy/Δu and Δu/Δx in the equation and multiplying by Δu/Δu doesn't change the equation.
 
  • #5


Surely you've seen that done before in mathematics? When you get common denominators to add fractions, you multiply numerator and denominator by the same thing; that's exactly the same idea. When you complete the square in a quadratic, you add and subtract the same thing, which is almost the same idea.
 
  • #6


Quincy said:
I found a quote by the user mathwonk in an old thread, which says: "it seems plausible that the best linear approximation to a composite function, is obtained by composing the best approximations to the component functions. On the other hand for a linear function, composing means simply multiplying." Can someone expand on this?...

Interesting quote. I'm glad I came across this. (Thanks mathwonk!)

I'm not sure what more can be said. But for my own sake, and maybe yours as well, I'll do a little explaining to think the idea through.

Differentiable functions are locally linear. That means: take any differentiable function f and a point p. There exist constants a and b such that f(x) ≈ ax + b as long as x is pretty damn close to p, with an error that shrinks faster than |x - p|. (I'll write = from here on, but every such equation is a first-order approximation.) The derivative of f at point p is simply a. (That's almost TOO convenient...).

Combine this idea with the chain rule. Let f and g be differentiable functions. We note that f . g is differentiable too. We pick a point p. We want to find the derivative of f . g at p. We use our rule above. Since f . g is differentiable, it is locally linear. Which means, (f . g)(x) = f(g(x)) = a x + b for some a and b. Our goal is to determine the value of a.

Well, since f and g are both differentiable, we know that they too are locally linear. So let's will some more variables into existence, using the same rule:

g(x) = a' x + b'

(The primes are not differentiation. The a's are slopes, i.e. derivatives at the relevant point, and the b's are the constant terms).

So the derivative of g at point p is a'.

Next, we use our rule on f. But it's a little different this time! We're not taking the derivative of f at p. Instead, we're taking the derivative of f at the point g(p)! But a point is a point, and fixing g(p), we use our rule to conclude that

f(x) = a'' x + b'' for all x's that are pretty damn close to g(p).

So now we have a' (the derivative of g at point p) and a'' (the derivative of f at point g(p)). Let's compose f and g!

(f . g)(x) = f(g(x))

If x is close to p, then we can expand g(x) as a' x + b':

(f . g)(x) = f(a' x + b') for all x's pretty damn close to p.

Now, since g is continuous, g(x) is pretty damn close to g(p) whenever x is pretty damn close to p, so we can expand f:

(f . g)(x) = a'' (a' x + b') + b'' = (a'' a') x + (a'' b' + b'') for all x pretty damn close to p.

And we draw our conclusion. The derivative of f . g at point p is simply the first-order term, a'' a'. Exciting. What does that mean? Pulling from the definitions above, a'' is the derivative of f at point g(p), and a' is the derivative of g at point p. That is exactly what the chain rule is.

OK. That wasn't as simple as I hoped. But I hope you get the picture a little better when you replace the f'(g(x)) clutter with constants. I think, in particular, you can see where the multiplication comes in. It's linear. You substitute. You shave off a few bits and toss it into your constant, and multiply a few coefficients.
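The linear-approximation argument in this post can be checked numerically. A minimal sketch, assuming f = exp, g = sin and p = 0.7 (all illustrative choices, not from the post); a1, b1 play the role of a', b' and a2, b2 the role of a'', b''.

```python
import math

# Illustrative functions (assumptions, not from the post): f = exp, g = sin.
f, df = math.exp, math.exp
g, dg = math.sin, math.cos

p = 0.7

# Linearize g at p: g(x) ~ a1*x + b1, where a1 = g'(p).
a1 = dg(p)
b1 = g(p) - a1 * p

# Linearize f at g(p): f(u) ~ a2*u + b2, where a2 = f'(g(p)).
a2 = df(g(p))
b2 = f(g(p)) - a2 * g(p)

# Compose the two linear maps: a2*(a1*x + b1) + b2 = (a2*a1)*x + (a2*b1 + b2).
slope = a2 * a1
intercept = a2 * b1 + b2

def composed_linear(x):
    return slope * x + intercept

# Compare with a central difference quotient of the true composite near p.
h = 1e-6
numeric = (f(g(p + h)) - f(g(p - h))) / (2 * h)

print("slope of composed linear maps:", slope)
print("numeric derivative of f(g(x)):", numeric)
print("gap at p + 0.01:", abs(f(g(p + 0.01)) - composed_linear(p + 0.01)))
```

The slope of the composed linear maps agrees with the numerically estimated derivative of f(g(x)) at p, and the composed line stays close to the true composite for x near p.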
 
  • #7


Thanks!
 

FAQ: Why Multiply by the Derivative of the Inner Function in the Chain Rule?

What is the chain rule?

The chain rule is a rule for differentiating a composite function: if y = f(u) and u = g(x), then dy/dx = dy/du * du/dx, where dy/du is evaluated at u = g(x).

How is the chain rule used?

The chain rule is used to find the derivative of a function that is composed of two or more other functions.
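As a rough illustration of using the rule on more than two functions, the sketch below multiplies the derivative of each layer, evaluated at that layer's input. The helper name chain_derivative and the example y = exp(sin(x**2)) are hypothetical choices made for this example.

```python
import math

def chain_derivative(layers, x):
    """Differentiate a composition at x.

    `layers` is a list of (function, derivative) pairs ordered from the
    innermost function to the outermost one.
    """
    value = x
    slope = 1.0
    for func, deriv in layers:
        slope *= deriv(value)   # derivative of this layer at its current input
        value = func(value)     # feed the output to the next layer
    return slope

# y = exp(sin(x**2)), built from three layers (an illustrative choice).
layers = [
    (lambda x: x * x, lambda x: 2 * x),
    (math.sin, math.cos),
    (math.exp, math.exp),
]

x = 0.5
by_chain_rule = chain_derivative(layers, x)
by_hand = math.exp(math.sin(x * x)) * math.cos(x * x) * 2 * x
print(by_chain_rule, by_hand)
```

The two printed values should agree up to floating-point rounding, since the loop computes the same product of derivatives that one writes out by hand.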

What is a composite function?

A composite function is a function that is formed by combining two or more other functions, where the output of one function becomes the input of another.

Can you provide an intuitive proof of the chain rule?

Not a full proof, but the intuitive argument runs like this: a small change Δx in x causes a change Δu ≈ g'(x) Δx in u, which in turn causes a change Δy ≈ f'(u) Δu in y. Substituting gives Δy ≈ f'(u) g'(x) Δx, so the rates of change multiply, much as gear ratios multiply along a train of gears. A rigorous proof also has to handle the case where Δu = 0 for arbitrarily small Δx.

Why is the chain rule important in mathematics and science?

The chain rule is important because it allows us to find the derivative of complex functions, which is essential in many areas of mathematics and science, including physics, engineering, and economics.
