Help Understanding Stewart Chain Rule Proof [Picture Provided]

In summary, the conversation discusses the chain rule proof in calculus and the differences between the traditional and intuitive proofs. The traditional proof is deemed correct and involves showing the limit of (∆y/∆x) as ∆x -->0 is equal to y’(u).u’(x). However, there is a problem when u(x) is not assumed to be "one-to-one." The second proof attempts to fix this problem but is deemed unnecessary and less understandable. The conversation also mentions how this result was traditionally proved correctly in older books, but modern books often use alternative methods.
  • #1
nickadams
Hi I've spent a long time trying to understand this chain rule proof but I just can't get it...

I have attached 2 pictures: the second one is an intuitive chain rule proof that turns out to be bogus and the first is the correct proof. So I am trying to understand first of all what does the second proof actually mean, and secondly what did they do to fix the problem presented by the first invalid proof.

Basically any and all help for me to gain insight into this proof would be appreciated. Don't be afraid to write too much or too little because I am very lost about what they are doing. What does the "property" that they lay out (resulting in equation 7) before they start the actual proof actually mean and how does it help their proof? Why introduce epsilon?

I wish there was someone here in person to walk me through it but I guess I will have to try my best to get it over the internet.
Thanks a lot!
 

Attachments

  • chain rule proof.jpg
  • chainrule2.jpg
  • #2
Here is the traditional proof of the chain rule: (Stewart's first proof made correct):

We have a composite function y(u(x)), and assume both component functions y(u) and u(x) are differentiable, and we claim that also y(u(x)) is differentiable, and that its derivative equals y’(u).u’(x) = y’(u(x)).u’(x).

All we have to do is show the limit of (∆y/∆x) as ∆x -->0, equals y’(u).u’(x).

We are assuming that (∆u/∆x)-->u’(x) as ∆x -->0, and also that (∆y/∆u)-->y’(u) as ∆u -->0.

Of course one needs to know the meaning of a limit. I.e. (∆y/∆u)-->y’(u) as ∆u -->0, means that the fraction (∆y/∆u) gets really close to the number y’(u) as long as ∆u is really small but not zero, and the same for the other limit.

Unfortunately since the function u(x) is not assumed to be “one to one”, it can happen that two different values of x give the same value of u, and then we would have ∆u = 0 even though ∆x ≠ 0.

So if we try to break up the fraction ∆y/∆x into a product (∆y/∆u)(∆u/∆x) and use the product rule for limits, we have a problem since this product may not really equal ∆y/∆x for all ∆x that is really small but not zero. I.e. if there is a small non zero ∆x such that ∆u = 0, then ∆y/∆x does not equal (∆y/∆u)( ∆u/∆x), since the fraction (∆y/∆u) does not make sense.

Now we get to start from as small a ∆x as we want in this limit, so if there is ever a ∆x so small that ∆u ≠ 0 for that ∆x and also for all smaller ∆x, there is no problem. So the only case where we have not proved the chain rule is when there is a sequence of ∆x’s approaching zero, and for all of them we still have ∆u = 0.
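To make the problematic case concrete, here is a standard textbook-style example. It is an illustration chosen for this explanation, not something taken from Stewart or from the attached pictures. The function u below is differentiable at 0, yet ∆u = 0 for the nonzero values ∆x = 1/(nπ), which approach zero:

```latex
u(x) =
\begin{cases}
  x^{2}\sin(1/x), & x \neq 0 \\
  0,              & x = 0
\end{cases}
\qquad
\Delta x_n = \frac{1}{n\pi}
\;\Longrightarrow\;
\Delta u = u(\Delta x_n) - u(0) = \frac{\sin(n\pi)}{n^{2}\pi^{2}} = 0,
\qquad
u'(0) = \lim_{x \to 0} \frac{x^{2}\sin(1/x)}{x} = \lim_{x \to 0} x\sin(1/x) = 0 .
```

Note that the derivative at the trouble point comes out to zero in this example.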

Now in that case, it follows that the fraction ∆u/∆x equals zero for all those ∆x’s, and since this fraction has a limit, the only possible limit is zero. I.e. in the only case where the proof does not work, we know that u’(x) = 0. Thus for the theorem to hold in that case, we only need to prove that y’(x) = y’(u).u’(x) = y’(u).0 = 0 (here y’(x) means the derivative of the composite y(u(x)) with respect to x, i.e. the limit of ∆y/∆x). I.e. all we have to do is prove that in this case the fraction ∆y/∆x still approaches zero even though we cannot always factor it into a product of fractions.

The secret is to notice that we can still factor it as ∆y/∆x = (∆y/∆u)(∆u/∆x),

as long as ∆u ≠ 0. I.e. there are two kinds of ∆x’s, those for which ∆u = 0, and those for which ∆u ≠ 0. But when ∆u = 0 we do not need to factor it, i.e. it is trivial then that the fraction ∆y/∆x = 0, since the top is the difference of the values of y at the same two values of u, so of course it equals zero. I.e. ∆u = 0 means the two values of u are the same, so y has the same value at both of them, so ∆y = 0, hence also ∆y/∆x = 0.

And in the case where ∆u ≠ 0, we can still factor the fraction as ∆y/∆x = (∆y/∆u)(∆u/∆x), and use the other product argument. I.e. as long as ∆x is really small, if ∆u ≠ 0, then the fraction ∆y/∆x = (∆y/∆u)(∆u/∆x). And since u’(x) = 0 in this case, (∆u/∆x) is a small number, and (∆y/∆u) is close to the finite number y’(u), so the product (∆y/∆u)(∆u/∆x), is a small number.

And in the case where ∆u = 0, things are actually even better. I.e. although we cannot factor the fraction, it does not matter because then ∆u = 0 implies also ∆y = 0, so the fraction ∆y/∆x is as close to zero as it can get, since it equals zero.

Thus in the “bad” case where ∆x is small and nonzero, but ∆u = 0, the chain rule still holds because both sides of the equation equal zero.
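The two-case argument can be checked numerically. The following is only a sketch under stated assumptions: the functions u and y and the helper difference_quotient are hypothetical choices made for illustration, not anything from Stewart. Here u(x) = x² times the distance from 1/x to the nearest integer, so u(1/n) = 0 exactly and u’(0) = 0, and y(u) = u² + 3u. Exact fractions are used so that ∆u = 0 really is zero rather than a rounding artifact.

```python
from fractions import Fraction


def u(x):
    """Inner function (illustrative): x^2 times the distance from 1/x to the nearest integer."""
    if x == 0:
        return Fraction(0)
    t = 1 / Fraction(x)
    return x * x * abs(t - round(t))


def y(v):
    """Outer function (illustrative): y(u) = u^2 + 3u, so y'(0) = 3."""
    return v * v + 3 * v


def difference_quotient(dx):
    """Return delta y / delta x at x = 0, treating the two kinds of delta x separately."""
    du = u(dx) - u(0)
    dy = y(u(dx)) - y(u(0))
    if du == 0:
        # Cannot factor through delta u, but then delta y is 0 as well.
        return dy / dx
    # Safe to factor: (delta y / delta u) * (delta u / delta x).
    return (dy / du) * (du / dx)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        bad = Fraction(1, n)           # delta u == 0 for this delta x
        good = Fraction(2, 2 * n + 1)  # delta u != 0 for this delta x
        print(n, difference_quotient(bad), float(difference_quotient(good)))
```

In both columns the quotient heads to 0 = y’(0).u’(0): it is exactly zero when ∆u = 0 and a small product when ∆u ≠ 0.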

Thus Stewart’s second proof is unnecessary. It works because he has managed to take the denominators out of the argument. But he has also managed to make the argument less understandable.

This result was traditionally proved correctly in turn-of-the-century English-language books, such as Pierpont's Theory of functions of a real variable, and in 19th century European books such as that of Tannery [see the article by Carslaw, in vol XXIX of B.A.M.S.], but unfortunately not in the first three editions of the influential book A Course of Pure Mathematics, by G.H. Hardy. Although Hardy reinstated the classical proof in later editions, modern books usually deal with the problem by giving the slightly more sophisticated linear approximation proof, or making what to me are somewhat artificial constructions.

Summary:
The point is simply that in proving a function has limit L, one only needs to prove it at points where the function does not already have value L. Thus to someone who says that the usual argument for the chain rule for y(u(x)) does not work for x's where ∆u = 0, one can simply reply that these points are irrelevant.

Assume f is differentiable at g(a), g is differentiable at a, and on every neighborhood of a there are points x ≠ a where g(x) = g(a). We claim the derivative of f(g(x)) at a equals f'(g(a))(g'(a)).
Proof:
1) Clearly under these hypotheses, g'(a) = 0.
Consequently,
2) the chain rule holds at a if and only if lim ∆f/∆x = 0 as x approaches a.
3) Note that ∆f = ∆f/∆x = 0 at all x ≠ a such that g(x) = g(a).
4) In general, to prove that lim h(x) = L, as x approaches a, it suffices to prove it for the restriction of h to those x such that h(x) ≠ L.
5) Thus in arguing that ∆f/∆x approaches 0, we may restrict to x such that g(x) ≠ g(a), where the usual argument applies.
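Step 4 is the one that carries the argument, so here is one way to spell it out in epsilon-delta form. This is a sketch of the reasoning in standard notation; the symbols ε and δ are not in the summary above.

```latex
\text{Suppose that for every } \varepsilon > 0 \text{ there is } \delta > 0 \text{ such that }
|h(x) - L| < \varepsilon \text{ whenever } 0 < |x - a| < \delta \text{ and } h(x) \neq L . \\[4pt]
\text{If instead } 0 < |x - a| < \delta \text{ and } h(x) = L, \text{ then } |h(x) - L| = 0 < \varepsilon \text{ automatically.} \\[4pt]
\text{Hence } |h(x) - L| < \varepsilon \text{ for all } x \text{ with } 0 < |x - a| < \delta,
\text{ which is exactly } \lim_{x \to a} h(x) = L .
```

Applied with h = ∆f/∆x and L = 0, the points where g(x) = g(a) are exactly the points where h already equals L.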
 
  • #3
The proof in Stewart is the linear approximation proof, that just says the same thing but takes out the division step.

I.e. saying that delta y/delta x --> y'(a) as delta x --> 0 is the same as saying that

delta y/delta x - y'(a) --> 0, i.e. that [delta y - y'(a) delta x]/delta x --> 0.

Now if we multiply by delta x, we get that

{[delta y - y'(a) delta x]/delta x}.delta x = [delta y - y'(a) delta x] = e.delta x, where e --> 0, i.e. e = [delta y - y'(a) delta x]/delta x.

Thus we can state that y’(a) is the derivative of y without using denominators, by saying that

delta y = y’(a) delta x + e.delta x, where e --> 0 as delta x does.

So if y(x) = y(u(x)), in order to show that dy/dx (a) = y’(u(a)).(u’(a)),
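A small worked case may help show what the error term e looks like in practice. The function y = x² is an illustrative choice, not one from the thread:

```latex
y(x) = x^{2}: \qquad
\Delta y = (a + \Delta x)^{2} - a^{2}
         = \underbrace{2a}_{y'(a)}\,\Delta x + \underbrace{\Delta x}_{e}\cdot\Delta x ,
\qquad e = \Delta x \to 0 \ \text{ as } \ \Delta x \to 0 .
```

So here the denominator-free statement holds with e equal to delta x itself, and no division was needed to write it down.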

all we have to do is show that we can write

y(u(x)) – y(u(a)) = [y’(u(a)).(u’(a))].delta x + E.delta x, where E is something that goes to zero as delta x does.

This is a messy substitution using what we are given,

i.e. y(u(x)) – y(u(a)) = y’(u(a)). delta u + e delta u, where e-->0, because y is differentiable wrt u.

But u is also differentiable wrt x, so we can plug the same sort of thing in for delta u:

y(u(x)) – y(u(a)) = y’(u(a)). {u’(a).delta x + e1.delta x} + e{u’(a).delta x + e1.delta x},

where e-->0 as delta u does, and e1-->0 as delta x does. Fortunately u is continuous in x, so delta u goes to zero when delta x does, hence e-->0 also when delta x does.

Now just expand and collect terms getting:

y(u(x)) – y(u(a)) = y’(u(a)). u’(a).delta x + {e1.y’(u(a)) + e.u’(a) + e.e1}.delta x

= y’(u(a)). u’(a).delta x + E. delta x, where E = {e1.y’(u(a)) + e.u’(a) + e.e1}, and that does go to zero as delta x does.

Hence the multiplier of delta x must be the derivative dy/dx. i.e. dy/dx = y’(u(a)). u’(a).
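As a numerical sanity check of this bookkeeping, here is a minimal sketch. The choices u(x) = x², y(u) = sin(u), the point a = 1, and the helper name error_terms are all assumptions made for illustration. Expanding the product gives a combined error E = e1.y’(u(a)) + e.u’(a) + e.e1, and the code confirms that delta y = y’(u(a)).u’(a).delta x + E.delta x while E shrinks with delta x.

```python
import math


def u(x):
    return x * x            # illustrative inner function


def u_prime(x):
    return 2 * x


def y(v):
    return math.sin(v)      # illustrative outer function


def y_prime(v):
    return math.cos(v)


def error_terms(a, dx):
    """Return (e, e1, E, delta_y) for the linear approximation proof at the point a."""
    delta_u = u(a + dx) - u(a)
    delta_y = y(u(a + dx)) - y(u(a))
    # e1 is the error in approximating delta u by u'(a) * delta x.
    e1 = delta_u / dx - u_prime(a)
    # e is the error in approximating delta y by y'(u(a)) * delta u (taken as 0 if delta u == 0).
    e = delta_y / delta_u - y_prime(u(a)) if delta_u != 0 else 0.0
    # Combined error after substituting and collecting terms, including the cross term e * e1.
    big_e = e1 * y_prime(u(a)) + e * u_prime(a) + e * e1
    return e, e1, big_e, delta_y


if __name__ == "__main__":
    a = 1.0
    for dx in (1e-1, 1e-2, 1e-3):
        e, e1, big_e, delta_y = error_terms(a, dx)
        rebuilt = (y_prime(u(a)) * u_prime(a) + big_e) * dx
        print(dx, big_e, delta_y, rebuilt)
```

The printed E values should shrink roughly in proportion to delta x, which is the sense in which the multiplier of delta x is the derivative.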

The one good thing about this more complicated proof is that this idea also works in several variables, where you cannot just divide by the vector variable. Also, it reminds you that a derivative is really a linear approximation to the original function, which is important to know.

But the idea that the original simpler proof does not work is just wrong, and may be evidence that a lot of calculus book writers just copy their stuff from other recent best sellers without thinking it through or doing historical research on the topic. Or to be fair, maybe they are aware of it but choose for pedagogical reasons not to mention it.
 
  • #4
mathwonk said:
Here is the traditional proof of the chain rule: (Stewart's first proof made correct):

We have a composite function y(u(x)), and assume both component functions y(u) and u(x) are differentiable, and we claim that also y(u(x)) is differentiable, and that its derivative equals y’(u).u’(x) = y’(u(x)).u’(x).

WANT TO SHOW:

let
d_y = domain of y(u)
d_u = domain of u(x)
r_u = range of u(x)
[tex]\forall_{a\in d_y,\;b\in d_u,\;c\in r_u}\exists_{k_1, k_2, k_3\in\ \mathbb R}\;s.t.\;\left\{\lim_{x\to\ a}\frac{y(x)-y(a)}{y-a}=k_1 \wedge \lim_{x\to\ b}\frac{u(x)-u(b)}{x-b}=k_2 \wedge \lim_{x\to\ c}\frac{y(x)-y(c)}{y-c}=k_3\right\}\\\longrightarrow\;\forall_{a\in\ d_y,\;b\in\ d_u,\;c\in\ r_u}\left\{\frac{d}{dx}y(u(x)) = \left(\lim_{x\to\ a}\frac{y(x)-y(a)}{y-a}\cdot\lim_{x\to\ b}\frac{u(x)-u(b)}{x-b}\right) = \left(\frac{d}{dx}y(u(x))\cdot\lim_{x\to\ b}\frac{u(x)-u(b)}{x-b}\right)\right\}[/tex] ... did I interpret this first part correctly? I didn't know how to write the derivative of y(u(x)) in limit definition form.

All we have to do is show the limit of (∆y/∆x) as ∆x -->0, equals y’(u).u’(x).

... do you say ∆y instead of ∆y(u(x)) because if we can show that lim ∆x->0(∆y/∆x) = y’(u).u’(x) for an arbitrary domain of y, then obviously the equality will hold true for the domain made up by the range of u(x)?

We are assuming that (∆u/∆x)-->u’(x) as ∆x -->0, and also that (∆y/∆u)-->y’(u) as ∆u -->0.

Of course one needs to know the meaning of a limit. I.e. (∆y/∆u)-->y’(u) as ∆u -->0, means that the fraction (∆y/∆u) gets really close to the number y’(u) as long as ∆u is really small but not zero, and the same for the other limit.

... okay, I think this is consistent with what I typed in the "want to show"

Unfortunately since the function u(x) is not assumed to be “one to one”, it can happen that two different values of x give the same value of u, and then we would have ∆u = 0 even though ∆x ≠ 0.

... okay because u(x) being one-to-one would imply each input x would give you a unique u(x) value, and we didn't assume this.

So if we try to break up the fraction ∆y/∆x into a product (∆y/∆u)(∆u/∆x) and use the product rule for limits, we have a problem since this product may not really equal ∆y/∆x for all ∆x that is really small but not zero. I.e. if there is a small non zero ∆x such that ∆u = 0, then ∆y/∆x does not equal (∆y/∆u)(∆u/∆x), since the fraction (∆y/∆u) does not make sense.
[tex]\small\text{... okay because we have} \frac{∆y}{∆x}\text{and we want to multiply by} \frac{∆u}{∆u} \text{but} \frac{∆u}{∆u} \text{is undefined if ∆u=0}\\\small\text{and we can't guarantee ∆u≠0 since we didn't assume u(x) is 1-to-1}\\
\small\text{Also, if ∆u=0 for a nonzero ∆x that means there is at least one number p in the domain of u(x) where}\\\lim_{x\to\ p}\frac{u(x)-u(p)}{x-p} = \lim_{x\to\ p+∆x}\frac{u(x)-u(p+∆x)}{x-(p+∆x)}[/tex]

... I don't know if that's relevant or not (or even true, for that matter)

Now we get to start from as small a ∆x as we want in this limit, so if there is ever a ∆x so small that ∆u ≠ 0 for that ∆x and also for all smaller ∆x, there is no problem. So the only case where we have not proved the chain rule is when there is a sequence of ∆x’s approaching zero, and for all of them we still have ∆u = 0.

gotcha

Now in that case, it follows that the fraction ∆u/∆x equals zero for all those ∆x’s, and since this fraction has a limit, the only possible limit is zero. I.e. in the only case where the proof does not work, we know that u’(x) = 0.

okay.

Thus for the theorem to hold in that case, we only need to prove that y’(x) = y’(u).u’(x) = y’(u).0 = 0. I.e. all we have to do is prove that in this case the fraction ∆y/∆x still approaches zero even though we cannot always factor it into a product of fractions.

is y'(x) the same as y'(u(x))? I thought we were trying to show y'(u(x)) = y'(u)*u'(x)? I was on board before because we were just doing ∆y/∆x which I assumed was considering u(x) to be the input into y. So I thought it was just saying "change in y(u(x)) resulting from a tiny change of x". But with y'(x) it seems like we're saying "change in y(u(x)) resulting from a tiny change in u(x)"?

sorry; I'm stuck. :frown:
 
  • #5
edit: in the denominators of the limits in the "want to show" section i meant to have x-a and x-c not y-a and y-c
 
  • #6
can anyone help get me unstuck?
 
  • #7
I know it took me 5 months from the time I posted the initial question to when I posted my follow-up, but now I got stuck. :(

I want to understand but I can't go anywhere with the proof at this point
 

FAQ: Help Understanding Stewart Chain Rule Proof [Picture Provided]

What is the Stewart Chain Rule?

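The "Stewart Chain Rule" is not a separate theorem. The phrase refers to the ordinary Chain Rule as stated and proved in James Stewart's calculus textbook: if y is a differentiable function of u and u is a differentiable function of x, then the composite y(u(x)) is differentiable and its derivative is y'(u(x)).u'(x).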

Why is the Stewart Chain Rule important?

The Stewart Chain Rule allows us to find the derivative of complex functions, which is useful in many real-world applications such as physics, engineering, and economics. It also helps us understand the relationships between different variables in a function.

What is the proof of the Stewart Chain Rule?

As discussed in the thread above, the proof writes the change in each function as its derivative times the change in its input, plus an error term that goes to zero: delta y = y'(u(a)).delta u + e.delta u and delta u = u'(a).delta x + e1.delta x. Substituting the second expression into the first and collecting terms gives delta y = y'(u(a)).u'(a).delta x + E.delta x with E --> 0. This avoids dividing by delta u, which may be zero.

How do I use the Stewart Chain Rule in a problem?

To use the Stewart Chain Rule, first identify the outer function and its derivative. Then, identify the inner function and its derivative. Plug these values into the formula and simplify to find the derivative of the composed function.
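For instance, here is that recipe carried out on one function. The choice of sin(x²) is just an illustration:

```latex
\frac{d}{dx}\,\sin(x^{2}): \qquad
y(u) = \sin u, \quad y'(u) = \cos u, \qquad
u(x) = x^{2}, \quad u'(x) = 2x, \\[4pt]
\frac{d}{dx}\,\sin(x^{2}) = y'(u(x))\cdot u'(x) = \cos(x^{2})\cdot 2x = 2x\cos(x^{2}) .
```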

Can the Stewart Chain Rule be extended to functions with more than two variables?

Yes. The chain rule extends to functions of several variables, using partial derivatives (the multivariable Chain Rule). As noted in the thread, the linear approximation proof carries over to that setting, where one cannot simply divide by a vector variable.
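As a sketch of what the several-variable version looks like in its simplest form, take z = f(x, y) with x and y both functions of t. The example f(x, y) = xy with x = cos t, y = sin t is an illustrative choice:

```latex
\frac{dz}{dt} = \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt},
\qquad
f(x, y) = xy,\ x = \cos t,\ y = \sin t: \\[4pt]
\frac{dz}{dt} = y\,(-\sin t) + x\,(\cos t) = \cos^{2} t - \sin^{2} t = \cos 2t ,
```

which agrees with differentiating z = sin t cos t = (1/2) sin 2t directly.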
