Is There Proof for the Chain Rule?

In summary, the chain rule states that the derivative of a composite function is the derivative of the outer function, evaluated at the inner function, times the derivative of the inner function: ##(f\circ g)'(x)=f'(g(x))\,g'(x)##.
  • #1
Yh Hoo
Any proof for the chain rule?

Can somebody please show me the proof of the chain rule? Even though I have been applying this concept ever since I started learning differentiation, I still have doubts and questions about it!
 
  • #2


Yh Hoo said:
Can somebody please show me the proof of the chain rule? Even though I have been applying this concept ever since I started learning differentiation, I still have doubts and questions about it!

Hey Yh Hoo.

A quick google search will give you:

http://en.wikipedia.org/wiki/Chain_rule#Proofs_of_the_chain_rule

But even without this, it's best to go back to the definition of the derivative, f'(x) = lim h->0 [f(x+h) - f(x)]/h, and apply it to the composite f(g(x)). Write the difference quotient as lim b->a [f(g(b)) - f(g(a))]/[b-a], then multiply and divide by g(b) - g(a) to get [f(g(b)) - f(g(a))]/[g(b) - g(a)] * [g(b) - g(a)]/[b-a], and take the limit of each factor to express the derivative of the whole thing.
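Just to see that identity doing what it should, here is a quick numerical sketch (the choices f(u) = sin(u), g(x) = x^2 and the point x = 1.3 are purely illustrative, not anything from the discussion above):

import math

# Illustrative choices: f(u) = sin(u), g(x) = x^2, so (f o g)'(x) should be cos(x^2) * 2x.
f, fprime = math.sin, math.cos
g = lambda x: x * x
gprime = lambda x: 2 * x

x = 1.3
for h in [1e-1, 1e-3, 1e-5]:
    quotient = (f(g(x + h)) - f(g(x))) / h   # difference quotient of the composite
    product = fprime(g(x)) * gprime(x)       # what the chain rule predicts
    print(h, quotient, product)
# As h shrinks, the difference quotient approaches cos(x^2) * 2x, as the chain rule predicts.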
 
  • #3


Here's a proof:

http://kruel.co/math/chainrule.pdf

Here's a Java applet:

http://webspace.ship.edu/msrenault/GeoGebraCalculus/derivative_intuitive_chain_rule.html

Here's my own input on the matter:

Let's agree on some notation: let y=f(x) and z=g(y), so z=g(f(x)).

There is a proof of z'(x)=g'(f(x))f'(x) in the links above.

But let's ignore the proof and build some intuition, so that we also have a feel for why it is true, not just stare blankly through the proof and nod.

Try y=ax+b, and z=cy+d. If you play with that, you may have more of a feel for it. You can try b=d=0 too.
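Working that linear case out explicitly (this is just the computation the previous paragraph suggests):
\begin{align*}
z=cy+d=c(ax+b)+d=acx+(cb+d),
\end{align*}
so dz/dx = ac = c·a = (dz/dy)(dy/dx), exactly what the chain rule predicts.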

More intuition: I think I read once that Newton, one of the inventors of calculus, thought of the chain rule in terms of gears. The way z spins depends on how it is connected to y, and the way y spins in turn depends on how it is connected to x. The f(x) inside of g' is there, you could say, because z is connected to y=f(x), not directly to x.

Some tips: Don't forget the f(x) inside of g'. To see what I mean, try f(x)=1+x^2 and g(y)=ln(y), so z=ln(1+x^2).

A nice tool for remembering the rule is what is called Leibniz's notation: dz/dx=(dz/dy)(dy/dx).

Be careful of notation, for instance, z' could mean dz/dx or dz/dy.
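Here is a quick symbolic check of the ln(1+x^2) example and of the dz/dx=(dz/dy)(dy/dx) notation above (a sketch using SymPy):

import sympy as sp

x, y = sp.symbols('x y')
f = 1 + x**2                 # y = f(x)
g = sp.ln(y)                 # z = g(y)

dz_dy = sp.diff(g, y)                    # dz/dy = 1/y
dy_dx = sp.diff(f, x)                    # dy/dx = 2x
leibniz = (dz_dy * dy_dx).subs(y, f)     # (dz/dy)(dy/dx) with y = 1 + x^2
direct = sp.diff(g.subs(y, f), x)        # differentiate z = ln(1 + x^2) directly

print(sp.simplify(leibniz - direct))     # prints 0: both routes give 2x/(1 + x^2)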
 
  • #4


chiro said:
Hey Yh Hoo.

A quick google search will give you:

http://en.wikipedia.org/wiki/Chain_rule#Proofs_of_the_chain_rule

But even without this, it's best to go back to the definition of the derivative, f'(x) = lim h->0 [f(x+h) - f(x)]/h, and apply it to the composite f(g(x)). Write the difference quotient as lim b->a [f(g(b)) - f(g(a))]/[b-a], then multiply and divide by g(b) - g(a) to get [f(g(b)) - f(g(a))]/[g(b) - g(a)] * [g(b) - g(a)]/[b-a], and take the limit of each factor to express the derivative of the whole thing.

This doesn't always work as g(b)-g(a) can be zero and then the fraction is undefined.
 
  • #5


Thanks guys, I will look at that first.
 
  • #6


I like the following non-rigorous argument:

It follows immediately from the definition of "derivative" that when h is small,
\begin{align*}
f(x+h)\approx f(x)+hf'(x).
\end{align*} Let's use this formula (twice) to approximate f(g(x+h)).
\begin{align*}
f(g(x+h))\approx f\big(g(x)+hg'(x)\big)\approx f(g(x))+hg'(x)f'(g(x)).
\end{align*} This implies that
\begin{align*}(f\circ g)'(x) &\approx \frac{f(g(x+h))-f(g(x))}{h}\approx \frac{f(g(x))+hg'(x)f'(g(x))-f(g(x))}{h}\\ &\approx f'(g(x))g'(x).
\end{align*}
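Here is a small numerical sketch of that approximation (my own illustrative choices: f(u) = e^u, g(x) = x^2, x = 0.7):

import math

# Illustrative choices: f(u) = exp(u), g(x) = x^2.
f, fprime = math.exp, math.exp
g = lambda x: x * x
gprime = lambda x: 2 * x

x = 0.7
for h in [1e-1, 1e-2, 1e-3]:
    exact = f(g(x + h))
    approx = f(g(x)) + h * gprime(x) * fprime(g(x))   # the two chained first-order approximations
    print(h, exact, approx, abs(exact - approx))
# The error shrinks roughly like h^2, so after dividing by h it still goes to 0,
# which is why the difference quotient tends to f'(g(x)) g'(x).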
A rigorous proof will cover at least two pages in a book, if the author does it as a straightforward application of the ε-δ definition of "limit", and includes all the details. There are tricks you can use to make the proof shorter, but I prefer not to use them, for the following two reasons:

1. In my opinion, they make it harder to understand what's really going on.
2. The people who need to study the proof have recently learned the ε-δ definition and are still pretty bad at using it, so it's an excellent exercise for them to study the longer but more straightforward proof.

I actually typed up a long but straightforward proof for my personal notes the last time I participated in one of these threads, but I never posted it. I will do that now. See my next post below (in ten minutes or so).
 
  • #7


There may still be some minor inaccuracies in the statement and the proof of the theorem. I will take another look at this later today to see if I find any. Feel free to post a comment if you find a mistake before I do.

Theorem: Let ##x\in\mathbb R## be arbitrary. Let f,g be arbitrary functions. If g is differentiable at x, and f is differentiable at g(x), then ##f\circ g## is differentiable at x, and
\begin{align}
(f\circ g)'(x)=f'(g(x))g'(x).
\end{align}
(Comment: In this context, "function" means "real-valued function with a domain that's a subset of ℝ").

Proof:
Let ε>0 be arbitrary. We're going to show that there exists a δ>0 such that for all h,
\begin{align}
|h|<\delta\ \Rightarrow
\left|\frac{f\circ g(x+h)-f\circ g(x)}{h}-f'(g(x))g'(x)\right|<\varepsilon
\end{align} This will prove both that ##f\circ g## is differentiable at x, and that we have the right formula for ##(f\circ g)'(x)##. We start by defining a notation. For each real number x and each function u that's differentiable at x, define
\begin{align}
R_{u,x}(h)=u(x+h)-u(x)-hu'(x)
\end{align} for all h such that x+h is in the domain of u. Note that ##R_{u,x}(0)=0## and that
\begin{align}
\frac{R_{u,x}(h)}{h}=\frac{u(x+h)-u(x)}{h}-u'(x)\rightarrow 0\text{ as }h\rightarrow 0.
\end{align} Let h be arbitrary. Define ##k=hg'(x)+R_{g,x}(h)##. Since g is differentiable at x, and f is differentiable at g(x), we have
\begin{align}
f(g(x+h)) &=f\big(g(x)+hg'(x)+R_{g,x}(h)\big)
=f(g(x)+k)\\
&=f(g(x))+kf'(g(x))+R_{f,g(x)}(k).
\end{align}This implies that
\begin{align}
&\frac{f\circ g(x+h)-f\circ g(x)}{h} =\frac{f(g(x+h))-f(g(x))}{h}\\
&=\frac{f(g(x))+kf'(g(x))+R_{f,g(x)}(k)-f(g(x))}{h}\\
&=\frac{\big(hg'(x)+R_{g,x}(h)\big)f'(g(x))+R_{f,g(x)}(k)}{h}\\
&=f'(g(x))g'(x)+f'(g(x))\frac{R_{g,x}(h)}{h}+\frac{R_{f,g(x)}(k)}{h},
\end{align} which implies that
\begin{align}
&\left|\frac{f\circ g(x+h)-f\circ g(x)}{h}-f'(g(x))g'(x)\right|
=\left|f'(g(x))\frac{R_{g,x}(h)}{h}+\frac{R_{f,g(x)}(k)}{h}\right|\\
&\leq |f'(g(x))|\left|\frac{R_{g,x}(h)}{h}\right|+\left|\frac{R_{f,g(x)}(k)}{h}\right|.
\end{align}
We're going to show that there exists a δ>0 such that each of the two terms above is <ε/2 when |h|<δ. (In the bounds below we use ##1+|f'(g(x))|## and ##1+|g'(x)|## rather than ##|f'(g(x))|## and ##|g'(x)|##, so that vanishing derivatives don't require separate cases.) The first term presents no difficulties. We just choose ##\delta_1>0## such that
\begin{align}
|h|<\delta_1\ \Rightarrow\ \left|\frac{R_{g,x}(h)}{h}\right| <\frac{\varepsilon}{2\big(1+|f'(g(x))|\big)}.
\end{align} The second term is much more difficult to deal with. If k=0, we have
$$\left|\frac{R_{f,g(x)}(k)}{h}\right|=0
<\frac{\varepsilon}{2}
$$ for all h. If k≠0, we have
\begin{align}
\left|\frac{R_{f,g(x)}(k)}{h}\right|
&=\left|\frac{R_{f,g(x)}(k)}{k}\right|\left|\frac{k}{h}\right|=
\left|\frac{R_{f,g(x)}(k)}{k}\right|\left|g'(x) +\frac{R_{g,x}(h)}{h}\right|\\
&\leq \left|\frac{R_{f,g(x)}(k)}{k}\right|\bigg(|g'(x)|
+\left|\frac{R_{g,x}(h)}{h}\right|\bigg).
\end{align} Choose ##\delta_2>0## such that
\begin{align}
|h|<\delta_2\ \Rightarrow\ \left|\frac{R_{g,x}(h)}{h}\right|<1.
\end{align} Choose ##\delta_3>0## such that
\begin{align}
|k|<\delta_3\ \Rightarrow\ \left|\frac{R_{f,g(x)}(k)}{k}\right| <\frac{\varepsilon}{2\big(1+|g'(x)|\big)}.
\end{align} Choose ##\delta_4>0## such that
$$|h|<\delta_4\ \Rightarrow\ |k|<\delta_3.$$ This is possible because
$$|k|=|hg'(x)+R_{g,x}(h)|=|g(x+h)-g(x)|,$$ and g is continuous at x. These choices ensure that if k≠0, then for all h with ##|h|<\min\{\delta_2,\delta_4\}##, we have
\begin{align}
\left|\frac{R_{f,g(x)}(k)}{h}\right|\leq
\left|\frac{R_{f,g(x)}(k)}{k}\right|\bigg(|g'(x)|
+\left|\frac{R_{g,x}(h)}{h}\right|\bigg) <\frac{\varepsilon}{2\big(1+|g'(x)|\big)}\big(1+|g'(x)|\big)
=\frac{\varepsilon}{2}.
\end{align} If we define ##\delta=\min\{\delta_1,\delta_2,\delta_4\}##, then for all real numbers h such that ##|h|<\delta##,
\begin{align}
&\left|\frac{f\circ g(x+h)-f\circ g(x)}{h}-f'(g(x))g'(x)\right|\\
&\leq |f'(g(x))|\left|\frac{R_{g,x}(h)}{h}\right|+\left|\frac{R_{f,g(x)}(k)}{h}\right|
<\frac{\varepsilon}{2}+\frac{\varepsilon}{2} =\varepsilon.
\end{align}
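For anyone who wants to see the bookkeeping in the proof with concrete numbers, here is a small sketch (the functions f(u) = sin(u) and g(x) = x^3, and the point x = 0.5, are illustrative choices on my part):

import math

# Illustrative choices: f(u) = sin(u), g(x) = x^3 (both differentiable everywhere).
f, fprime = math.sin, math.cos
g = lambda x: x**3
gprime = lambda x: 3 * x**2

def R(u, uprime, x, h):
    """Remainder R_{u,x}(h) = u(x+h) - u(x) - h*u'(x) from the proof."""
    return u(x + h) - u(x) - h * uprime(x)

x = 0.5
for h in [1e-1, 1e-2, 1e-3]:
    k = h * gprime(x) + R(g, gprime, x, h)          # k = g(x+h) - g(x)
    quotient = (f(g(x + h)) - f(g(x))) / h
    decomposition = (fprime(g(x)) * gprime(x)
                     + fprime(g(x)) * R(g, gprime, x, h) / h
                     + R(f, fprime, g(x), k) / h)
    print(h, quotient, decomposition)               # the two numbers agree (it's an identity)
    print('   error terms ->', R(g, gprime, x, h) / h, R(f, fprime, g(x), k) / h)
# Both error terms go to 0 with h, leaving f'(g(x)) g'(x), which is what the proof makes rigorous.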
 
  • #8


Yh Hoo said:
Can somebody please show me the proof of the chain rule? Even though I have been applying this concept ever since I started learning differentiation, I still have doubts and questions about it!


Try to formalize the following: [tex]\frac{f(g(x))-f(g(x_0))}{x-x_0}=\frac{f(g(x))-f(g(x_0))}{g(x)-g(x_0)}\frac{g(x)-g(x_0)}{x-x_0}[/tex]

Just note the following: as [itex]\,x\to x_0\,[/itex], also [itex]\,g(x)\to g(x_0)\,[/itex] (why?), and if [itex]\,g(x)=g(x_0)\,[/itex] identically on a certain neighbourhood of the point [itex]\,x_0\,[/itex], then [itex]\,f(g(x))=f(g(x_0))\,[/itex] identically on the same neighbourhood, so the result is trivial in that case...

DonAntonio
 
  • #9


The only thing wrong with the proof using

[tex]\frac{f(g(x+h)) - f(g(x))}{h}=\frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}\cdot\frac{g(x+h) - g(x)}{h}[/tex]

is that you don't know that g(x+h) - g(x) is nonzero for all sufficiently small h.

You can get around this by letting h_n be a sequence going to 0 and splitting it into two subsequences, one consisting of the terms where g(x+h_n) = g(x) and one consisting of the terms where g(x+h_n) differs from g(x), and taking the limit along each. (On the first subsequence the difference quotient of f∘g is 0, and the existence of g'(x) forces g'(x) = 0 there, so that limit is still f'(g(x))g'(x).)

Also, this idea is way, way easier than the standard proof given in most analysis books.
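For a concrete function where this really happens (this example is not from the post above, but it is the standard one): take g(0) = 0 and g(x) = x^2 sin(1/x) for x ≠ 0. Then g is differentiable at 0, yet g(h) - g(0) = 0 for h = 1/(nπ), i.e. for h arbitrarily close to 0:

import math

def g(x):
    # g(x) = x^2 sin(1/x) for x != 0, g(0) = 0; differentiable at 0 with g'(0) = 0.
    return 0.0 if x == 0 else x * x * math.sin(1.0 / x)

# Along h_n = 1/(n*pi) the increment g(h_n) - g(0) is exactly 0 (up to rounding),
# so the factor g(x+h) - g(x) in the denominator of the naive argument vanishes.
for n in [10, 100, 1000]:
    h = 1.0 / (n * math.pi)
    print(h, g(h) - g(0))   # ~0 for every n, no matter how small h gets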
 

FAQ: Is There Proof for the Chain Rule?

What is the chain rule?

The chain rule is a mathematical concept that allows us to find the derivative of a composite function, where one function is inside another. It helps us to calculate the rate of change of a dependent variable with respect to an independent variable.

Why do we need the chain rule?

The chain rule is an important tool in calculus because many real-world situations involve multiple functions working together. For example, in physics and engineering, we often need to find the rate of change of a quantity that depends on several variables. The chain rule helps us to do this efficiently.

How does the chain rule work?

The chain rule involves taking the derivative of the outer function and multiplying it by the derivative of the inner function. In other words, we differentiate the outer function as if the inner function were a single variable, evaluate that derivative at the inner function, and then multiply by the derivative of the inner function.

Can you give an example of the chain rule?

Sure, let's say we have the function f(x) = (x^2 + 3x)^2. To find the derivative of this function, we can use the chain rule. First, we identify the outer function as ( )^2 and the inner function as x^2 + 3x. Then, we take the derivative of the outer function, which is 2( ), evaluate it at the inner function to get 2(x^2 + 3x), and multiply by the derivative of the inner function, which is 2x + 3. This gives us f'(x) = 2(x^2 + 3x)(2x + 3) = 4x^3 + 18x^2 + 18x.
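A quick symbolic check of that computation (a sketch using SymPy):

import sympy as sp

x = sp.symbols('x')
f = (x**2 + 3*x)**2

chain_rule = 2*(x**2 + 3*x) * (2*x + 3)      # outer derivative, evaluated inside, times inner derivative
print(sp.expand(chain_rule))                  # 4*x**3 + 18*x**2 + 18*x
print(sp.expand(sp.diff(f, x)))               # same result from SymPy's own differentiation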

Is the chain rule applicable to all functions?

Yes, the chain rule can be applied to any composite function, as long as the derivatives of the individual functions exist. However, it can become more complex when dealing with multivariable functions or functions with multiple layers of composition. In these cases, it may require more advanced techniques, but the basic principle of the chain rule still applies.
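As an illustration of the multivariable case mentioned above (a sketch with my own example z = x^2·y, where x = cos(t) and y = sin(t); here dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt)):

import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')

z = x**2 * y                      # outer function of two variables
xt, yt = sp.cos(t), sp.sin(t)     # both inputs depend on t

# Multivariable chain rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)
dz_dt = (sp.diff(z, x)*sp.diff(xt, t) + sp.diff(z, y)*sp.diff(yt, t)).subs({x: xt, y: yt})
direct = sp.diff(z.subs({x: xt, y: yt}), t)   # differentiate z(t) = cos(t)^2 sin(t) directly

print(sp.simplify(dz_dt - direct))            # 0: both routes agree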
