dalcde said:Is there an elegant and simple proof of the Chain Rule? Every proof I've found is complex and mind-boggling.
HallsofIvy said:Right. You swept all the dirt under the "definition of infinitesimals" carpet!
I like Serena said:I like to think of it as intuitive shorthand notation.
Seriously, do you know of an example where a proof based on infinitesimals may be wrong?
No, I have no problem with a proof based on infinitesimals - it is simply that to use infinitesimals you have to first rigorously define "infinitesimals" - and that requires some very deep logical steps.
lugita15 said:In single variable calculus things usually work out if you just assume infinitesimals work like ordinary numbers. But you have to be more careful in multivariable calculus. For instance let's say you wanted to find the infinitesimal area element in polar coordinates, which is
[itex]dA=dx\,dy=d(r\cos\theta)\,d(r\sin\theta)=(\cos\theta\,dr-r\sin\theta\,d\theta)(\sin\theta\,dr+r\cos\theta\,d\theta)[/itex]. If you treat [itex]dr[/itex] and [itex]d\theta[/itex] like ordinary numbers, you will get [itex]dA=\frac{1}{2}\sin 2\theta\,(dr^{2}-(r\,d\theta)^{2})+\cos 2\theta\,r\,dr\,d\theta[/itex], which is completely wrong. It's only when you remember the fact that [itex]dr\,d\theta=-d\theta\,dr[/itex] for differential forms (which I've always found really strange) that you get the right answer [itex]dA=r\,dr\,d\theta[/itex].
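For reference, carrying out the same multiplication with the differential-forms rules [itex]dr\wedge dr=d\theta\wedge d\theta=0[/itex] and [itex]d\theta\wedge dr=-dr\wedge d\theta[/itex] makes the quadratic terms drop out and the cross terms combine:

[tex]dA=dx\wedge dy=(\cos\theta\,dr-r\sin\theta\,d\theta)\wedge(\sin\theta\,dr+r\cos\theta\,d\theta)=r\cos^{2}\theta\,dr\wedge d\theta-r\sin^{2}\theta\,d\theta\wedge dr=r\,dr\wedge d\theta[/tex]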
I haven't looked at the details of this argument and your counterargument, but if you need an example of when cancellation gives you the wrong results, how about this version of the chain rule? [tex]\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial u}[/tex]

I like Serena said:I'm afraid you're defining two different versions of dA here.
The first expression defines dxdy exactly, but it is not very useful, because your actual coordinates are still x and y.
Your second expression for dA is the surface element as it is in polar coordinates, but it has a different surface area.
I think the ratio between the two is the Jacobian determinant.
Basically your second expression shows how the Jacobian determinant can be calculated in a very intuitive and simple manner (another score for infinitesimals! ).
You should be able to find your minus sign somewhere in the Jacobian determinant.
Fredrik said:I haven't looked at the details of this argument and your counterargument, but if you need an example of when cancellation gives you the wrong results, how about this version of the chain rule? [tex]\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial u}[/tex]
Fredrik said:By the way, one of the problems with the dx, dy arguments is that even when you get the right result, it doesn't tell you at what point in the domain to evaluate the function. For example, [tex]\frac{dy}{dx}=\frac{1}{\frac{dx}{dy}}[/tex] isn't the wrong result, but it's certainly less accurate than [tex](f^{-1})'(x)=\frac{1}{f'(f^{-1}(x))}.[/tex]
Your notation is appropriate when the left-hand side is the derivative of the function [itex]u\mapsto f(x(u),y(u))[/itex]. Mine is appropriate when the left-hand side is the partial derivative with respect to the first variable of the function [itex](u,v)\mapsto f(x(u,v),y(u,v))[/itex].

I like Serena said:You're pulling in partial derivatives here, which are not quite infinitesimals.
Btw, I know the formula as
[tex]\frac{df}{du}=\frac{\partial f}{\partial x}\frac{dx}{du}+\frac{\partial f}{\partial y}\frac{dy}{du}[/tex]
which shows the difference between partials and infinitesimals.
Basically this shows a more intuitive notation for multivariate derivatives.
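The formula above can be sanity-checked numerically. Below is a minimal sketch (my own illustrative choice, not from the thread) that picks arbitrary concrete functions - f(x, y) = x²y, x(u) = cos u, y(u) = sin u - and compares the right-hand side of the formula with a direct finite-difference estimate of df/du:

```python
import math

# Sanity check of df/du = (df/dx)(dx/du) + (df/dy)(dy/du)
# for the arbitrarily chosen functions:
#   f(x, y) = x^2 * y,  x(u) = cos(u),  y(u) = sin(u)

def f(x, y):
    return x * x * y

def df_du_chain(u):
    """Right-hand side: partials of f times ordinary derivatives of x and y."""
    x, y = math.cos(u), math.sin(u)
    df_dx = 2 * x * y        # partial of f with respect to x
    df_dy = x * x            # partial of f with respect to y
    dx_du = -math.sin(u)
    dy_du = math.cos(u)
    return df_dx * dx_du + df_dy * dy_du

def df_du_direct(u, h=1e-6):
    """Left-hand side: central difference on u -> f(x(u), y(u))."""
    def comp(t):
        return f(math.cos(t), math.sin(t))
    return (comp(u + h) - comp(u - h)) / (2 * h)

u = 0.7
print(df_du_chain(u), df_du_direct(u))  # the two values agree closely
```

The two numbers agree to many decimal places, which is what the formula predicts.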
Is it? Maybe it is, but I don't think that can follow from the definition of "infinitesimal". I don't know that definition, but obviously dx and dy need to depend on each other in some way for these calculations to be valid, and I don't think that's going to be a part of the definition. You're going to need some pretty fancy definitions of dx and dy to justify interpreting dy/dx=1/(dx/dy) as a proof of the formula I posted for the derivative of an inverse function.

I like Serena said:Note that the infinitesimal notation is also the proof, which is not the case with the functional notation.
Fredrik said:By the way, the notation I like the best (by far) is [tex](f\circ g)_{,i}(x) =f_{,j}(g(x))g_{j,i}(x).[/tex] (I'm using Einstein's summation convention, so there's a sum over j).
Einstein's summation convention is supposed to be used in the context of differential geometry, where the vertical position of the index informs us what type of tensor we're dealing with. In this context (no tensors involved), there's no harm in putting all the indices downstairs. The convention I'm using here is really just to not write any summation sigmas, since we can remember to always sum over those indices that appear twice. So yes, you could argue that it's not exactly Einstein's summation convention, but it's a convention that isn't different enough to deserve its own name.

dalcde said:I'm not familiar with Einstein's summation convention, but I think one j should be a superscript and one subscript.
Fredrik said:Your notation is appropriate when the left-hand side is the derivative of the function [itex]u\mapsto f(x(u),y(u))[/itex]. Mine is appropriate when the left-hand side is the partial derivative with respect to the first variable of the function [itex](u,v)\mapsto f(x(u,v),y(u,v))[/itex].
Fredrik said:Why are partial derivatives "not quite infinitesimals"? Note for example that [itex]\partial f(x,y)/\partial x[/itex], the partial derivative of f with respect to the first variable, evaluated at (x,y), is equal to the ordinary derivative of the function [itex]x\mapsto f(x,y)[/itex], evaluated at x. Hm, I suppose you could say that even though we can write z=f(x,y) in both cases, dz and [itex]\partial z[/itex] would refer to two different functions. But we're still dealing with a small change in z divided by a small change in x, in both cases.
Fredrik said:By the way, the notation I like the best (by far) is [tex](f\circ g)_{,i}(x) =f_{,j}(g(x))g_{j,i}(x).[/tex] (I'm using Einstein's summation convention, so there's a sum over j).
Fredrik said:Is it? Maybe it is, but I don't think that can follow from the definition of "infinitesimal". I don't know that definition, but obviously dx and dy need to depend on each other in some way for these calculations to be valid, and I don't think that's going to be a part of the definition. You're going to need some pretty fancy definitions of dx and dy to justify interpreting dy/dx=1/(dx/dy) as a proof of the formula I posted for the derivative of an inverse function.
It's just a notation, so it can't be an enormous improvement over any notation that works. All I can tell you is what it means and what I like about it. If f is a function, then [itex]f_{,i}[/itex] denotes its partial derivative with respect to the ith variable. This is an alternative to [itex]D_if[/itex]. I don't like the notation [itex]\partial f/\partial x_i[/itex] because it gives the impression that the variable symbols we're using are somehow relevant, which they're not of course. Note that [itex]f_{,i}[/itex] is a function and [itex]f_{,i}(x)[/itex] its value at x.

I like Serena said:I don't know this notation (yet).
The wiki page on derivative shows a number of notations, but not this one.
What does it say?
Why is it your preferred notation?
And where can I find more information on it?
I don't understand what you're saying. If you meant that for each positive infinitesimal dy, there's a positive infinitesimal dx such that dy/dx=f'(x), then my questions are "what's an infinitesimal?" and "how do you know this?". What you said doesn't answer either of those questions. It also doesn't explain why dx/dy should have anything to do with the derivative of [itex]f^{-1}[/itex].

I like Serena said:Let's give it a try.
I'm keeping it a bit informal, referring to x and y as scalar values as well as functions.
If necessary I can make it more formal and introduce more symbols, but I only want to know if the reasoning, possibly after some extensions, is valid as a proof.
Let y be an invertible function of x, given by y(x), and let x(y) be its inverse function.
For any point x where the function y is differentiable, where the inverse function x is differentiable, and where both derivatives are non-zero, the following holds.
For any epsilon with |epsilon| > 0, we can define dy = epsilon, such that there is a dx with |dx| > 0, such that the ratio dy/dx is equal to y'(x).
In this case the inverse ratio dx/dy is equal to x'(y).
QED.
Shoot!
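This claim is at least easy to check numerically. Here is a small sketch (my own illustrative choice, not from the thread) using f = exp, whose inverse is log, comparing (f⁻¹)'(x) against 1/f'(f⁻¹(x)):

```python
import math

# Numerical illustration of (f^{-1})'(x) = 1 / f'(f^{-1}(x))
# for the arbitrarily chosen pair f = exp, f^{-1} = log.

def deriv(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 3.0
lhs = deriv(math.log, x)                  # (f^{-1})'(x), exactly 1/3 here
rhs = 1.0 / deriv(math.exp, math.log(x))  # 1 / f'(f^{-1}(x))
print(lhs, rhs)  # both close to 0.3333...
```

Of course, a numerical check is evidence, not a proof, which is exactly the point under discussion.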
Fredrik said:[tex]\frac{\partial (f\circ g)(x)}{\partial x_i}=\sum_{j=1}^n\frac{\partial f(g(x))}{\partial g_j}\frac{\partial g_j(x)}{\partial x_i},[/tex] where the [itex]g_j[/itex] are defined by [itex]g(x)=(g_1(x),\dots,g_n(x))[/itex]. I really don't like this notation. For example, why is the partial derivative of f with respect to the jth variable denoted by [itex]\partial f/\partial g_j[/itex] all of a sudden. The only answer I can think of is extremely ugly to me: Because we intend to evaluate that function at g(x).
But g is another function. It's not "like f" in the sense that it's not real valued, except in the special case m=1. But it's certainly a function. And in the most general case, f isn't real valued either. Suppose e.g. that [itex]f:\mathbb R^m\rightarrow\mathbb R^k[/itex] and [itex]g:\mathbb R^n\rightarrow\mathbb R^m[/itex]. The chain rule satisfied by these functions is a trivial consequence of the one discussed in my previous post. We have [tex](f\circ g)^i{}_{,\,j}(x)=(f^i\circ g)_{,\,j}(x) =f^i{}_{,\,k}(g(x)) g^k{}_{,\,j}(x).[/tex] This is by the way another reason why I like to write the chain rule in that form. It makes it trivial to derive this even more general version, and of course we can recover the one for real-valued functions of one real variable simply by setting n=m=1.

I like Serena said:If that's the case I would prefer to use different symbols.
Say f(x(u)), where x and u denote vectors.
Your formula would become:
[tex]\frac{\partial f(x(u))}{\partial u_i}=\sum_{j=1}^n \frac{\partial f(x(u))}{\partial x_j}\frac{\partial x_j(u)}{\partial u_i}[/tex]
or simply:
[tex]\frac{\partial f}{\partial u_i}=\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial u_i}[/tex]
I like this notation, because it shows that you take partial derivatives of f, which must be corrected by multiplying with the appropriate ratio between coordinates.
The use of the symbols x and u instead of g and x is also more intuitive, because using g suggests that g is a function like f, instead of just another set of coordinates.
Actually, if we're not going to do it rigorously, then we might as well use an even simpler argument. It follows immediately from the definition of the derivative that when h is small, [tex]f(x+h)\approx f(x)+hf'(x).[/tex] Let's just use this formula twice, once on g and then once on f. [tex]f(g(x+h))\approx f\big(g(x)+hg'(x)\big)\approx f(g(x))+hg'(x)f'(g(x))[/tex] This implies that [tex]\begin{align}(f\circ g)'(x) &\approx \frac{f(g(x+h))-f(g(x))}{h}\approx \frac{f(g(x))+hg'(x)f'(g(x))-f(g(x))}{h}\\ &\approx f'(g(x))g'(x).\end{align}[/tex] What's missing here is of course a proof that the error in this approximation really goes to zero when h goes to zero. But this is still a good way to see that the chain rule is "likely" to be true.

Fredrik said:If we're going to suggest non-rigorous arguments instead of proofs,...
Fredrik said:Actually, if we're not going to do it rigorously, then we might as well use an even simpler argument. It follows immediately from the definition of the derivative that when h is small, [tex]f(x+h)\approx f(x)+hf'(x).[/tex] Let's just use this formula twice, once on g and then once on f. [tex]f(g(x+h))\approx f\big(g(x)+hg'(x)\big)\approx f(g(x))+hg'(x)f'(g(x))[/tex] This implies that [tex]\begin{align}(f\circ g)'(x) &\approx \frac{f(g(x+h))-f(g(x))}{h}\approx \frac{f(g(x))+hg'(x)f'(g(x))-f(g(x))}{h}\\ &\approx f'(g(x))g'(x).\end{align}[/tex] What's missing here is of course a proof that the error in this approximation really goes to zero when h goes to zero. But this is still a good way to see that the chain rule is "likely" to be true.
That's what I like the most about it. If you make a non-rigorous argument, you need to make sure that no one will mistake it for an actual proof.

I like Serena said:Yes, this works too, although I dislike the approximately-equal symbols.
The use of those symbols makes it specifically non-rigorous.
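The missing step in the approximation argument - that the error vanishes as h goes to zero - can at least be observed numerically. A minimal sketch (my own illustrative choice, not from the thread) with f = sin and g(x) = x²:

```python
import math

# Observe that [f(g(x+h)) - f(g(x))]/h approaches f'(g(x)) * g'(x)
# as h -> 0, for the arbitrarily chosen f = sin, g(x) = x^2.

def f(t):
    return math.sin(t)

def g(x):
    return x * x

x = 1.3
exact = math.cos(g(x)) * 2 * x  # f'(g(x)) * g'(x)

errors = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = (f(g(x + h)) - f(g(x))) / h
    errors.append(abs(approx - exact))

print(errors)  # each entry roughly 10x smaller than the previous one
```

The errors shrink in proportion to h, which is what the linear-approximation argument predicts but does not prove.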
Ah, yes, this is almost an actual proof of the formula for the derivative of an inverse function. But I don't see an equally convincing argument of that sort for the chain rule.

I like Serena said:My "proof" is based on the graphical interpretation of ratios, from which it is immediately evident that the inverse has the ratio inverted.
Fredrik said:Actually, if we're not going to do it rigorously, then we might as well use an even simpler argument. It follows immediately from the definition of the derivative that when h is small, [tex]f(x+h)\approx f(x)+hf'(x).[/tex]
dimension10 said:That doesn't seem to work for all functions. Let f(x)=tan x. Then using that formula, we will get the tangent of pi/2 to be something like 3.8264459099620716.
The chain rule is a fundamental concept in calculus that explains how to take the derivative of a composite function. It states that the derivative of a composite function is equal to the derivative of the outer function multiplied by the derivative of the inner function.
The chain rule is important because it allows us to find the rate of change of complex functions that are made up of simpler functions. This is essential in many areas of science and engineering, such as physics, economics, and engineering.
The chain rule can be derived using the concept of the limit. By taking smaller and smaller intervals, we can approximate the derivative of a composite function and show that it is equal to the derivative of the outer function multiplied by the derivative of the inner function.
The elegant and simple approach to proving the chain rule starts from the limit definition of the derivative, writes the difference quotient of the composite function as a product of two difference quotients, and then handles the case where the inner increment vanishes, showing that the limit equals the derivative of the outer function multiplied by the derivative of the inner function.
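For readers who want the rigorous version: the subtlety in the limit argument is that the inner increment g(x) - g(a) may be zero, so one cannot simply divide by it. A standard way around this (often attributed to Caratheodory) introduces an auxiliary function:

[tex]\varphi(t)=\begin{cases}\dfrac{f(t)-f(g(a))}{t-g(a)} & \text{if } t\neq g(a)\\ f'(g(a)) & \text{if } t= g(a)\end{cases}[/tex]

Differentiability of f at g(a) means exactly that [itex]\varphi[/itex] is continuous at g(a), and for every [itex]x\neq a[/itex] the identity [tex]\frac{f(g(x))-f(g(a))}{x-a}=\varphi(g(x))\cdot\frac{g(x)-g(a)}{x-a}[/tex] holds even when g(x)=g(a), since both sides are then zero. Letting [itex]x\to a[/itex] gives [itex](f\circ g)'(a)=f'(g(a))\,g'(a)[/itex].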
Yes, the chain rule can be applied to any composite function, regardless of how complex it may be. As long as the function can be broken down into simpler functions, the chain rule can be used to find its derivative.