Derivatives in Rn: Exploring Linear Transformation Limits

In summary: as I understand it, you're asking whether there is some limit that relates a "slope" to (Df)(a). I think so, but I don't know how to work it out.
  • #1
ak416
It says in my book that f:Rn->Rm is differentiable at a if there is a linear transformation l(h):Rn->Rm such that lim h->0 |f(a+h) - f(a) - l(h)| / |h| = 0.

But can you say something like this?: The derivative of f at a is
lim h->0 ( f(a+h) - f(a) ) / |h| if this limit exists. I've seen this done in one of my solutions to an assignment, just wondering how it follows and why you can avoid the norm in the numerator of this limit.
 
Last edited:
  • #2
if there is a linear transformation l(h):Rn->Rm
You meant "if there is a linear transformation l:Rn->Rm".

The derivative of f at a is
lim x->h ( f(a+h) - f(a) ) / |h|
That limit is simply equal to (f(a+h)-f(a)) / |h|. I suspect you've omitted something essential to what you're trying to say!
 
  • #3
ak416 said:
It says in my book that f:Rn->Rm is differentiable at a if there is a linear transformation l(h):Rn->Rm such that lim h->0 |f(a+h) - f(a) - l(h)| / |h| = 0
But can you say something like this?: The derivative of f at a is
lim x->h ( f(a+h) - f(a) ) / |h| if this limit exists. I've seen this done in one of my solutions to an assignment, just wondering how it follows and why you can avoid the norm in the numerator of this limit.

The derivative is equal to

[tex]
D_{\overrightarrow{v}}f(a) = \lim_{t \rightarrow 0} \frac{f(a+t\overrightarrow{v}) - f(a)}{t}
[/tex]
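For concreteness (my own example, not from the original post): with f(x,y) = x^2 + y^2, a = (1,2), and v = (1,0), this gives

[tex]
D_{\overrightarrow{v}}f(a) = \lim_{t \rightarrow 0} \frac{\left( (1+t)^2 + 4 \right) - 5}{t} = \lim_{t \rightarrow 0} \frac{2t + t^2}{t} = 2,
[/tex]

which is just the partial derivative of f with respect to x at (1,2).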
 
  • #4
ak416, are you working out of Rudin? If so, read Definition 9.11.
 
  • #5
Definition 9.11 of Baby Rudin reads...

Definition 9.11 Suppose E is an open set in Rn, [tex]\vec{f}[/tex] maps E into Rm, and [tex]\vec{x}\in E[/tex]. If there exists a linear transformation A of Rn into Rm such that

[tex]\lim_{\vec{h}\rightarrow 0}\frac{|\vec{f}(\vec{x}+\vec{h})-\vec{f}(\vec{x})-A\vec{h}|}{|\vec{h}|}=0[/tex]

then we say that [tex]\vec{f}[/tex] is differentiable at [tex]\vec{x}[/tex], and we write [tex]\vec{f}^{\prime}(\vec{x})=A.[/tex]
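A quick sanity check of the definition on an example of my own (not Rudin's): take f(x,y) = xy at the point (1,2) and the candidate linear map A(h,k) = 2h + k. Then

[tex]f(1+h,2+k) - f(1,2) - A(h,k) = (1+h)(2+k) - 2 - (2h+k) = hk,[/tex]

and since [tex]|hk| \leq \tfrac{1}{2}(h^2+k^2)[/tex], the ratio [tex]|hk| / \sqrt{h^2+k^2} \leq \tfrac{1}{2}\sqrt{h^2+k^2} \rightarrow 0[/tex], so f is differentiable at (1,2) with f'(1,2) = A.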
 
Last edited:
  • #6
It is important to note that the Rm-norm is used in the numerator while the Rn-norm is used in the denominator.
 
  • #7
Ya, I just edited it; it's supposed to read lim h->0. I'm working out of Spivak. What twoflower said is actually called the directional derivative of f at a. There's actually a problem asking to show that if f is differentiable at a, then the directional derivative of f at a satisfies D_x f(a) = Df(a)(x) (is that the linear transformation evaluated at x, or does he mean Df(a) times x? I'm not sure). I haven't attempted this yet. But I'm just wondering whether Df(a) is equal to the limit I stated. Any clarification would be greatly appreciated.
 
  • #8
The derivative of f at a is
lim h->0 ( f(a+h) - f(a) ) / |h| if this limit exists.
You must have read it wrong, since this cannot possibly be right. This limit is a member of Rm, whereas the derivative of f at a is a linear transformation Rn -> Rm.
 
  • #9
The example I'm referring to was some function from R^2 to R, with ∂f/∂x(0,0) = 0 and ∂f/∂y(0,0) = 0. And the solution says the function is differentiable at (0,0) iff lim (h,k)->(0,0) ( f(h,k) - f(0,0) ) / |(h,k)| = 0. So I'm wondering whether this means that, in general, that limit would be the derivative.

Hurkyl said:
You must have read it wrong, since this cannot possibly be right. This limit is a member of Rm, whereas the derivative of f at a is a linear transformation Rn -> Rm.

But when you apply this linear transformation to a point in Rn, its value is a point in Rm, so is there some way of connecting this limit with the derivative?
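To make the condition described above concrete, here is a rough Python sketch (my own hypothetical functions, not the one from the assignment):

[code]
import numpy as np

# Differentiability at (0,0) with both partials zero amounts to
#     (f(h,k) - f(0,0)) / |(h,k)|  ->  0   as (h,k) -> (0,0).

def g(h, k):
    # g(x,y) = x*y: both partials vanish at the origin and g IS differentiable there.
    return h * k

def b(h, k):
    # b(x,y) = x*y / sqrt(x^2 + y^2), with b(0,0) = 0: both partials vanish at the
    # origin, but b is NOT differentiable there.
    return h * k / np.hypot(h, k)

for t in [1e-1, 1e-3, 1e-5]:
    h = k = t                       # approach the origin along the diagonal h = k
    norm = np.hypot(h, k)
    print(g(h, k) / norm, b(h, k) / norm)
    # the g-ratio shrinks toward 0, while the b-ratio stays at 0.5,
    # so b fails the condition even though both of its partials at (0,0) are 0
[/code]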
 
  • #10
Sort of; let me explain: "the derivative" (also called the total derivative) of f:Rn->Rm at a is a linear transformation of Rn->Rm, i.e., an nxm matrix containing mn values. The limit to which you refer is a single value. It was confusing to me, but, as I understand it, each of the mn values contained in that nxm matrix is, in fact, a certain "partial derivative" of f at a, each of which (individually) can be defined by such a limit. In particular, if you consider f:Rn->Rm as being represented by a vector whose m components are real-valued functions [tex]f_i :R^n\rightarrow R[/tex] so that [tex]\vec{f}(\vec{a}) = \left<f_1(\vec{a}),f_2(\vec{a}),\ldots ,f_m(\vec{a})\right>,\mbox{ for } \vec{a}\in R^n,[/tex] then


[tex]\left( D_j f_i\right) (\vec{a}) = \lim_{t\rightarrow 0}\frac{f_i \left( \vec{a}+t\vec{e}_j\right)-f_i \left( \vec{a}\right)}{t}[/tex]

where [tex]\vec{e}_j[/tex] is the jth standard basis vector of Rn (i.e., an n-dimensional vector having n-1 zeros and a 1 in the jth entry).

Then "the total derivative" of f at a is given by the nxm matrix [aij] whose entries are "partial derivatives" of f at a given by [tex]a_{ij}=\left( D_j f_i\right) (\vec{a})[/tex].
 
Last edited:
  • #11
Well, how I understand it is that the matrix representation of Df(a) with respect to the standard basis is an mxn (not nxm) matrix, and this matrix is called f '(a). Each entry of this matrix corresponds to a partial derivative Djfi(a). This is proved later on. But just using the definition of differentiability and of the derivative (the linear transformation), I am curious whether the limit lim h->0 ( f(a+h) - f(a) ) / |h| has any connection to Df(a) or f'(a), as it does in the one-dimensional case. And what is this connection?
 
  • #12
Well, you're taking calculus, right? You (should) know how to relate f(a+h) - f(a) to (Df)(a). You also (should) know that limit, when h is restricted to certain paths.
 
Last edited:
  • #13
I'll try...

If [tex]\vec{f} :R^n\rightarrow R^m,[/tex] then

i. the total derivative of f at a, namely

[tex]\vec{f} \ ^{\prime} (\vec{a}) = \left[\begin{array}{ccc}\left( D_1 f_1\right) (\vec{a})&\cdots & \left( D_n f_1\right) (\vec{a})\\ \vdots \ & \ddots & \vdots\\ \left( D_1 f_m\right) (\vec{a})&\cdots & \left( D_n f_m\right) (\vec{a}) \end{array}\right][/tex]

is a matrix whose entries are the partial derivatives (ii).

ii. For [tex]1\leq i\leq m,1\leq j\leq n[/tex], the partial derivative of the ith component of [tex]\vec{f}[/tex] with respect to the jth component of [tex]\vec{x}[/tex], evaluated at [tex]\vec{a}[/tex], is given by

[tex]\left( D_j f_i\right) (\vec{a}) = \lim_{t\rightarrow 0}\frac{f_i \left( \vec{a}+t\vec{e}_j\right)-f_i \left( \vec{a}\right)}{t}[/tex]

and is a scalar.

iii. The limit you described,

[tex]\lim_{h\rightarrow 0}\frac{\vec{f}\left( \vec{a}+\vec{h}\right) - \vec{f}\left( \vec{a}\right)}{|\vec{h}|}[/tex]

is an m-dimensional vector whose entries are scalars.
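A rough numerical illustration of (i)-(iii), using an example of my own (not from the thread): approximate each partial in (ii) by a difference quotient and assemble the m x n matrix in (i).

[code]
import numpy as np

def f(x):
    # sample map f : R^2 -> R^3, so m = 3, n = 2 and f'(a) is a 3x2 matrix
    x1, x2 = x
    return np.array([x1 * x2, x1**2, np.sin(x2)])

def jacobian(f, a, eps=1e-6):
    # approximate the m x n matrix of partials (D_j f_i)(a) by difference quotients
    a = np.asarray(a, dtype=float)
    m, n = f(a).size, a.size
    J = np.zeros((m, n))
    for j in range(n):
        e_j = np.zeros(n)
        e_j[j] = 1.0
        J[:, j] = (f(a + eps * e_j) - f(a)) / eps   # column j: partials w.r.t. x_j
    return J

a = np.array([1.0, 2.0])
print(jacobian(f, a))   # approximately [[2, 1], [2, 0], [0, cos(2)]]
[/code]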
 
Last edited:
  • #14
Thanks benorin, I have a slightly different, but equivalent, definition of a partial derivative, i.e. lim h->0 ( f(a1,...,ai+h,...,an) - f(a1,...,an) ) / h (that's when f:Rn->R).

Hurkyl: I am guessing you're talking about the directional derivative, lim t->0 ( f(a+tx) - f(a) ) / t. But in general, without the "t" scalar, I would like to know how to relate that limit to Df(a). Also, it seems very close to the directional derivative; what are the differences? Yes, I'm taking second-year calc now, so I'd like to understand this. Thanks for the help, btw.
 
  • #15
I hate spoiling answers... :frown:

You're close to my second thing. Can you find a path along which the limit of interest becomes a directional derivative?


For my first thing, doesn't the numerator look like something you could use a differential approximation for?
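Spelling out where those two hints lead (my own sketch, not Hurkyl's wording): if f is differentiable at a, then f(a+h) - f(a) = Df(a)(h) + r(h) with |r(h)|/|h| -> 0, so

[tex]\frac{f(a+h)-f(a)}{|h|} = Df(a)\left( \frac{h}{|h|} \right) + \frac{r(h)}{|h|}.[/tex]

The second term goes to 0, but the first depends on the direction h/|h|, so the limit over all h generally does not exist; it exists (and equals 0) exactly when Df(a) = 0, which is the situation in the assignment solution described in post #9. Restricting h to the path h = tv with t -> 0+ turns the quotient into the directional derivative D_v f(a) scaled by 1/|v|.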
 
  • #16
see if this flies: my notes from 1991:
Math 410/610: (2/20-22/91) The Total Differential of a Function

You may not realize it, but we have not yet discussed the concept of differentiability for a function of more than one variable! How can I say this when we have already discussed partial derivatives, directional derivatives and even finding the tangent plane to parametrized surfaces and to graphs of functions of several variables? The point is a subtle one, but to put it in a way that may make it easy to remember, the existence of partial derivatives only says the function is "partially differentiable", whereas to say a function is (fully) differentiable requires that the (total) differential exists. Geometrically, the existence of a total derivative says roughly that the graph is a smooth surface, while the existence of partials only says that two special curves in the graph are smooth.

(These are not exactly the true meanings, since the word "smooth" is imprecise and has a slightly stronger connotation than we will require for the existence of a derivative, but it is reasonably close.)
The difference between having a total differential and having partial derivatives is slightly confusing, because if the total differential does exist then it is completely determined by the partial derivatives. Think of an analogous question, that of determining the equation for a surface S which passes through the origin in R^3. Suppose we know that S contains both the x-axis and the y-axis and suppose we know that S is a plane. Then it follows that S must be the (x,y) plane. But suppose we only know that S contains the x-axis and the y-axis, but we do not know that S is a plane. Does S have to be the (x,y) plane? Or is there some other surface which is not a plane but still contains the x and y-axes? In fact there are very many other such surfaces as you will realize if you think about it for a while. [For example the surface in R^3 with equation z = xy contains all points with both z=0 and y =0, hence contains the x-axis, and also all points with z=0 and x=0, hence contains the y-axis. It is not a plane however, since its equation is not linear, and in particular it is not the (x,y) plane since it does not contain the point (1,1,0).]

The point is that if you already know a surface is a plane then just knowing two lines in it tells you completely which plane it is, and allows you to write an equation for that plane. On the other hand if you don't really know whether it is a plane or not, then just knowing that there are two lines in it does not really help you know what the surface looks like in other directions. Although not perfect, this is a partial analogy of the difference between a function which has partial derivatives and a function which is (fully) differentiable. I.e. if the function is differentiable then just knowing the partial derivatives tells you what the (total) derivative is, but a function can have partial derivatives and not be differentiable at all.

The connection is this: for a function f:R^2-->R to be "differentiable at p" will turn out to mean that the graph "has a tangent plane at the point (p,f(p))". This must be carefully defined, but when this is done it will imply that (i) every curve in the graph of f, passing through the point (p,f(p)) and lying directly over (and parametrized by) a line through p in the (x,y) plane, has a velocity vector at (p,f(p)); and (ii) the set of all such velocity vectors lie in a common plane.

Thus at least two things can go wrong and prevent the existence of a total derivative: either some such curves have velocity vectors at (p,f(p)) and some do not, or all such curves have velocity vectors but those vectors do not all lie in a common plane. Neither of these shortcomings however need prevent the existence of partial derivatives at p, as we will see next.

For f to have partial derivatives at p simply means that at least two curves through (p,f(p)) have velocity vectors at (p,f(p)), namely the two curves in graph(f) which lie directly over the two lines parallel to the x-axis and the y axis. This does not at all guarantee that curves over lines in other directions will have velocity vectors, much less that all of the velocity vectors will lie in a common plane.

For example, if we define f(x,y) = 0 when either x or y is 0, and f(x,y) = 1 when neither x nor y is 0, then this f has partial derivatives at (0,0), namely ∂f/∂x = 0 and ∂f/∂y = 0, but f is not even continuous at (0,0), and no curve through the origin in any direction except along the two axes has a velocity vector there.

The graph looks like the set you would get if you tried to lift the (x,y) plane up one unit but somebody had glued the x and y axes down, so they stuck where they were while the rest of the plane ripped loose and came up a distance of one unit. In particular f is not differentiable at p = (0,0). Therefore this function has partials at (0,0) but does not satisfy either condition (i) or condition (ii) above. We shall see next that condition (i) corresponds to the existence of directional derivatives, but still need not force the existence of a total derivative.
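A quick numerical poke at this example (a Python sketch of my own): the difference quotients along both axes vanish, yet the function jumps along the diagonal.

[code]
# f is the example above: 0 on the coordinate axes, 1 everywhere else
def f(x, y):
    return 0.0 if (x == 0 or y == 0) else 1.0

for h in [1e-1, 1e-3, 1e-5]:
    # partial-derivative difference quotients along the axes: both are exactly 0
    print((f(h, 0) - f(0, 0)) / h, (f(0, h) - f(0, 0)) / h)
    # but along the diagonal the value stays at 1, so f is not even continuous at (0,0)
    print(f(h, h))
[/code]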

For a function f to have a directional derivative at p in the direction v means that the curve (p+tv, f(p+tv)) in graph(f) through (p,f(p)), which lies over (and is parametrized as shown by) the line in the (x,y) plane through the vector v, has a velocity vector at t=0. Thus even if we ask more of our function f, for instance if we ask that it have directional derivatives in every direction, we get condition (i) above but not necessarily condition (ii). For instance, if we define f(x,y) = rθ, where 0 ≤ θ < π is the angle and -∞ < r < ∞ is the radius, we get a function which has partials at (0,0), and even has directional derivatives at (0,0) in every direction, but which is not even continuous at (0,0), and such that the velocity vectors at (0,0) to curves in graph(f) in different directions do not all lie in the same plane.

This graph is even made up entirely of lines through the origin. Namely, to build it, start with the x-axis as the first line and nail it down at the origin but don't nail it anywhere else. Then take hold of the line at the point (1,0) and walk with it counterclockwise around the unit circle in the upper half of the (x,y) plane, lifting up on the line as you go. In every position the line still passes through the origin, but it has height θ over the point at angle θ on the upper half of the unit circle.

This function is not continuous, since the line drops suddenly down from height π to height 0 (the x-axis) as you reach the point (-1,0), hence it cannot be differentiable. Nonetheless it has directional derivatives in every direction, since in each direction through (0,0) the graph is simply a line! The velocity vectors at ((0,0),f(0,0)) to curves in different directions in the graph also do not lie in a plane, so that again f is not differentiable since condition (ii) fails. You can make a variation on this example which is continuous but still fails to satisfy condition (ii), by starting out the same but when you get to angle π/4, start letting the line down again so that at angle π/2 it becomes the y-axis.

Then just do exactly the same thing over again in the second quadrant. This gives a function which is continuous at (0,0), whose partials are both zero at (0,0), which has (non zero) directional derivatives at (0,0) in every other direction, but which is not differentiable at (0,0).

The upshot of all this is that we must define carefully what it means for f(x,y) to be differentiable at p, essentially by requiring that the graph have a tangent plane at (p,f(p)). It will then follow that every curve in the graph through (p,f(p)), and lying directly over a line through p in the (x,y) plane, has a tangent line at (p,f(p)) and that all these lines lie in the tangent plane. The basic facts are the following:

If f:R^k-->R^n is "differentiable at p", then it also has partial derivatives at p. The matrix [f'(p)], whose columns are the vector partials, is called the Jacobian matrix of f at p. f will also have directional derivatives at p in every direction v in R^k, and in fact Dvf(p) can be computed by multiplying the column vector v by the Jacobian matrix; i.e. we have the formula:

Dvf(p) = [f'(p)][v], for every v in R^k, where [f'(p)] is the matrix whose columns are the vector partials of f. Since multiplication by a matrix is a "homomorphism" i.e. a linear map, so that [f'(p)][v+w] = [f'(p)][v] + [f'(p)][w], we get as a corollary the formula Dv+wf(p) = Dvf(p) + Dwf(p) for the directional derivatives of a differentiable function f.
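A one-line check of this formula on a throwaway example of my own (not from the notes): take f(x,y) = x^2 + 3y, p = (1,1), and v = (1,2). Then [f'(p)] = [2 3], so

[tex]D_v f(p) = \left[\begin{array}{cc}2 & 3\end{array}\right]\left[\begin{array}{c}1\\2\end{array}\right] = 8,[/tex]

and directly, f(1+t, 1+2t) - f(1,1) = (1+t)^2 + 3(1+2t) - 4 = 8t + t^2, so the difference quotient tends to 8 as t -> 0.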

The affine linear function A(x) = f(p) + [f'(p)][x-p] is "tangent to f at p", in the sense that its graph is the unique k-plane in R^(k+n) which is tangent at (p,f(p)) to the graph of f. For values of x near p, this is the best affine linear function to use for approximating f.

It still remains for us to give a precise mathematical definition of the statement "f is differentiable at p" in a way that lives up to our intuition that it should mean that the graph of f has a tangent space at (p,f(p)). We do this as follows:

1) Define a function φ(t) to be "tangent to zero" (at t=0) if the ratio
||φ(t)|| / ||t|| --> 0 as ||t|| --> 0. By looking at a picture in two variables we can see that this means that the graph of φ is tangent to the horizontal axis, which is the graph of the zero function.
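A tiny example of my own to anchor this definition: φ(t) = |t|^(3/2) is tangent to zero, since

[tex]\frac{\| \varphi(t) \|}{\| t \|} = |t|^{1/2} \rightarrow 0 \mbox{ as } t \rightarrow 0,[/tex]

whereas φ(t) = t is not tangent to zero, because that ratio is constantly 1.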

2) Next define two functions f,g to be tangent to each other at p, if the difference f(p+t)-g(p+t) is tangent to zero. This means essentially that f(p) = g(p) and their graphs are tangent to each other at the common point (p,f(p)).

3) Last of all define a function f to be "differentiable at p" if f is tangent to some affine linear function at p, i.e. if and only if there is some homogeneous linear function L(t), such that f(x) is tangent to the affine linear function A(x) = f(p) + L(x-p) at p. This says that for some linear function L, the ratio ||f(x)-f(p)-L(x-p)||/||x-p|| -->0, as x-->p. If we use the symbol t with t = x-p, then x = p+t, and the statement says that ||f(p+t)-f(p)-L(t)||/||t||-->0, as t-->0.

We could have given definition 3) without any of the previous definitions or any of the earlier discussion, but I hope this way of doing things has made it more understandable. Be aware however, that when it comes to the question of memorizing the definition of "differentiable", it is all contained in this sentence:

Definition: f:R^k-->R^n is "differentiable at p" if and only if there is a (homogeneous) linear function L:R^k-->R^n such that the ratio ||f(p+t)-f(p)-L(t)||/||t||-->0, as t-->0.

If this is the case then L is called the (total) differential of f at p. L is denoted by the symbol dpf. The function A(x) = f(p) + L(x-p) = f(p) + (dpf)(x-p), is called the best affine approximation to f at p. The graph of A is the tangent space to the graph of f at (p,f(p)).

In analogy with the notation Δy from one variable, we can define Δpf(t) = f(p+t) - f(p). Then f is differentiable at p if there is a linear map dpf(t) which is tangent to Δpf(t) at t=0.
 
  • #17
Well, is there any difference between lim h->0 f(a+h) and lim t->0 f(a+th)? And what is the difference between making the denominator a scalar and making it the magnitude of a vector? Because other than that they're the same.

Also, thanks for the notes, mathwonk. I'll try to read them if I have further problems.
 
  • #18
Well, is there any difference between lim h->0 f(a+h) and lim t->0 f(a+th)?

yes.

and what is the difference between making the denominator a scalar and making it the magnitude of a vector?

Well, picture it. In a derivative you are comparing the change in x to the change in f(x). So if the increment is a vector you have to take its norm, but if you are fixing x and looking only at multiples of x of the form tx, then the size of tx can be measured by looking at the size of t.
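In symbols (my paraphrase of the point above): if h = tx with x fixed, then

[tex]\frac{\| f(a+tx)-f(a) \|}{\| tx \|} = \frac{1}{\| x \|} \cdot \frac{\| f(a+tx)-f(a) \|}{|t|},[/tex]

so dividing by the norm of the increment and dividing by |t| differ only by the constant factor ||x||.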

Try to understand what the symbols mean, instead of just staring at t's and v's and x's, and everything will be revealed to you.



Because other than that they're the same.

Also, thanks for the notes, mathwonk. I'll try to read them if I have further problems.

My pleasure, but as usual when I provide too much, it does not get read, I fear.

May I suggest you read them whether or not you feel you have problems? Maybe you will learn something.

Or do it as a favor to me, as I would greatly appreciate the feedback.


thanks



 
  • #19
OK, here's how I see it. The directional derivative lim t->0 ( f(a+tx) - f(a) ) / t only applies to f:Rn->R, and it's the rate of change of f as you move in the direction given by the vector x (the arrow from the origin to the point x shows this direction). And lim h->0 (f(a+h)-f(a))/h only applies to f:R->Rn, and it gives you the tangent vector to the curve in Rn at a. Now if you take this and go from R2->Rn, and take lim h->0 (f(a+h)-f(a)) / |h|, you would get some tangent vector to a surface, but in what direction would this vector be pointing? Or could you still think of it as a curve where the direction in which the curve is going is determined by a function that takes R2->R? Anyway, that's just to visualize. What I really want to know is whether this general limit lim h->0 (f(a+h)-f(a))/|h|, where f:Rm->Rn, is equal to f'(a), and if not, what its algebraic connection is to the derivative, which as you know is the linear transformation Df(a) from Rm->Rn making lim h->0 |f(a+h) - f(a) - Df(a)(h)|/|h| equal 0. Hope this thread doesn't get too long...
 

FAQ: Derivatives in Rn: Exploring Linear Transformation Limits

What are derivatives in Rn?

Derivatives in Rn refer to the rate of change of a function in n-dimensional space. In other words, they measure how a function changes with respect to its input variables in n-dimensional space.

How are derivatives in Rn calculated?

Derivatives in Rn are calculated using partial derivatives, which involve taking the derivative of a function with respect to each of its input variables while holding the others constant. For a scalar-valued function, these partial derivatives combine into a vector known as the gradient; for a vector-valued function, they combine into the Jacobian matrix.
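For example (an illustration added here, not part of the original FAQ): for f(x,y,z) = x^2 y + z,

[tex]\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) = \left( 2xy, \ x^2, \ 1 \right).[/tex]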

What is the significance of derivatives in Rn?

Derivatives in Rn have various applications in fields such as physics, engineering, and economics. They are used to model and analyze the behavior of complex systems, and to solve problems in optimization and control.

How are derivatives in Rn related to linear transformations?

Derivatives in Rn are closely related to linear transformations, as they can be interpreted as the local linear approximation of a function. This means that the derivative at a given point can be thought of as the best linear approximation of the function near that point.

What is the limit of a derivative in Rn?

The limit that defines a derivative in Rn is taken as the increment approaches zero. It captures the instantaneous rate of change of the function at a point, generalizing the slope of a tangent line in one dimension to the tangent plane, or more generally the best linear approximation, in higher dimensions.
