How Do I Apply the Chain Rule for Second Order Partial Derivatives?

Chuck37 · Mar 8, 2010

I have a function F(u,v) that I need to get first and second order partial derivatives for (Gradient and Hessian). F(u,v) is very ugly, so I'm thinking of it like F(x,y,z) where I have another function [x,y,z]=G(u,v).

So, I got my first orders, e.g.:

dF/du = dF/dx*dx/du + dF/dy*dy/du + dF/dz*dz/du

Defining X=[x y z] and U=[u v] I can formulate this in vector notation:

dF/dU = dF/dX * Jacobian(X(u,v))

at least I think I can. It seems to be working.
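
A minimal numerical check of this vector form can be sketched as follows. The particular F and G below are hypothetical stand-ins chosen only for illustration (the actual functions are not given in this thread), and the Jacobian is assumed to be laid out with rows x, y, z and columns u, v.

```python
import numpy as np

# Hypothetical example functions (NOT the ones from the original problem).
def G(u, v):
    """Inner map [x, y, z] = G(u, v)."""
    return np.array([u * v, u**2 + v, np.sin(u) + v**2])

def F(X):
    """Outer scalar function F(x, y, z)."""
    x, y, z = X
    return x**2 * y + y * z + z**3

def grad_F(X):
    """Row vector [dF/dx, dF/dy, dF/dz]."""
    x, y, z = X
    return np.array([2 * x * y, x**2 + z, y + 3 * z**2])

def jac_G(u, v):
    """3x2 Jacobian: rows are x, y, z; columns are u, v."""
    return np.array([
        [v, u],
        [2 * u, 1.0],
        [np.cos(u), 2 * v],
    ])

def grad_chain(u, v):
    """dF/dU = dF/dX * Jacobian(X(u,v)), a length-2 vector."""
    return grad_F(G(u, v)) @ jac_G(u, v)

def grad_fd(u, v, h=1e-6):
    """Central finite differences of F(G(u, v)), for comparison only."""
    du = (F(G(u + h, v)) - F(G(u - h, v))) / (2 * h)
    dv = (F(G(u, v + h)) - F(G(u, v - h))) / (2 * h)
    return np.array([du, dv])

print(grad_chain(0.7, -0.3))
print(grad_fd(0.7, -0.3))
```

If the two printed vectors agree to several digits, the (1x3) times (3x2) layout is consistent.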

Now I need the second orders of F with respect to [u,v]. What I really need is the 2x2 Hessian matrix. I'm not totally sure how to proceed. I plowed through and got all my partials of F with respect to [x,y,z], but I'm not sure how to apply the chain rule or its equivalent either in scalar or matrix/vector notations.

Can anyone help? (If nothing else, how do you write out ddF/dudv in terms of partials of F(x,y,z) and G(u,v)?)

Chuck37 · Mar 9, 2010

I got notification that someone replied, but I don't see anything here. I wonder if it got moderated away for some reason?

I think I worked out this much, sort of an extended chain rule for partials (in lousy text notation):

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

It's a pain but I can compute all these terms. I can't really see a way to formulate this in terms of gradients, Jacobians and Hessians.

arildno · Mar 9, 2010

Okay, Chuck!
I did make a reply, but it was riddled with errors, so I deleted it.

I'm working on a better one!

arildno · Mar 9, 2010

The simplest way to do this is by using summation/index notation.

Suppose the index i=1,2,3. Then [itex]F_{i}[/itex] denotes a three-dimensional vector.
If the index j=1,2, then [itex]G_{ij}[/itex] is a 3*2 matrix.

The notation [tex]F_{,i}, i=1,2,3[/tex] means the vector with components:
[tex]F_{,1}=\frac{\partial{F}}{\partial{x}_{1}}[/tex]
and so on (a comma in front of an index means you differentiate with respect to the variable carrying that index, here x_1!)

We may therefore identify [itex]G_{i,j}[/itex] as an i*j MATRIX, with, for example the (1,2)-entry being:
[tex]G_{1,2}=\frac{\partial{G}_{1}}{\partial{x}_{2}}[/tex]

Furthermore, a product in which the same index appears twice is SUMMED over that index, for example (i=1,2,3):
[tex]u_{i}v_{i}=u_{1}v_{1}+u_{2}v_{2}+u_{3}v_{3}[/tex]

Thus, in your case, we have:
[tex]F(u_{i})=f(x_{j}(u_{i})), i=1,2, j=1,2,3[/tex]
Differentiating this with respect to the u_i yields, by the chain rule:
[tex]F_{,i}=f_{,j}x_{j,i} (*)[/tex]

Now, by setting n=1,2, m=1,2,3, we may find the full second derivative (a matrix!) of F as follows:
[tex]F_{,in}=f_{,jm}x_{j,i}x_{m,n}+f_{,j}x_{j,in}[/tex]
Note that the first product, running over BOTH j and m, consists of 9 terms, whereas the latter product, running merely over the j's, consists of 3 terms.

Each of these 12 "terms" is itself a 2*2 matrix in "i" and "n".
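
The index formula above maps almost literally onto NumPy's einsum, so a rough sketch may help. The functions x_j(u_i) and f(x_j) below are hypothetical examples (not from this thread), and all derivatives are typed in by hand; the finite-difference routine is there only as a sanity check.

```python
import numpy as np

# Hypothetical example (not from the thread):
#   x1 = u1*u2, x2 = u1**2 + u2, x3 = sin(u1) + u2**2
#   f(x) = x1**2 * x2 + x2*x3 + x3**3

def x_of_u(u):
    u1, u2 = u
    return np.array([u1 * u2, u1**2 + u2, np.sin(u1) + u2**2])

def f(x):
    x1, x2, x3 = x
    return x1**2 * x2 + x2 * x3 + x3**3

def second_derivative(u):
    """F_{,in} = f_{,jm} x_{j,i} x_{m,n} + f_{,j} x_{j,in}."""
    u1, u2 = u
    x1, x2, x3 = x_of_u(u)
    f_j = np.array([2 * x1 * x2, x1**2 + x3, x2 + 3 * x3**2])   # f_{,j}, shape (3,)
    f_jm = np.array([[2 * x2, 2 * x1, 0.0],
                     [2 * x1, 0.0, 1.0],
                     [0.0, 1.0, 6 * x3]])                        # f_{,jm}, shape (3, 3)
    x_ji = np.array([[u2, u1],
                     [2 * u1, 1.0],
                     [np.cos(u1), 2 * u2]])                      # x_{j,i}, shape (3, 2)
    x_jin = np.array([[[0.0, 1.0], [1.0, 0.0]],
                      [[2.0, 0.0], [0.0, 0.0]],
                      [[-np.sin(u1), 0.0], [0.0, 2.0]]])         # x_{j,in}, shape (3, 2, 2)
    first = np.einsum('jm,ji,mn->in', f_jm, x_ji, x_ji)  # sums over j and m (9 products)
    second = np.einsum('j,jin->in', f_j, x_jin)          # sums over j only (3 products)
    return first + second

def second_derivative_fd(u, h=1e-4):
    """Central-difference second derivatives of f(x(u)), for comparison only."""
    u = np.asarray(u, dtype=float)
    F = lambda p: f(x_of_u(p))
    H = np.zeros((2, 2))
    for i in range(2):
        for n in range(2):
            ei = h * np.eye(2)[i]
            en = h * np.eye(2)[n]
            H[i, n] = (F(u + ei + en) - F(u + ei - en)
                       - F(u - ei + en) + F(u - ei - en)) / (4 * h**2)
    return H

u0 = np.array([0.7, -0.3])
print(second_derivative(u0))
print(second_derivative_fd(u0))
```

The einsum subscripts are the same letters as in the formula, which makes it easy to check that every repeated index is summed and that only "i" and "n" survive.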

arildno · Mar 10, 2010

How is it working out, Chuck?
If you have any questions about the above, just post them.

Chuck37 · Mar 10, 2010

arildno said:

How is it working out, Chuck?
If you have any questions about the above, just post them.

I'm still absorbing it a little. I think it makes sense, but I won't waste your time until I give it the proper attention. Thanks very much for your help. I'll post my questions a little later if it doesn't sink in.

Chuck37 · Mar 11, 2010

Is there a way to think of this as plain old matrix and vector multiplies? For example, I can view your second-to-last equation as the gradient of f() w.r.t. X (a 1x3 row vector) multiplied by the Jacobian of X w.r.t. [u,v] (a 3x2 matrix). Multiplying these gives the 1x2 gradient of f() w.r.t. [u,v].

Can the last equation be thought of in similar terms? I'm having a hard time seeing it.

Part of my confusion is that I'm not certain how to interpret this:

[tex]
f_{,jm}
[/tex]

Is that a second order derivative?

Thanks.

arildno · Mar 12, 2010

We have:
[tex]f_{,jm}=\frac{\partial^{2}f}{\partial{x}_{j}\partial{x}_{m}}[/tex]
which you certainly can regard as a 3*3 matrix.

Thus, the first set of terms can essentially be regarded as a triple matrix product, of dimensions:
(2*3)(3*3)(3*2), yielding a (2*2) matrix.

This could also, in your terminology be regarded as the product between the transpose of the "Jacobian", the Hessian of "f" and the Jacobian.

The last set of terms involves the components of the gradient of "f", each multiplied by the "Hessian" of the corresponding x-variable with respect to "u" and "v".

Chuck37 · Mar 12, 2010

A few questions. If the last term is a gradient times a Jacobian (vector times matrix), how can it come out to be a 4x4 matrix in the end? Seems like it would result in a vector.

The other thing I'm trying to resolve is that I believe I solved this problem, though not in cleaner matrix format, and it was done without having to compute any "cross partials" in F(x), e.g. ddF/dxdy. For example, as in my previous post, I believe:

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

So I'd need to compute all the second order partials in x, e.g. ddx/dudv, but not the full Hessian in F(x).

It looks like your solution requires the full Hessian in F(x), so I wonder if the solutions contradict each other, or if they are simply different ways of getting to the same answers, or if some terms cancel out?

arildno · Mar 12, 2010

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

This is simply wrong; there are additional cross terms involved in the first parenthesis, for example:
[tex]\frac{\partial^{2}F}{\partial{y}\partial{x}}\frac{\partial{y}}{\partial{u}}\frac{\partial{x}}{\partial{v}}[/tex]

In total, you will have twelve terms, as stated.
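
Written out for the mixed partial, the twelve products can be grouped as below. This is only a sketch: subscripts are shorthand for partial derivatives (introduced here, not used earlier in the thread), and it assumes F is twice continuously differentiable in x, y, z, so that the mixed partials commute and each cross term can be paired with its mirror image.

[tex]\frac{\partial^{2}F}{\partial{u}\partial{v}}=F_{xx}x_{u}x_{v}+F_{yy}y_{u}y_{v}+F_{zz}z_{u}z_{v}+F_{xy}(x_{u}y_{v}+y_{u}x_{v})+F_{xz}(x_{u}z_{v}+z_{u}x_{v})+F_{yz}(y_{u}z_{v}+z_{u}y_{v})+F_{x}x_{uv}+F_{y}y_{uv}+F_{z}z_{uv}[/tex]

The first three products and the last three are the ones in the formula quoted above; the six products in the middle are the cross terms it leaves out.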

Chuck37 · Mar 12, 2010

I believe you are correct. Those terms must be small in my present application since everything is working properly without them... Nonetheless, I do want to make it right.

Can you answer my other question about your second term? How does it end up as a 4x4 matrix?

Thanks for the help.

arildno · Mar 13, 2010

It does NOT end up as a 4*4 matrix, but as a 2*2 matrix.
We have, by the mentioned summation notation:
[tex]f_{,j}x_{j,in}=\frac{\partial{f}}{\partial{x}_{1}}\frac{\partial^{2}x_{1}}{\partial{u}_{i}\partial{u}_{n}}+\frac{\partial{f}}{\partial{x}_{2}}\frac{\partial^{2}x_{2}}{\partial{u}_{i}\partial{u}_{n}}+\frac{\partial{f}}{\partial{x}_{3}}\frac{\partial^{2}x_{3}}{\partial{u}_{i}\partial{u}_{n}},i,n=1,2[/tex]

Chuck37 · Mar 16, 2010

Thanks for all the help. In my matrix vector terminology I think this is the best I can do.

In slightly abusive notation:

H{F(u,v)} = J{X(u,v)}'*H{F(x,y,z)}*J{X(u,v)} + G(x)*H{x(u,v)} + G(y)*H{y(u,v)} + G(z)*H{z(u,v)}

where H is the Hessian, J is Jacobian and G is the (3x1) gradient of F(x,y,z). X(u,v) is x,y,z as a function of u,v.

I couldn't see a way to write the last 3 terms as an elegant matrix/vector multiply.

I think this is what you have been saying all along, I just wanted to get it in a familiar notation.

Note:
J{X(u,v)} is (3x2)
H{F(x,y,z)} is (3x3)
G is (3x1) (each component, e.g. G(x)=G(1), is a scalar)
H{x(u,v)} is (2x2)
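
A rough NumPy sketch of this matrix form, with the dimensions listed above, might look like the following. X(u,v) and F(x,y,z) here are hypothetical examples picked only for illustration, with derivatives entered by hand. Stacking the three (2x2) Hessians into a (3x2x2) array and contracting it with G via np.tensordot is one possible way to write the last three terms as a single operation; it is an array contraction rather than an ordinary matrix multiply.

```python
import numpy as np

# Hypothetical X(u, v) and F(x, y, z), chosen only for illustration.
def X(u, v):
    return np.array([u * v, u**2 + v, np.sin(u) + v**2])

def F(p):
    x, y, z = p
    return x**2 * y + y * z + z**3

def hessian_uv(u, v):
    """H{F(u,v)} = J' * H{F(x,y,z)} * J + G(x)*H{x} + G(y)*H{y} + G(z)*H{z}."""
    x, y, z = X(u, v)
    G = np.array([2 * x * y, x**2 + z, y + 3 * z**2])   # gradient of F(x,y,z), length 3
    H_F = np.array([[2 * y, 2 * x, 0.0],
                    [2 * x, 0.0, 1.0],
                    [0.0, 1.0, 6 * z]])                  # H{F(x,y,z)}, (3x3)
    J = np.array([[v, u],
                  [2 * u, 1.0],
                  [np.cos(u), 2 * v]])                   # J{X(u,v)}, (3x2)
    H_X = np.array([[[0.0, 1.0], [1.0, 0.0]],            # H{x(u,v)}, (2x2)
                    [[2.0, 0.0], [0.0, 0.0]],            # H{y(u,v)}, (2x2)
                    [[-np.sin(u), 0.0], [0.0, 2.0]]])    # H{z(u,v)}, (2x2)
    # Contract G (length 3) with the first axis of H_X (3x2x2):
    # G(x)*H{x} + G(y)*H{y} + G(z)*H{z}, a (2x2) matrix.
    curvature = np.tensordot(G, H_X, axes=1)
    return J.T @ H_F @ J + curvature

print(hessian_uv(0.7, -0.3))
```

The result is (2x2) and symmetric, as a Hessian in [u,v] should be.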

arildno · Mar 16, 2010

That's right!
