How Do I Apply the Chain Rule for Second Order Partial Derivatives?

In summary: ddF/dudv = (ddF/dxdx*dx/du*dx/dv + ddF/dxdy*dy/du*dx/dv + ddF/dxdz*dz/du*dx/dv + ddF/dydx*dx/du*dy/dv + ddF/dydy*dy/du*dy/dv + ddF/dydz*dz/du*dy/dv + ddF/dzdx*dx/du*dz/dv + ddF/dzdy*dy/du*dz/dv + ddF/dzdz*dz/du*dz/dv) + (dF/dx*ddx/dudv + dF/dy*ddy/dudv + dF/dz*ddz/dudv). That is nine second-derivative terms plus three gradient terms, twelve in all.
  • #1
Chuck37
I have a function F(u,v) that I need to get first and second order partial derivatives for (Gradient and Hessian). F(u,v) is very ugly, so I'm thinking of it like F(x,y,z) where I have another function [x,y,z]=G(u,v).

So, I got my first orders, e.g.:

dF/du = dF/dx*dx/du + dF/dy*dy/du + dF/dz*dz/du

Defining X=[x y z] and U=[u v] I can formulate this in vector notation:

dF/dU = dF/dX * Jacobian(X(u,v))

at least I think I can. It seems to be working.
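
The vector form above can be checked numerically. The following is only a minimal sketch: the functions G and f are hypothetical examples chosen for illustration (they are not the functions from the original problem), and the result of the chain rule is compared against central finite differences.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def G(u, v):
    """Inner map [x, y, z] = G(u, v)."""
    return (u * v, u + v, math.sin(u))

def f(x, y, z):
    """Outer function F written in terms of x, y, z."""
    return x * y + z ** 2

def grad_f(x, y, z):
    """Analytic gradient [dF/dx, dF/dy, dF/dz] of the example f."""
    return [y, x, 2.0 * z]

def jacobian_G(u, v):
    """Analytic 3x2 Jacobian of G; row j is [dx_j/du, dx_j/dv]."""
    return [[v, u],
            [1.0, 1.0],
            [math.cos(u), 0.0]]

def grad_F_uv(u, v):
    """Chain rule: dF/dU = dF/dX * Jacobian (1x3 times 3x2 gives 1x2)."""
    g = grad_f(*G(u, v))
    J = jacobian_G(u, v)
    return [sum(g[j] * J[j][i] for j in range(3)) for i in range(2)]

def grad_F_uv_fd(u, v, h=1e-6):
    """Central finite differences on F(u, v) = f(G(u, v)), for comparison."""
    F = lambda a, b: f(*G(a, b))
    return [(F(u + h, v) - F(u - h, v)) / (2 * h),
            (F(u, v + h) - F(u, v - h)) / (2 * h)]

analytic = grad_F_uv(0.7, -1.3)
numeric = grad_F_uv_fd(0.7, -1.3)
print(analytic, numeric)
```

If the two printed vectors agree to several digits, the row-vector-times-Jacobian form is doing what the scalar chain rule says it should.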

Now I need the second orders of F with respect to [u,v]. What I really need is the 2x2 Hessian matrix. I'm not totally sure how to proceed. I plowed through and got all my partials of F with respect to [x,y,z], but I'm not sure how to apply the chain rule or its equivalent either in scalar or matrix/vector notations.

Can anyone help? (If nothing else, how do you write out ddF/dudv in terms of partials of F(x,y,z) and G(u,v)?)
 
  • #2
I got notification that someone replied, but I don't see anything here. I wonder if it got moderated away for some reason?

I think I worked out this much, sort of an extended chain rule for partials (in lousy text notation):

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

It's a pain but I can compute all these terms. I can't really see a way to formulate this in terms of gradients, Jacobians and Hessians.
 
  • #3
Okay, Chuck!
I did make a reply, but it was riddled with errors, so I deleted it.

I'm working on a better one! :smile:
 
  • #4
The simplest way to do this is by using summation/index notation.

Suppose the index i=1,2,3. Then [itex]F_{i}[/itex] denotes a three-dimensional vector.
If the index j=1,2, then [itex]G_{ij}[/itex] is a 3*2 matrix.

The notation [tex]F_{,i}, i=1,2,3[/tex] means the vector with components:
[tex]F_{,1}=\frac{\partial{F}}{\partial{x}_{1}}[/tex]
and so on (the comma in front of an index means you differentiate with respect to the corresponding variable, here x_1!)

We may therefore identify [itex]G_{i,j}[/itex] as a MATRIX with rows indexed by i and columns indexed by j, with, for example, the (1,2)-entry being:
[tex]G_{1,2}=\frac{\partial{G}_{1}}{\partial{x}_{2}}[/tex]

Furthermore, a product in which the same index appears twice is SUMMED over that index, for example (i=1,2,3):
[tex]u_{i}v_{i}=u_{1}v_{1}+u_{2}v_{2}+u_{3}v_{3}[/tex]


Thus, in your case, we have:
[tex]F(u_{i})=f(x_{j}(u_{i})), i=1,2, j=1,2,3[/tex]
Differentiating this with respect to u_i yields, by the chain rule:
[tex]F_{,i}=f_{,j}x_{j,i} (*)[/tex]

Now, introducing the further indices n=1,2 and m=1,2,3, we may find the full second derivative (a matrix!) of F by differentiating (*) with respect to u_n:
[tex]F_{,in}=f_{,jm}x_{j,i}x_{m,n}+f_{,j}x_{j,in}[/tex]
Note that the first product, running over BOTH j and m, consists of 9 terms, whereas the latter product, running merely over the j's, consists of 3 terms.

Each of these 12 "terms" is itself a 2*2 matrix in "i" and "n".
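
As a sketch of where the two groups of terms come from: the product rule applied to f_{,j}x_{j,i} gives one group from differentiating f_{,j} (which depends on u_n only through the x_m, hence another chain rule) and one group from differentiating x_{j,i}. Writing the sums out for i=1, n=2, and assuming the identification u_1=u, u_2=v, the mixed partial reads:

[tex]\frac{\partial^{2}F}{\partial u\,\partial v}=\sum_{j=1}^{3}\sum_{m=1}^{3}\frac{\partial^{2}f}{\partial x_{j}\,\partial x_{m}}\frac{\partial x_{j}}{\partial u}\frac{\partial x_{m}}{\partial v}+\sum_{j=1}^{3}\frac{\partial f}{\partial x_{j}}\frac{\partial^{2}x_{j}}{\partial u\,\partial v}[/tex]

The double sum holds the 9 terms and the single sum holds the 3 terms.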
 
  • #5
How is it working out, Chuck?
If you have any questions to the above, just post them.
 
  • #6
arildno said:
How is it working out, Chuck?
If you have any questions to the above, just post them.

I'm still absorbing it a little. I think it makes sense, but I won't waste your time until I give it the proper attention. Thanks very much for your help. I'll post my questions a little later if it doesn't sink in.
 
  • #7
Is there a way to think of this as plain old matrix and vector multiplies? For example, your second to the last equation I can view as a multiply of the gradient of f() w.r.t X (a 1x3 row vector) multiplied by the Jacobian of X w.r.t [u,v] (a 3x2 matrix). Multiplying these results in a 1x2 vector gradient of f() w.r.t [u,v].

Can the last equation be thought of in similar terms? I'm having a hard time seeing it.

Part of my confusion is that I'm not certain how to interpret this:

[tex]
f_{,jm}
[/tex]

Is that a second order derivative?

Thanks.
 
  • #8
We have:
[tex]f_{,jm}=\frac{\partial^{2}f}{\partial{x}_{j}\partial{x}_{m}}[/tex]
which you certainly can regard as a 3*3 matrix.

Thus, the first set of terms can essentially be regarded as a triple matrix product, with dimensions:
(2*3)(3*3)(3*2), yielding a (2*2) matrix.

This could also, in your terminology, be regarded as the product of the transpose of the "Jacobian", the Hessian of "f" and the Jacobian.

The last set of terms involves each component of the gradient of "f" multiplied by the (2*2) "Hessian" of the corresponding x-variable with respect to "u" and "v", summed over the three x-variables.
 
  • #9
A few questions. If the last term is a gradient times a Jacobian (vector times matrix), how can it come out to be a 4x4 matrix in the end? Seems like it would result in a vector.

The other thing I'm trying to resolve is that I believe I solved this problem, though not in cleaner matrix format, and it was done without having to compute any "cross partials" in F(x), e.g. ddF/dxdy. For example, as in my previous post, I believe:

ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

So I'd need to compute all the second order partials in x, e.g. ddx/dudv, but not the full Hessian in F(x).

It looks like your solution requires the full Hessian in F(x), so I wonder if the solutions contradict each other, or if they are simply different ways of getting to the same answers, or if some terms cancel out?
 
  • #10
ddF/dudv = (ddF/dxdx*dx/du*dx/dv + dF/dx*ddx/dudv) + (same for y) + (same for z)

This is simply wrong: the first parenthesis is missing the six mixed-partial terms, for example:
[tex]\frac{\partial^{2}F}{\partial{y}\partial{x}}\frac{\partial{y}}{\partial{u}}\frac{\partial{x}}{\partial{v}}[/tex]

In total, you will have twelve terms, as stated.
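
A small check makes the point. This is a hypothetical example, not taken from your problem: let F(x,y,z)=xy with x=u and y=v (z can be anything, since F does not depend on it). Then F(u,v)=uv, so:

[tex]\frac{\partial^{2}F}{\partial u\,\partial v}=1[/tex]

The formula quoted at the top of this post gives 0 here, because ddF/dxdx, ddF/dydy, ddF/dzdz, dF/dz, ddx/dudv and ddy/dudv all vanish. The whole value comes from a mixed term:

[tex]\frac{\partial^{2}F}{\partial x\,\partial y}\frac{\partial x}{\partial u}\frac{\partial y}{\partial v}=1\cdot 1\cdot 1=1[/tex]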
 
  • #11
I believe you are correct. Those terms must be small in my present application since everything is working properly without them... Nonetheless, I do want to make it right.

Can you answer my other question about your second term? How does it end up as a 4x4 matrix?

Thanks for the help.
 
  • #12
It does NOT end up as a 4*4 matrix, but as a 2*2 matrix.
We have, by the mentioned summation notation:
[tex]f_{,j}x_{j,in}=\frac{\partial{f}}{\partial{x}_{1}}\frac{\partial^{2}x_{1}}{\partial{u}_{i}\partial{u}_{n}}+\frac{\partial{f}}{\partial{x}_{2}}\frac{\partial^{2}x_{2}}{\partial{u}_{i}\partial{u}_{n}}+\frac{\partial{f}}{\partial{x}_{3}}\frac{\partial^{2}x_{3}}{\partial{u}_{i}\partial{u}_{n}},i,n=1,2[/tex]
 
  • #13
Thanks for all the help. In my matrix vector terminology I think this is the best I can do.

In slightly abusive notation:

H{F(u,v)} = J{X(u,v)}'*H{F(x,y,z)}*J{X(u,v)} + G(x)*H{x(u,v)} + G(y)*H{y(u,v)} + G(z)*H{z(u,v)}

where H is the Hessian, J is Jacobian and G is the (3x1) gradient of F(x,y,z). X(u,v) is x,y,z as a function of u,v.

I couldn't see a way to write the last 3 terms as an elegant matrix/vector multiply.

I think this is what you have been saying all along, I just wanted to get it in a familiar notation.

Note:
J{X(u,v)} is (3x2)
H{F(x,y,z)} is (3x3)
G is (3x1) (each component is a scalar, e.g. G(x)=G(1))
H{x(u,v)} is (2x2)
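
The formula above can be sketched in code as follows. This is only an illustration under stated assumptions: G and f are hypothetical example functions (not the ones from the actual problem), all derivatives are written out by hand for that example, and hess_F_exact is a closed form derived by hand for comparison.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def G(u, v):
    """Inner map [x, y, z] = X(u, v)."""
    return (u * v, u + v, math.sin(u))

def grad_f(x, y, z):
    """G in the post: gradient of F(x,y,z) = x*y + z**2."""
    return [y, x, 2.0 * z]

def hess_f(x, y, z):
    """H{F(x,y,z)}: full 3x3 Hessian, cross partials included."""
    return [[0.0, 1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 2.0]]

def jacobian_G(u, v):
    """J{X(u,v)}: 3x2, row j is [dx_j/du, dx_j/dv]."""
    return [[v, u], [1.0, 1.0], [math.cos(u), 0.0]]

def hessians_G(u, v):
    """H{x(u,v)}, H{y(u,v)}, H{z(u,v)}: three 2x2 matrices."""
    return [[[0.0, 1.0], [1.0, 0.0]],
            [[0.0, 0.0], [0.0, 0.0]],
            [[-math.sin(u), 0.0], [0.0, 0.0]]]

def hess_F_uv(u, v):
    """H{F(u,v)} = J' * H{F(x,y,z)} * J + sum_j G(j) * H{x_j(u,v)}."""
    X = G(u, v)
    g, H, J, Hx = grad_f(*X), hess_f(*X), jacobian_G(u, v), hessians_G(u, v)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for n in range(2):
            # 9 terms: triple product J' * H * J
            first = sum(H[j][m] * J[j][i] * J[m][n]
                        for j in range(3) for m in range(3))
            # 3 terms: gradient components times the inner Hessians
            second = sum(g[j] * Hx[j][i][n] for j in range(3))
            out[i][n] = first + second
    return out

def hess_F_exact(u, v):
    """Hand-derived Hessian of F(u,v) = u*u*v + u*v*v + sin(u)**2."""
    return [[2 * v + 2 * math.cos(2 * u), 2 * u + 2 * v],
            [2 * u + 2 * v, 2 * u]]

print(hess_F_uv(0.7, -1.3))
print(hess_F_exact(0.7, -1.3))
```

In this example the off-diagonal entries of H{F(x,y,z)} are nonzero, so dropping the cross partials would change the answer.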
 
  • #14
That's right! :smile:
 

Related to How Do I Apply the Chain Rule for Second Order Partial Derivatives?

1. What is the chain rule?

The chain rule is a mathematical rule that allows us to find the derivative of a composite function. In other words, it helps us find the rate of change of a function that is made up of smaller functions.

2. How do you use the chain rule?

To use the chain rule, you first need to identify the composite function and its smaller functions. Then, you can use the formula: d/dx[f(g(x))] = f'(g(x)) * g'(x), where f'(x) represents the derivative of the outer function and g'(x) represents the derivative of the inner function.

3. Why is the chain rule important?

The chain rule is important because it allows us to find the derivative of complex functions that cannot be easily solved using other methods. It is also a fundamental concept in calculus and is used in many real-world applications, such as physics, engineering, and economics.

4. What is the difference between the chain rule and the product rule?

The chain rule is used to find the derivative of a composite function, while the product rule is used to find the derivative of a product of two or more functions. In other words, the chain rule deals with functions that are composed of smaller functions, while the product rule deals with functions that are multiplied together.

5. How does the chain rule relate to partial derivatives?

The chain rule is used in partial derivatives when dealing with multivariable functions. With a single intermediate variable, the chain rule becomes: ∂z/∂x = (∂z/∂u) * (∂u/∂x), where z is the dependent variable, u is the intermediate variable, and x is the independent variable. With several intermediate variables, one such product is added for each of them, as in the formula for dF/du in the thread above. This allows us to find the rate of change of a multivariable function with respect to one of its variables.
