What is the Coordinate-Free Formulation of the Hessian?

  • Thread starter ergospherical
In summary: In local coordinates the expression ##\partial_i \partial_k f dx^i \otimes dx^k## depends on the coordinates chosen, so it does not define a tensor. The coordinate-free Hessian is ##\nabla df##, the covariant derivative of the 1-form ##df## with respect to an affine connection ##\nabla##; it is a covariant 2-tensor and is symmetric if the connection is symmetric. The thread also discusses how the covariant derivative of a 1-form, ##(\nabla_X w)(Y) = X \cdot w(Y) - w(\nabla_X Y)##, follows from the Leibniz rule without needing a metric.
  • #1
ergospherical
In local coordinates, the hessian of the function ##f## at point ##p## is ##H = \partial_i \partial_k f dx^i \otimes dx^k##. A coordinate-free generalisation is ##H = \nabla df##, or explicitly ##H = \nabla_i (df)_k dx^i \otimes dx^k = \nabla_i \partial_k f dx^i \otimes dx^k##. How is this motivated?
 
  • #2
ergospherical said:
In local coordinates, the hessian of the function ##f## at point ##p## is ##H = \partial_i \partial_k f dx^i \otimes dx^k##. A coordinate-free generalisation is ##H = \nabla df##, or explicitly ##H = \nabla_i (df)_k dx^i \otimes dx^k = \nabla_i \partial_k f dx^i \otimes dx^k##. How is this motivated?
How is what motivated?
 
  • #3
I'm curious to know why ##H = \nabla df## is the correct generalisation of the usual expression (for local coordinates).
 
  • #4
I am not sure if this qualifies, but in Lee's Introduction to Riemannian Manifolds, if we have a covariant derivative operator ##\nabla##, then ##\nabla f## is just the 1-form ##df##, because both have the same action on tangent vectors.

$$ \nabla f(X) = \nabla_X f = Xf = df(X) $$

The 2-tensor ##\nabla^2 f## is called the covariant Hessian of ##f##, and

$$\nabla^2 f = \nabla(df)$$

The last two formulas in your post are just the local-coordinate expressions of this, which can be computed from the standard formulas for the covariant derivative.

Lastly, its action on two tangent vectors is given by,
$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = Y(Xf)-(\nabla_Y X)f$$
 
  • #5
jbergman said:
Lastly, it's action on two tangent vectors is given by,
$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = Y(Xf)-(\nabla_Y X)f$$
Correction to the last line, it should be,

$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = X(Yf)-(\nabla_Y X)f$$
 
  • #6
jbergman said:
Correction to the last line, it should be,

$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = X(Yf)-(\nabla_Y X)f$$
Messed it up again... There must be a better way to remember the formula.
$$\nabla^2f(Y,X) = \nabla^2_{X,Y}f = \nabla_X(\nabla_Y f)-\nabla_{(\nabla_X Y)}f = X(Yf)-(\nabla_X Y)f$$
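
One way to reconstruct the formula, sketched here under the assumption that ##\nabla## obeys the Leibniz rule with respect to the pairing of a 1-form with a vector field: differentiate the function ##df(Y) = Yf## along ##X## and solve for the term in which ##\nabla_X## falls on ##df##.

$$
\begin{aligned}
X(Yf) = X\big(df(Y)\big) &= (\nabla_X df)(Y) + df(\nabla_X Y) \\
\Rightarrow\quad (\nabla_X df)(Y) &= X(Yf) - (\nabla_X Y)f
\end{aligned}
$$

The vector field that gets differentiated in the correction term is the same one that sits innermost in ##X(Yf)##, which may be easier to remember than the order of the slots.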
 
  • #7
ergospherical said:
In local coordinates, the hessian of the function ##f## at point ##p## is ##H = \partial_i \partial_k f dx^i \otimes dx^k##. A coordinate-free generalisation is ##H = \nabla df##, or explicitly ##H = \nabla_i (df)_k dx^i \otimes dx^k = \nabla_i \partial_k f dx^i \otimes dx^k##. How is this motivated?
I may be off, and I apologize if I am, but my read on this question is that you've overlooked the fact that ##\nabla df## is not just a fancy coordinate-free generalization of the hessian; it is a coordinate-independent version of the hessian. That is, sure, you can pick coordinates around a point and look at ##H = \partial_i \partial_k f dx^i \otimes dx^k##, but this quantity is meaningless because you can change the coordinates and have it turn into whatever you want. Try it yourself: how does the quantity ##\partial_i \partial_k f dx^i \otimes dx^k## transform when you look at it from another set of coordinates, say ##(y^i)##?
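
As a sketch of that exercise, assuming a smooth change of coordinates ##x^i = x^i(y)## and applying the chain rule twice:

$$
\frac{\partial^2 f}{\partial y^a \partial y^b} = \frac{\partial x^i}{\partial y^a}\frac{\partial x^k}{\partial y^b}\frac{\partial^2 f}{\partial x^i \partial x^k} + \frac{\partial^2 x^i}{\partial y^a \partial y^b}\frac{\partial f}{\partial x^i}
$$

The first term is the tensorial transformation law. The second term spoils it unless the coordinate change is affine or ##df = 0## at the point, i.e. at a critical point of ##f##.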

But the hessian was a pretty good ally: its index at the critical points of ##f## told us about their nature (Morse Lemma). So maybe the hessian is worth fighting for. By that I mean finding a coordinate-independent entity which generalizes the Hessian and maybe shares some of its nice properties. We could try looking at ##d^2f## since we know this will be coordinate independent, but no luck: ##d^2=0## always. It turns out we need a connection to make sense of second derivatives. This is not too surprising, since a connection is the apparatus whose raison d'être is to relate tangent vectors at different points, and taking a second derivative involves comparing derivatives defined at different points. So the natural candidate seems to be ##\nabla df##. This is a covariant 2-tensor, it is symmetric if ##\nabla## is symmetric and, in local coordinates, it is ##\partial_i \partial_k f dx^i \otimes dx^k## if ##\nabla## is flat.
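
In local coordinates this can be made explicit. A sketch, assuming the index convention ##\nabla_{\partial_i} \partial_k = \Gamma^m_{ik} \partial_m## (some texts order the lower indices differently):

$$
\nabla_i \partial_k f = (\nabla_{\partial_i} df)(\partial_k) = \partial_i \partial_k f - \Gamma^m_{ik}\, \partial_m f
$$

$$
\nabla_i \partial_k f - \nabla_k \partial_i f = -\left(\Gamma^m_{ik} - \Gamma^m_{ki}\right) \partial_m f
$$

The antisymmetric part vanishes for every ##f## exactly when the connection coefficients are symmetric in their lower indices (a symmetric, i.e. torsion-free, connection). The correction term drops out altogether wherever ##df = 0##, which is why the index at a critical point does not depend on the connection.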
 
  • #8
This thread got me thinking about how one might arrive conceptually at the definition of the covariant derivative of a 1 form given an affine connection on the tangent bundle. Here are some thoughts. I apologize in advance for any errors.

One can perhaps motivate the covariant derivative of a 1 form by first taking the case where the affine connection is compatible with a Riemannian metric.

If ##w## is parallel along a curve with tangent ##X## and the vector field ##Y## is also parallel along it, then one would want ##w(Y)## to be constant. Writing ##w_{*}## for the metric dual vector field of ##w##, so that ##w(Y) = <w_{*},Y>##, the derivative ##X⋅<w_{*},Y>## must equal zero, and metric compatibility then implies that ##<∇_{X}w_{*},Y>## is also zero. So the dual of ##w## under the Riemannian metric must be parallel along the curve. This suggests that a good definition of the covariant derivative of ##w## is the metric dual of the covariant derivative of its metric dual vector field.

This yields the formula ## (∇_{X}w)(Y)= X⋅w(Y)-w(∇_{X}Y)##
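
Spelled out as a sketch, with ##w_*## the metric dual of ##w## (so ##w(Y) = \langle w_*, Y \rangle##) and ##\nabla## assumed compatible with the metric:

$$
\begin{aligned}
(\nabla_X w)(Y) := \langle \nabla_X w_*, Y \rangle &= X \cdot \langle w_*, Y \rangle - \langle w_*, \nabla_X Y \rangle \\
&= X \cdot w(Y) - w(\nabla_X Y)
\end{aligned}
$$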

Since this formula does not involve a metric it suggests a definition of covariant derivative of a 1 form for any affine connection.

A similar line of thought might be to start with a set of coordinates for the tangent space at a point and ask how one extends these coordinates along a curve so that the measured coordinates of a vector field do not depend on changes in the coordinate 1 forms themselves but only on changes in the vector field. As in standard coordinates in Euclidean space, this can be done by setting the coordinates of the derivative of a vector field to be the derivative of its coordinates. Formally, ##X⋅w(Y)=w(∇_{X}Y)## for each coordinate 1 form.

When ##w## is not parallel ##X⋅w(Y) - w(∇_{X}Y)## is a 1 form in ##Y## and this again suggests the formula ## (∇_{X}w)(Y)= X⋅w(Y)-w(∇_{X}Y)##
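
To see why this expression is a 1 form in ##Y##, one can check that it is linear over functions. A sketch, with ##g## an arbitrary smooth function (a symbol introduced here for illustration):

$$
\begin{aligned}
X \cdot w(gY) - w(\nabla_X (gY)) &= (X \cdot g)\, w(Y) + g\, X \cdot w(Y) - w\big((X \cdot g)\, Y + g\, \nabla_X Y\big) \\
&= g\,\big(X \cdot w(Y) - w(\nabla_X Y)\big)
\end{aligned}
$$

The two terms containing the derivative of ##g## cancel, so the value at a point depends only on the value of ##Y## at that point.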

To justify this, it must be shown that ##∇_{X}w## satisfies the definition of a covariant derivative.

Definition

A covariant derivative at a point ##p## on the cotangent bundle of a manifold assigns to each tangent vector ##X_{p}## and each 1 form ##w## a cotangent vector ##∇_{X_{p}}w## at ##p##. This assignment is linear both in tangent vectors at ##p## and in 1 forms, and satisfies the Leibniz rule, ##∇_{X_{p}}(fw) = (X_{p}⋅f)w_{p} + f(p)∇_{X_{p}}w##. An affine connection is a covariant derivative at each point that is smooth in the sense that the covariant derivative of a smooth 1 form with respect to a smooth vector field is also a smooth 1 form.

These properties are easily verified.

E.g. for the Leibniz rule, ##(∇_{X}(fw))(Y) = X⋅(fw(Y)) - fw(∇_{X}Y) = (X⋅f)w(Y) + f(X⋅w(Y) - w(∇_{X}Y)) = (X⋅f)w(Y) + f(∇_{X}w)(Y)##

##∇_{X}w## also determines an affine connection on the cotangent bundle.
 
  • #9
lavinia said:
This thread got me thinking about how one might arrive conceptually at the definition of the covariant derivative of a 1 form given an affine connection on the tangent bundle. Here are some thoughts. I apologize in advance for any errors.

One can perhaps motivate the covariant derivative of a 1 form by first taking the case where the affine connection is compatible with a Riemannian metric.

You only need the Leibniz rule and the action of the covariant derivative on scalars and tangent vectors. Metric compatibility does not enter anywhere, since the cotangent space is the dual of the tangent space. You do not need a metric for any of what you describe. Simply define
$$
(\nabla_Y w)(X) = Y[w(X)] - w(\nabla_Y X)
$$

quasar987 said:
This is a covariant 2-tensor, it is symmetric if ##\nabla## is symmetric and, in local coordinates, it is ##\partial_i \partial_k f dx^i \otimes dx^k## if ##\nabla## is flat.
This is not the case. You are thinking of the case where local coordinates are chosen such that the connection coefficients vanish. While this can be accomplished in a flat space, it is not necessary for all local coordinate systems in a flat space. For example, consider polar coordinates in the two-dimensional Euclidean space.
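
A worked version of the polar example, as a sketch: it uses the standard Levi-Civita connection coefficients of the Euclidean plane in polar coordinates ##(r, \theta)## and the test function ##f = x^2 + y^2 = r^2##, which is chosen here only for illustration.

$$
\Gamma^r_{\theta\theta} = -r, \qquad \Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = \frac{1}{r}, \qquad \text{all others zero}
$$

$$
\nabla_r \partial_r f = \partial_r^2 f = 2, \qquad \nabla_r \partial_\theta f = \partial_r \partial_\theta f - \frac{1}{r}\, \partial_\theta f = 0, \qquad \nabla_\theta \partial_\theta f = \partial_\theta^2 f + r\, \partial_r f = 2r^2
$$

So ##\nabla df = 2\, dr \otimes dr + 2r^2\, d\theta \otimes d\theta = 2\,(dx \otimes dx + dy \otimes dy)##, matching the Cartesian result, while the naive ##\partial_i \partial_k f dx^i \otimes dx^k## in polar coordinates gives only ##2\, dr \otimes dr## even though the connection is flat.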
 

FAQ: What is the Coordinate-Free Formulation of the Hessian?

What is a "coordinate-free Hessian"?

A coordinate-free Hessian is the covariant 2-tensor ##\nabla df## (also written ##\nabla^2 f##) associated with a function ##f## on a manifold equipped with an affine connection ##\nabla##. It generalises the matrix of second-order partial derivatives and is defined without reference to any particular coordinate system.

How is a coordinate-free Hessian different from a regular Hessian?

A regular Hessian ##\partial_i \partial_k f## is calculated in a specific coordinate system and, away from critical points of ##f##, does not in general transform as a tensor under a change of coordinates. The coordinate-free Hessian ##\nabla df## is a tensor: it depends on the choice of connection rather than on the choice of coordinates.

What are the advantages of using a coordinate-free Hessian?

Because the coordinate-free Hessian is a tensor, statements made with it (for example, whether it is positive definite at a point) hold in every coordinate system. It also makes sense on any manifold with an affine connection, not only on Euclidean space.

How is a coordinate-free Hessian used in optimization?

In optimization, the Hessian at a critical point is used to determine whether that point is a minimum, maximum, or saddle point. At a critical point ##df = 0##, so the connection term drops out and ##\nabla df## agrees with the matrix of second partial derivatives in any coordinates.

Can a coordinate-free Hessian be used in any coordinate system?

Yes. The components of ##\nabla df## change with the coordinates according to the tensor transformation law, but the tensor itself does not. It does, however, depend on the connection that is used.
