Understanding the Hellmann-Feynman Theorem

Derivator · Jun 30, 2011

Hi,

according to the Hellmann-Feynman theorem:

if [itex]\Psi(\lambda)[/itex] is a normalized eigenstate of [itex]H(\lambda)[/itex] and
[itex]E(\lambda):=<\Psi(\lambda)|H(\lambda)|\Psi(\lambda)>[/itex]
then
[itex]\frac{d E}{d\lambda} = <\Psi(\lambda)|\frac{d H(\lambda)}{d\lambda}|\Psi(\lambda)>[/itex]

If [itex]\lambda[/itex] is an atomic coordinate, for instance a component of the position of a nucleus, and H is the electronic Hamiltonian, then the negative of the force on the nucleus is given by the above expression.

Classically, one calculates the force on a particle as the negative gradient of the potential energy, so the above formula is analogous to classical physics (since in the Born-Oppenheimer approximation, the potential in which the nuclei move is given by the expectation value of the electronic Hamiltonian).
Despite this analogy, it is not clear to me why the quantum mechanical force is given by [itex]-\frac{d E}{d\lambda}[/itex]. Can one justify this?

alxm · Jun 30, 2011

http://en.wikipedia.org/wiki/Hellmann–Feynman_theorem

SpectraCat · Jun 30, 2011

That's not how the Hellmann-Feynman theorem works. The [itex]\lambda[/itex] is a constant *parameter* in the Hamiltonian, on which the energy (and possibly the wavefunction) depends. The expression on the RHS in your example is (typically) an integral over position. The parameter that you are differentiating with respect to cannot be the variable of integration, nor can it be a function of the variable of integration. The Hellmann-Feynman theorem is an example of "differentiating under the integral", a mathematical technique that says basically what I described above: the expression

[tex]\frac{dA}{d\lambda}=\frac{d}{d\lambda}\int B(x;\lambda)\,dx=\int\frac{\partial B(x;\lambda)}{\partial\lambda}\,dx[/tex]

is valid provided the parameter [itex]\lambda[/itex] doesn't depend on the variable of integration x (there are some other technical mathematical requirements as well, but since we are already talking about wavefunctions relevant to physical systems those are generally satisfied).
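
As a quick numerical illustration of the interchange itself (not of the Hellmann-Feynman theorem as a whole), here is a minimal Python sketch. The integrand B(x; lam) = exp(-lam * x**2), the grid and the step size h are arbitrary choices made for this example and do not come from the thread.

```python
import numpy as np

# Illustrative integrand (an assumption of this sketch): B(x; lam) = exp(-lam * x**2)
x = np.linspace(-10.0, 10.0, 4001)   # grid for the variable of integration
dx = x[1] - x[0]

def A(lam):
    """A(lam) = integral of B(x; lam) dx, approximated by a Riemann sum."""
    return np.sum(np.exp(-lam * x**2)) * dx

lam = 1.3
h = 1e-5

# Left-hand side: differentiate the integral (central finite difference in lam)
lhs = (A(lam + h) - A(lam - h)) / (2.0 * h)

# Right-hand side: integrate the lam-derivative of the integrand
rhs = np.sum(-x**2 * np.exp(-lam * x**2)) * dx

# Closed form for comparison: A = sqrt(pi / lam), so dA/dlam = -sqrt(pi) / (2 * lam**1.5)
exact = -0.5 * np.sqrt(np.pi) * lam**-1.5

print(lhs, rhs, exact)
```

All three numbers should agree to within the finite-difference error, because lam is a parameter and x is the only variable being integrated over.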

So your example never arises, because you can't differentiate with respect to position. Furthermore, the LHS is the total energy, not the potential energy, so the gradient (which it is worth noting is NOT the same as a derivative) wouldn't give the force anyway.

Derivator · Jul 1, 2011

Hi alxm and SpectraCat,

I fear you misread my post. I know what the Hellmann-Feynman theorem is and how it works; that's not the point. Unfortunately, knowing the Hellmann-Feynman theorem does not by itself imply that interatomic forces (more precisely, the forces acting on the nuclei) are given by the derivative of E=<H> with respect to the nuclear coordinates.

SpectraCat said:

...The [itex]\lambda[/itex] is a constant *parameter* in the Hamiltonian, on which the energy (and possibly the wavefunction) depends.

You are right, and that's exactly what I said in my first post.

SpectraCat said:

The expression on the RHS in your example is an integral over position (typically). The parameter that you are differentiating with respect to cannot be the variable of integration, nor can it be a function of the variable of integration.

Yes, in an electronic problem, the wavefunction and the Hamiltonian depend parametrically on the coordinates of the nuclei.

SpectraCat said:

The Hellmann-Feynman theorem is an example of "differentiating under the integral", a mathematical technique that says basically what I described above

No, that's not correct (at least as I understand what you have written). The Hellmann-Feynman theorem is not just an application of "interchanging the order of integration and differentiation".

In the end, the HF theorem says that, when [itex]\Psi[/itex] is a normalized eigenfunction of [itex]H(\lambda)[/itex], you can treat the wave function as if it did not depend on the variable you are differentiating with respect to:
[itex]\frac{d}{d\lambda}\int\Psi^*(x;\lambda)H(\lambda) \Psi(x;\lambda)dx=\int\Psi^*(x;\lambda)\frac{d H(\lambda)}{d\lambda}\Psi(x;\lambda)dx[/itex]
where the semicolon indicates the parametric dependence of Psi on lambda.
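
The statement above can be checked numerically. The following is a minimal sketch in Python, assuming a toy 2x2 Hamiltonian H(lam) = sigma_x + lam * sigma_z and a made-up trial state; neither appears in the thread. It compares the derivative of the energy with the expectation value of dH/dlam, once for an eigenstate and once for a lam-dependent state that is not an eigenstate.

```python
import numpy as np

def H(lam):
    """Toy Hamiltonian (an assumption of this sketch): sigma_x + lam * sigma_z."""
    return np.array([[lam, 1.0], [1.0, -lam]])

dH = np.array([[1.0, 0.0], [0.0, -1.0]])    # dH/dlam = sigma_z

def ground_energy(lam):
    return np.linalg.eigh(H(lam))[0][0]      # eigh returns eigenvalues in ascending order

def trial(lam):
    """Normalized, lam-dependent state that is NOT an eigenstate of H(lam)."""
    return np.array([np.cos(lam), np.sin(lam)])

def trial_energy(lam):
    t = trial(lam)
    return t @ H(lam) @ t

lam, h = 0.4, 1e-5

# Eigenstate: dE/dlam agrees with <psi| dH/dlam |psi>
psi = np.linalg.eigh(H(lam))[1][:, 0]
dE_fd = (ground_energy(lam + h) - ground_energy(lam - h)) / (2.0 * h)
dE_hf = psi @ dH @ psi

# Non-eigenstate: the terms containing dPsi/dlam do not cancel
dEt_fd = (trial_energy(lam + h) - trial_energy(lam - h)) / (2.0 * h)
dEt_hf = trial(lam) @ dH @ trial(lam)

print(dE_fd, dE_hf)      # these agree
print(dEt_fd, dEt_hf)    # these differ
```

For the eigenstate the two numbers coincide; for the trial state they do not, which is why the theorem says more than "the derivative can be moved inside the integral".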

SpectraCat said:

...
So your example never arises, because you can't differentiate with respect to position.

That is wrong. Let's say we are in the Born-Oppenheimer approximation. The electronic energy eigenvalue is [itex]E(\vec R_1 ... \vec R_N)[/itex]; it depends parametrically on the coordinates [itex]\vec R_i[/itex] of the N nuclei. Let [itex]\Psi(\{\vec r_i\} ;\{\vec R_j\})[/itex] be the solution of the electronic problem. If [itex]X_1[/itex] is the first component of the vector [itex]\vec R_1[/itex], then the first component of the force acting on nucleus no. 1 is given by the negative of:
[itex]\frac{d}{d X_1}E(\vec R_1 ... \vec R_N) = \frac{d}{d X_1}\int\Psi^*(\{\vec r_i\} ;\{\vec R_j\})H(\{\vec R_i\}) \Psi(\{\vec r_i\} ;\{\vec R_j\})d\vec r_1 ... d\vec r_n=\int\Psi^*(\{\vec r_i\} ;\{\vec R_j\})\frac{d H(\{\vec R_i\})}{d X_1} \Psi(\{\vec r_i\} ;\{\vec R_j\})d\vec r_1 ... d\vec r_n[/itex]
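
To make the expression above concrete, here is a hedged Python sketch for a one-dimensional model: one electron on a grid and two fixed nuclei at X1 and X2 with softened Coulomb interactions, in atomic units. The model, the grid and the helper names (soft_coulomb, hamiltonian, dH_dX1, ground_state) are all assumptions made for illustration. It compares the Hellmann-Feynman force on nucleus 1 with a finite-difference derivative of the electronic energy.

```python
import numpy as np

# Electronic coordinate grid (illustrative choice)
x = np.linspace(-20.0, 20.0, 401)
dx = x[1] - x[0]
n = x.size

# Kinetic energy -1/2 d^2/dx^2 by second-order finite differences
T = (np.diag(np.full(n, 2.0))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / (2.0 * dx**2)

def soft_coulomb(d):
    """Softened 1/|d| interaction, a common 1D stand-in for the Coulomb potential."""
    return 1.0 / np.sqrt(d**2 + 1.0)

def hamiltonian(X1, X2):
    """Electronic Hamiltonian with nuclei fixed at X1 and X2.
    The nuclear-nuclear repulsion enters as a constant shift."""
    V = -soft_coulomb(x - X1) - soft_coulomb(x - X2) + soft_coulomb(X1 - X2)
    return T + np.diag(V)

def dH_dX1(X1, X2):
    """Diagonal of dH/dX1 (only the potential depends on X1)."""
    d_elec = -(x - X1) / ((x - X1)**2 + 1.0)**1.5
    d_nuc = -(X1 - X2) / ((X1 - X2)**2 + 1.0)**1.5
    return d_elec + d_nuc

def ground_state(X1, X2):
    w, v = np.linalg.eigh(hamiltonian(X1, X2))
    return w[0], v[:, 0]          # eigenvector has unit Euclidean norm

X1, X2, h = -1.0, 1.0, 1e-4
E, psi = ground_state(X1, X2)

# Hellmann-Feynman force: minus the expectation value of dH/dX1
force_hf = -np.sum(psi**2 * dH_dX1(X1, X2))

# Reference: minus the finite-difference derivative of the energy
force_fd = -(ground_state(X1 + h, X2)[0] - ground_state(X1 - h, X2)[0]) / (2.0 * h)

print(force_hf, force_fd)
```

The two values should agree up to the finite-difference error, since the grid Hamiltonian is a finite Hermitian matrix and psi is one of its exact eigenvectors.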

SpectraCat said:

Furthermore, the LHS is the total energy, not the potential energy, so the gradient (which it is worth noting is NOT the same as a derivative) wouldn't give the force anyway.

Since I was referring to the Born-Oppenheimer approximation, your statement is wrong. The nuclei (better: the nuclear wave function) move in an effective potential given by the total energy of the electrons.

To make a long story short, my question was simply:

Why is the force on the i-th nucleus given by [itex]-\frac{d}{d \vec{R_i}}E(\vec R_1 ... \vec R_N) [/itex]? ( [itex]\frac{d}{d \vec{R_i}}[/itex] should be read as a gradient.) Please keep in mind that I'm aware of the classical analogy of this.

SpectraCat · Jul 1, 2011

Derivator said:

Hi alxm and SpectraCat,

I fear you misread my post. I know what the Hellmann-Feynman theorem is and how it works; that's not the point. Unfortunately, knowing the Hellmann-Feynman theorem does not by itself imply that interatomic forces (more precisely, the forces acting on the nuclei) are given by the derivative of E=<H> with respect to the nuclear coordinates.

You are right, and that's exactly what I said in my first post.

Yes, in an electronic problem, the wavefunction and the Hamiltonian depend parametrically on the coordinates of the nuclei.

OK, I didn't read your OP carefully enough, and I missed the significance of your question. I also didn't notice your PF-name, or I would have assumed your question was coming from a higher level, based on your posts in the Computational forum. Mea culpa.

Derivator said:

No, that's not correct (at least as I understand what you have written). The Hellmann-Feynman theorem is not just an application of "interchanging the order of integration and differentiation".

In the end, the HF theorem says that, when [itex]\Psi[/itex] is a normalized eigenfunction of [itex]H(\lambda)[/itex], you can treat the wave function as if it did not depend on the variable you are differentiating with respect to:
[itex]\frac{d}{d\lambda}\int\Psi^*(x;\lambda)H(\lambda) \Psi(x;\lambda)dx=\int\Psi^*(x;\lambda)\frac{d H(\lambda)}{d\lambda}\Psi(x;\lambda)dx[/itex]
where the semicolon indicates the parametric dependence of Psi on lambda.

How is that different from differentiation under the integral? The conditions under which the Hellmann-Feynman theorem applies are identical to those under which differentiation under the integral is allowed.

Derivator said:

That is wrong. Let's say we are in the Born-Oppenheimer approximation. The electronic energy eigenvalue is [itex]E(\vec R_1 ... \vec R_N)[/itex]; it depends parametrically on the coordinates [itex]\vec R_i[/itex] of the N nuclei. Let [itex]\Psi(\{\vec r_i\} ;\{\vec R_j\})[/itex] be the solution of the electronic problem. If [itex]X_1[/itex] is the first component of the vector [itex]\vec R_1[/itex], then the first component of the force acting on nucleus no. 1 is given by the negative of:
[itex]\frac{d}{d X_1}E(\vec R_1 ... \vec R_N) = \frac{d}{d X_1}\int\Psi^*(\{\vec r_i\} ;\{\vec R_j\})H(\{\vec R_i\}) \Psi(\{\vec r_i\} ;\{\vec R_j\})d\vec r_1 ... d\vec r_n=\int\Psi^*(\{\vec r_i\} ;\{\vec R_j\})\frac{d H(\{\vec R_i\})}{d X_1} \Psi(\{\vec r_i\} ;\{\vec R_j\})d\vec r_1 ... d\vec r_n[/itex]

Well, what I wrote was not wrong: you still can't differentiate with respect to the electronic coordinates, which was the context of my comment. I just missed the point that you were referring exclusively to the nuclear coordinates in your post.

Derivator said:

Since I was referring to the Born-Oppenheimer approximation, your statement is wrong. The nuclei (better: the nuclear wave function) move in an effective potential given by the total energy of the electrons.

Agreed.

Derivator said:

To make a long story short, my question was simply:

Why is the force on the i-th nucleus given by [itex]-\frac{d}{d \vec{R_i}}E(\vec R_1 ... \vec R_N) [/itex]? ( [itex]\frac{d}{d \vec{R_i}}[/itex] should be read as a gradient.) Please keep in mind that I'm aware of the classical analogy of this.

I think everything you said is correct, so I guess I don't understand what your question is now. You have basically just derived the expression for the forces on the nuclei in the limit of the BO approximation. So what exactly is unclear? Remember that the electronic Hamiltonian already includes the nuclear-nuclear potential energy, so the parametric expression for the electronic energy expectation value you give above has that worked in. Therefore, the electronic energy expectation value defines the effective potential in which the nuclei move. Does that answer your question? I am guessing that it doesn't, because you seem to already know most of what I said, but I figured I'd ask anyway.

Derivator · Jul 1, 2011

hi SpectraCat,

thanks for your effort.

Indeed, I fear, my question is a very basic one...

For the sake of brevity let's go in the following to 1 dimension.

I think we all agree that the force on a classical particle moving in a potential E(r) is given by [itex]-\frac{dE(r)}{dr}[/itex]. (Essentially, this is just a consequence of the definition of E(r) for conservative forces.)

Now, I don't see any reason why the force on a quantum mechanical particle / wave function moving in a potential E(r) is also given by [itex]-\frac{dE(r)}{dr}[/itex]. Is the truth of this buried in theorems like Ehrenfest's theorem?

SpectraCat · Jul 1, 2011

OK, I think I see. Actually, I think it is such a basic part of the BO approximation that we are both overlooking it. As far as I am aware, the BO approximation considers the nuclei to be classical particles, so the answer to your question may be as simple as that. Perhaps a more careful phrasing is appropriate: the de Broglie wavelengths of the nuclei are so much smaller than those of the electrons that, within the context of the electronic part of the problem (which is where the Hellmann-Feynman theorem applies in your example), the nuclei can be considered to be classical particles.

However, your point about the Ehrenfest theorem is well taken: if you want or need to consider the quantum character of the nuclei, then the LHS of the equation needs to be an expectation value as well, where the integral is over the nuclear coordinates.

jambaugh · Jul 1, 2011

I think the ultimate answer to your question is that a generalized force is defined to be the derivative of the energy w.r.t. the corresponding parameter.

Change in energy is work is (generalized) force times (configurational) displacement.

Insofar as this definition of force corresponds to the force directing motion, we look at the Hamiltonian's role as the generator of time evolution.

The derivation is the same in both classical canonical mechanics and QM.

roughly...
[tex] \dot{p_i} =-[H,p_i] = -\frac{dH}{dx^j}[x^j,p_i] = \frac{dH}{dx^j}\delta^i_j = \frac{dH}{dx^i}=F_i[/tex]
(modulo conventions of unit sign and assuming H = H(x,p) )
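
With the signs written out for the convention [x, p] = i*hbar, the relation reads dp/dt = (i/hbar)[H, p] = -dH/dx. A minimal numerical sketch of this operator identity follows, using matrices in a truncated harmonic-oscillator number basis with hbar = m = omega = 1. The anharmonic Hamiltonian, the coupling g and the basis size N are assumptions chosen for illustration.

```python
import numpy as np

N = 40                                          # size of the truncated number basis
a = np.diag(np.sqrt(np.arange(1, N)), k=1)      # annihilation operator (real matrix)
x = (a + a.T) / np.sqrt(2.0)                    # position operator
p = (a - a.T) / (1j * np.sqrt(2.0))             # momentum operator

g = 0.1                                         # illustrative quartic coupling
H = p @ p / 2.0 + x @ x / 2.0 + g * (x @ x @ x @ x) / 4.0
dH_dx = x + g * (x @ x @ x)                     # derivative of the potential part

# Heisenberg equation of motion for p (hbar = 1)
p_dot = 1j * (H @ p - p @ H)

# Truncation breaks [x, p] = i in the last basis state, which contaminates
# the last few rows and columns; compare only the unaffected block.
k = N - 4
print(np.allclose(p_dot[:k, :k], -dH_dx[:k, :k]))
```

The comparison is restricted to the upper-left block because the canonical commutator cannot hold exactly for finite matrices; inside that block the commutator reproduces -dH/dx to rounding error.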

Note that the Jacobi identity for commutator/Poisson brackets (Lie products) is essentially the Leibniz rule for derivations, so these brackets obey the "chain rule" in a fashion that extends into operator algebras and operator derivatives.

The relationship between "force", "coordinate", "momentum" and the Hamiltonian is pretty much fixed in the semantics of Lie groups used to describe system kinematics and dynamics.

Derivator · Jul 1, 2011

thank you for your input, guys.

jambaugh said:

roughly...
[tex] \dot{p_i} =-[H,p_i] = -\frac{dH}{dx^j}[x^j,p_i] = \frac{dH}{dx^j}\delta^i_j = \frac{dH}{dx^i}=F_i[/tex]
(modulo conventions of unit sign and assuming H = H(x,p) )

ahh, I see. But don't you define this way a ''force-operator'' [itex]\hat{\frac{dH}{dx^i}}=\hat{F}_i[/itex] and thus would calculate the expectation value of the force as [itex]<\Psi|\hat{\frac{dH}{dx^i}}|\Psi>[/itex] anyway? That is, defining the force as done by you, one would never think of calculating the force as [itex]\frac{d}{dx^i}<\Psi|\hat{H}|\Psi>[/itex] and hence you would never need to use the Hellmann-Feynman-Theorem for calculating forces, because the force would not be given (by definition) by [itex]\frac{d}{dx^i}<\Psi|\hat{H}|\Psi>[/itex].

Edit:
Hmm, what people actually do: they don't calculate the forces on the nuclei as [itex]-\nabla< H>[/itex] or [itex]-<\nabla H>[/itex], where H is the nuclear Hamiltonian; instead they calculate [itex]-\nabla< V>[/itex], where the nuclear Hamiltonian is given by H = T + V (in the BO approximation, so V is the total electronic energy, which depends parametrically on the nuclear coordinates).

Edit 2:
A result from Ehrenfest's theorem also says:
[itex]m\frac{d^2}{dt^2} <\hat x> = \frac{d}{dt} <\hat p> = -<\nabla V(x)>[/itex]
So from this point of view, you would also never think of calculating the forces as [itex]-\nabla< V(x)>[/itex] and hence would also never need the Hellmann-Feynman-Theorem...
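
As a side illustration of the Ehrenfest relation quoted above, the following Python sketch evaluates -<V'(x)> for a Gaussian wave packet and compares it with -V'(<x>), the force a classical particle would feel at the mean position. The quartic potential V = x^4/4 and the packet parameters x0 and sigma are assumptions made for this example.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

# Gaussian wave packet: |psi|^2 has mean x0 and standard deviation sigma
x0, sigma = 1.0, 0.5
psi = np.exp(-(x - x0)**2 / (4.0 * sigma**2))
psi /= np.sqrt(np.sum(psi**2) * dx)        # normalize on the grid
rho = psi**2

def expect(f):
    """Expectation value of a function of x in the state psi."""
    return np.sum(rho * f) * dx

mean_x = expect(x)

# Quartic potential V = x**4 / 4, so V'(x) = x**3
force_mean = -expect(x**3)                 # -<V'(x)>, what Ehrenfest's theorem uses
force_at_mean = -mean_x**3                 # -V'(<x>), the classical-looking guess

print(force_mean, force_at_mean)
```

For this packet -<V'(x)> = -(x0^3 + 3 x0 sigma^2) = -1.75, while -V'(<x>) = -1.0; the two only coincide when V' is linear in x, i.e. for potentials that are at most quadratic.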

SpectraCat · Jul 1, 2011

OK, it looks like you lost the distinction between nuclear and electronic coordinates again in your last post. For example, regarding your example of nuclear forces within the BO approximation, that last expression is only valid for differentiating the electronic expectation value with respect to the nuclear coordinates. If you are differentiating the nuclear wavefunction with respect to nuclear coordinates, then the nabla needs to be inside the expectation value.

Derivator · Jul 1, 2011

Yeah, that's true. Ehrenfest's theorem is a bad "example", since the integration in the expectation value and the derivative are both w.r.t. the same coordinate.

jambaugh · Jul 3, 2011

Derivator said:

[...]ahh, I see. But don't you define this way a ''force-operator'' [itex]\hat{\frac{dH}{dx^i}}=\hat{F}_i[/itex] and thus would calculate the expectation value of the force as [itex]<\Psi|\hat{\frac{dH}{dx^i}}|\Psi>[/itex] anyway?

Getting very technical, the force is in this context a super-operator i.e. it is an operator acting on the operator algebra, not on the Hilbert space itself.

You can let x be an operator instead of a parameter (so that [itex][p,x]=1[/itex] makes sense.) In that context the Hamiltonian is a bit more abstract than an operator, it is an operator valued function of operators (so we can take derivatives) and we should use a different notation for its derivative w.r.t. x.

[itex]\frac{\partial H}{\partial x^i} \to \frac{\Delta H}{\Delta x^i}[/itex]

Simply think of H as a polynomial over a non-commutative ring. Then for example
[itex]H(x,y) = xyx[/itex] has x-derivative [itex]\frac{\Delta H}{\Delta x}[/itex] which is the super-operator mapping an arbitrary operator via: [itex]z\mapsto zyx + xyz[/itex] (replacing one x with z everywhere it occurs and summing the result).

Note that once you allow the variables to commute, this is just multiplication by [itex]2xy[/itex]: the "super-operator" is just multiplication by the "operator" which we usually think of as the derivative. The problem in the non-commutative case is that we can't always write that super-operator as simply left (or right) multiplication by an operator, but rather as a combination of both. Fundamentally, these derivatives reside in a bigger space than the operator algebra. Note that we can write Heisenberg's equation as:
[tex]\frac{d}{dt} = \frac{i}{\hbar}\Delta H[/tex]
where [itex]\Delta H[/itex] is the commutator super-operator mapping operators via:[itex]\Delta H: G\mapsto [H,G][/itex].
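
The x-derivative of H(x,y) = xyx described above can be checked with small matrices standing in for the operators. This is only a sketch: the random 4x4 matrices, the seed and the step eps are assumptions. It compares the super-operator z -> zyx + xyz with a directional finite difference of H, and then confirms that it reduces to multiplication by 2xy when everything commutes.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 4, 4))       # three random 4x4 matrices

def H(x, y):
    return x @ y @ x

def dH_dx(x, y, z):
    """Super-operator Delta H / Delta x applied to z:
    replace each occurrence of x by z in turn and sum the results."""
    return z @ y @ x + x @ y @ z

# Directional derivative of H along z, by a central finite difference
eps = 1e-6
numeric = (H(x + eps * z, y) - H(x - eps * z, y)) / (2.0 * eps)
print(np.allclose(numeric, dH_dx(x, y, z), atol=1e-6))

# Commuting case: diagonal matrices commute, and the super-operator
# becomes ordinary multiplication by 2xy
xd, yd, zd = (np.diag(v) for v in rng.normal(size=(3, 4)))
print(np.allclose(dH_dx(xd, yd, zd), 2.0 * xd @ yd @ zd))
```

Both checks should print True; for generic non-commuting matrices the result cannot be written as a single left or right multiplication of z.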

Anyway, these operator derivatives for polynomials can then be extended to convergent limits of polynomials i.e. analytic functions. You can then define a "force" as an operator derivative in the case where coordinates are operators. This however is a bad way of doing it IMNSHO. Especially in relativistic mechanics coordinates become parameters rather than observables.

In the context of this thread, where you work with parameters instead of operators, the trouble is with the "commutators" [x,p] in my "derivation". I was using that as a shortcut which glosses over a great deal of mathematical theory.
Let's see...

How I would say it is that you expand the Hamiltonian in terms of some basis of the operator algebra:
[tex]H = \omega_k A^k[/tex]
where the A's are the operators and the omegas are the parametric velocities both of which may depend on the parameters [itex]\theta_k[/itex].

One then extends from the operator algebra of the quantum theory into the operator algebra plus the differential algebra of the parameter calculus and define the extended Hamiltonian:
[tex]\eta \doteq i\hbar \frac{d}{dt} + \Delta H[/tex]
([itex]\Delta H: A\mapsto [H,A][/itex] is again a super-operator, specifically "take the commutator with H of".)
The Heisenberg equation becomes the constraint:[itex]\eta = 0[/itex]
(Note general covariance typically leads to a zero extended Hamiltonian, the dotted equality reads "is constrained to be equal to")

All this technical business then allows us to write for arbitrary operator B (depending on the parameters):
[itex]\eta B(\theta)= i\hbar \frac{d}{dt}B +[H,B]\doteq 0 [/itex]

or [itex] \frac{d}{dt}{B} \doteq \frac{i}{\hbar} \omega_k [A^k,B][/itex]

but since [itex]\frac{dB}{dt} = \dot{\theta}_k \frac{\partial B}{\partial\theta_k}= \omega_k \frac{\partial B}{\partial\theta_k}[/itex]

one has: [itex] \frac{\partial}{\partial \theta_k}B \doteq \frac{i}{\hbar}[A^k,B] = \frac{i}{\hbar}\Delta A^k B[/itex] for any operator B!
(again [itex]\Delta A: B\mapsto [A,B][/itex] is the commutator super-operator.)
So
[tex] \frac{\partial}{\partial \theta_k}\doteq \frac{i}{\hbar}\Delta A^k[/tex]
We view the [itex]A^k[/itex] as generalized momenta canonically dual to the parameters [itex]\theta_k[/itex].

Then the "force" equations become...wait for it...here is the real punch line...

[tex]\frac{d}{dt}A^k = \frac{i}{\hbar}[H,A^k] = -\frac{i}{\hbar}[A^k,H] \doteq -\frac{\partial}{\partial\theta_k} H[/tex]
And applying the Hellmann-Feynman theorem we have:
[tex]\frac{d}{dt}\langle A^k\rangle \doteq -\frac{\partial}{\partial\theta_k}\langle H\rangle = -\frac{\partial}{\partial\theta_k} E[/tex]
In summary, when we constrain our parameters to evolve over time according to the dynamics of the system, and assuming the Hamiltonian is the generator of time evolution, then the forces, i.e. the rates of change of the generators of parameter evolution (momenta), must equal the negative of the parameter derivatives of the Hamiltonian.

The above is very general. In practice the Hamiltonian is some function of the usual momenta [itex]p^\mu[/itex] plus some other operators but you can choose a basis of the operator algebra which includes the usual momenta [itex]\{A^k\} = \{p^\mu\}\cup\{C^1,C^2,\cdots\}[/itex]. We of course use the usual coordinates [itex]x_\mu[/itex] as parameters dual to the usual momenta.

Thus if your Hamiltonian is some function of the momentum operators, it will --upon expansion in the basis-- take the form:
[tex]H(p)=\dot{x}_\mu p^\mu + o.t.[/tex]
where o.t. means other terms linearly independent from the momenta but which still depend on the x's. Those terms may arise from quadratic and higher-order terms of the momenta as well as other components of the Hamiltonian. Especially note that in relativistic mechanics the mass operator [itex] M=g_{\mu\nu}p^\mu p^\nu[/itex], which for a single particle must be its mass times the identity operator, is only proportional to the identity when we impose the dynamic constraint:
[tex]M\doteq m\mathbf{1}[/tex]
We can still take parametric derivatives of this operator "off shell" and get non-trivial results, then impose the dynamic constraint.

What is happening is that we are introducing a gauge extension via the parameter algebra and then imposing the dynamics as a gauge constraint (and dynamic evolution is the unfolding of a gauge transformation). This allows us to talk about the classical c-numbers and quantum system variables in the same context. This is done implicitly in all the standard QM where we "choose a picture" and introduce parameters.
