Calculus of Variations on Kullback-Leibler Divergence

Master1022 · Oct 26, 2021

Hi,

This isn't a homework question, but a side task given in a machine learning class I am taking.

Question: Using variational calculus, prove that one can minimize the KL-divergence by choosing ##q## to be equal to ##p##, given a fixed ##p##.

Attempt:

Unfortunately, I have never studied the calculus of variations (it was suggested that we teach ourselves). I have been watching some videos online, but they mainly refer to the Euler-Lagrange equations, which I don't think are relevant here (please correct me if I am wrong), and give little explanation of functional derivatives.

Nonetheless, I think this shouldn't be too hard, but I am struggling to understand how to use the tools.

If we start with the definition of the KL-divergence we get:
[tex] \text{KL}[p\,\|\,q] = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx = I [/tex]
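For intuition, here is a small sketch of the same definition for discrete distributions, where the integral becomes a sum. The function name `kl_divergence` is hypothetical, and the usual convention ##0 \log 0 = 0## is assumed. It shows the divergence is zero when ##q = p## and positive for a different ##q##.

```python
import math

def kl_divergence(p, q):
    """Discrete KL[p || q] = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute 0 (convention 0 * log 0 = 0).
    If q_i == 0 where p_i > 0, the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

p = [0.5, 0.3, 0.2]
print(kl_divergence(p, p))                  # 0.0 when q equals p
print(kl_divergence(p, [1/3, 1/3, 1/3]))    # positive, about 0.069
```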

Could anyone help me get started? After writing down ## \frac{\delta I}{\delta q} ##, I am not sure how to proceed.
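A possible first step, sketched under the assumption that ##p## and ##q## are strictly positive on the domain: perturb ##q## by a small variation ##\epsilon\eta(x)## and expand ##I## to first order in ##\epsilon##. Whatever multiplies ##\eta(x)## inside the integral is the functional derivative.

[tex] \begin{aligned} I[q + \epsilon\eta] &= \int p(x)\left[\log p(x) - \log\big(q(x) + \epsilon\eta(x)\big)\right] dx \\ &= I[q] - \epsilon\int \frac{p(x)}{q(x)}\,\eta(x)\,dx + O(\epsilon^2) \end{aligned} \qquad\Longrightarrow\qquad \frac{\delta I}{\delta q(x)} = -\frac{p(x)}{q(x)} [/tex]

Since ##p/q > 0##, this derivative never vanishes, so the unconstrained problem has no stationary point (##I## can be driven down without bound by scaling ##q## up). This is why the normalization constraint on ##q## has to enter the calculation.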

Thanks in advance

Office_Shredder · Oct 26, 2021

Euler-Lagrange is what you want, but you also have to account for the constraints on q that come from it being a probability distribution, namely that its integral is 1 and that it is always nonnegative. I think the integral constraint is the important part.

http://liberzon.csl.illinois.edu/teaching/cvoc/node38.html

These notes cover how to add constraints to the Euler-Lagrange equations.
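Applying that suggestion gives a short derivation. This is a sketch assuming ##p(x) > 0## everywhere on the domain, so the nonnegativity constraint ends up inactive. The integrand depends on ##q## but not on ##q'##, so the Euler-Lagrange equation collapses to setting the partial derivative of the integrand with respect to ##q## to zero, with a Lagrange multiplier ##\lambda## for the constraint ##\int q(x)\,dx = 1##:

[tex] \begin{aligned} J[q] &= \int p(x)\log\frac{p(x)}{q(x)}\,dx + \lambda\left(\int q(x)\,dx - 1\right) \\ \frac{\partial}{\partial q}\left[p\log\frac{p}{q} + \lambda q\right] &= -\frac{p(x)}{q(x)} + \lambda = 0 \quad\Longrightarrow\quad q(x) = \frac{p(x)}{\lambda} \\ \int q(x)\,dx = \frac{1}{\lambda}\int p(x)\,dx = 1 &\quad\Longrightarrow\quad \lambda = 1, \quad q(x) = p(x) \end{aligned} [/tex]

To check that this is a minimum, expand around ##q = p## with a perturbation ##\epsilon\eta(x)## satisfying ##\int \eta(x)\,dx = 0## (so ##q## stays normalized). The first-order term vanishes and the second-order term is positive for any nonzero ##\eta##:

[tex] I[p + \epsilon\eta] = -\epsilon\int \eta(x)\,dx + \frac{\epsilon^2}{2}\int \frac{\eta(x)^2}{p(x)}\,dx + O(\epsilon^3) = \frac{\epsilon^2}{2}\int \frac{\eta(x)^2}{p(x)}\,dx + O(\epsilon^3) [/tex]

Because ##-\log## is convex, ##I## is convex in ##q##, so this stationary point is the global minimum, where ##I = 0##.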

RuiGao · May 12, 2022

Maybe you should start with fixed p, and then try to get optimal q to maximize or minimize KL.
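A minimal numerical sketch of that idea, assuming a discrete ##p## with strictly positive entries. The softmax parameterization of ##q## is an illustrative choice (not from the thread) that builds the normalization and nonnegativity constraints in, replacing the explicit Lagrange multiplier. For this objective the gradient of ##\text{KL}[p\,\|\,\text{softmax}(z)]## with respect to the logits ##z## works out to ##q - p##. The helper names `softmax`, `kl`, and `minimize_kl_over_q` are hypothetical.

```python
import math

def softmax(z):
    """Map unconstrained reals z to a probability vector (nonnegative, sums to 1)."""
    m = max(z)                                  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """Discrete KL[p || q] in nats; assumes every p_i > 0 and q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def minimize_kl_over_q(p, steps=2000, lr=0.5):
    """Gradient descent on KL[p || softmax(z)] over the logits z.

    The gradient of this objective with respect to z is q - p,
    so each step moves q toward p.
    """
    z = [0.0] * len(p)                          # start from the uniform distribution
    for _ in range(steps):
        q = softmax(z)
        z = [zj - lr * (qj - pj) for zj, qj, pj in zip(z, q, p)]
    return softmax(z)

p = [0.5, 0.3, 0.2]
q_opt = minimize_kl_over_q(p)
print([round(v, 4) for v in q_opt])             # close to p
print(kl(p, q_opt))                             # close to 0
```

The minimizer found numerically coincides with the fixed ##p##, matching the variational result ##q = p##.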

RuiGao · May 13, 2022

RuiGao said:

Maybe you should start with fixed p, and then try to get optimal q to maximize or minimize KL.

https://math.stackexchange.com/ques...11#3319311?s=fc62320956c049a8a77e1a9666c97b91
