Mutual information: concave/convex

In summary, the mutual information between two random variables X and Y can be written as a relative entropy D(p||q) between the joint distribution and the product of the marginals. The proof that I(X,Y) is convex as a function of p(y|x) for fixed p(x) relies on the joint convexity of D(p||q) in the pair (p, q).
  • #1
PetitPrince
Hi everybody, :smile:

While looking at the mutual information of two variables, one finds that it is concave in p(x) for fixed p(y|x) and convex in p(y|x) for fixed p(x).

The first statement is fine, but when it comes to proving the second I get stuck. Even in proofs that are already worked out, I don't see how they conclude the convexity of I(X,Y) as a function of p(y|x) from the convexity of the relative entropy D(p||q).

Here is the piece of the proof I didn't understand:
http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/lecture3.pdf

If you have any idea, I'd very much appreciate it.

Thank you in advance.
 
  • #2


Hello there,

It is great to see that you are actively engaging with the concept of mutual information and its properties. The proof you have shared is indeed quite complex and requires a solid understanding of information theory and convex optimization. Let me try to break it down for you in simpler terms.

First, let's define some terms. The mutual information between two random variables X and Y is denoted by I(X,Y) and is defined as the sum of their individual entropies minus their joint entropy:

I(X,Y) = H(X) + H(Y) - H(X,Y)

Where H(X) and H(Y) are the entropies of X and Y respectively, and H(X,Y) is the joint entropy of X and Y.
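
For concreteness, here is a minimal sketch (assuming NumPy is available; the joint table p(x, y) is just an arbitrary example) that computes I(X,Y) from exactly this decomposition:

import numpy as np

# Example joint distribution p(x, y); rows index x, columns index y.
pxy = np.array([[0.30, 0.10],
                [0.15, 0.45]])

def entropy(p):
    p = p[p > 0]                       # ignore zero-probability outcomes
    return -np.sum(p * np.log2(p))     # entropy in bits

Hx = entropy(pxy.sum(axis=1))          # H(X) from the marginal of X
Hy = entropy(pxy.sum(axis=0))          # H(Y) from the marginal of Y
Hxy = entropy(pxy.ravel())             # joint entropy H(X, Y)

print(Hx + Hy - Hxy)                   # I(X,Y) = H(X) + H(Y) - H(X,Y)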

Now, the relative entropy between two probability distributions p and q is denoted by D(p||q) and is defined as follows:

D(p||q) = Σ_x p(x) log( p(x) / q(x) )

Where the sum runs over all outcomes x; equivalently, D(p||q) is the expectation, under p, of log( p(X) / q(X) ).
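
A corresponding sketch for the relative entropy (again assuming NumPy; the two distributions are arbitrary examples):

import numpy as np

def kl_divergence(p, q):
    # D(p||q) = sum over x of p(x) * log2( p(x) / q(x) )
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                       # terms with p(x) = 0 contribute nothing
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

print(kl_divergence([0.4, 0.6], [0.5, 0.5]))   # strictly positive unless p == q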

Now, let's look at the proof you shared. The key observation is that the mutual information can itself be written as a relative entropy: I(X,Y) = D( p(x,y) || p(x)p(y) ), the divergence between the joint distribution and the product of the marginals. When p(x) is held fixed, both of these distributions are determined by the conditional distribution p(y|x): the joint is p(x,y) = p(x) p(y|x), and the marginal of Y is p(y) = Σ_x p(x) p(y|x), so both depend linearly on p(y|x).

The second ingredient is that the relative entropy D(p||q) is jointly convex in the pair (p, q): for any two pairs (p1, q1), (p2, q2) and any 0 ≤ λ ≤ 1, D( λp1 + (1-λ)p2 || λq1 + (1-λ)q2 ) ≤ λ D(p1||q1) + (1-λ) D(p2||q2). Now fix p(x) and take a convex combination of two conditionals, λ p1(y|x) + (1-λ) p2(y|x). Because the joint and the product distributions depend linearly on the conditional, this mixture produces exactly the corresponding convex combinations of the joints and of the products. Plugging those into the inequality above gives I_λ(X,Y) ≤ λ I_1(X,Y) + (1-λ) I_2(X,Y), which is precisely the statement that I(X,Y) is convex in p(y|x) for fixed p(x). No minimization is involved; the convexity of I is inherited directly from the joint convexity of D.
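
If it helps to see the inequality numerically, here is a small check (a sketch assuming NumPy; the input distribution and the two channels p1(y|x), p2(y|x) are arbitrary examples). Mixing the two channels never gives more mutual information than mixing the two mutual informations:

import numpy as np

def mutual_info(px, channel):
    # px: p(x); channel[x, y] = p(y|x); returns I(X;Y) in bits
    pxy = px[:, None] * channel                 # joint distribution p(x, y)
    py = pxy.sum(axis=0)                        # marginal p(y)
    mask = pxy > 0
    ratio = pxy / (px[:, None] * py[None, :])
    return np.sum(pxy[mask] * np.log2(ratio[mask]))

px = np.array([0.3, 0.7])                       # fixed input distribution p(x)
p1 = np.array([[0.9, 0.1], [0.2, 0.8]])         # first channel p1(y|x)
p2 = np.array([[0.5, 0.5], [0.6, 0.4]])         # second channel p2(y|x)
lam = 0.4
mix = lam * p1 + (1 - lam) * p2                 # convex combination of channels

lhs = mutual_info(px, mix)
rhs = lam * mutual_info(px, p1) + (1 - lam) * mutual_info(px, p2)
print(lhs, "<=", rhs)                           # convexity in p(y|x) for fixed p(x)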

I hope this explanation helps you understand the proof better. If you have any further questions, please do not hesitate to ask. Keep up the good work!
 

FAQ: Mutual information: concave/convex

1. What is mutual information and why is it important in science?

Mutual information is a measure of how much information is shared between two random variables. It is important in science because it can help determine the relationship and dependency between variables, and can be used for feature selection and data compression.

2. How is mutual information calculated?

Mutual information is calculated from the joint probability distribution of the two variables and their individual (marginal) distributions: I(X;Y) = Σ_{x,y} p(x,y) log[ p(x,y) / ( p(x) p(y) ) ], i.e. the sum, over every pair of outcomes, of the joint probability times the logarithm of the ratio of the joint probability to the product of the marginal probabilities.
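
As a rough illustration (a minimal sketch assuming NumPy; the joint table is an arbitrary example), the sum can be evaluated directly and agrees with the entropy decomposition H(X) + H(Y) - H(X,Y):

import numpy as np

pxy = np.array([[0.30, 0.10],          # example joint table p(x, y)
                [0.15, 0.45]])
px = pxy.sum(axis=1, keepdims=True)    # marginal p(x), as a column
py = pxy.sum(axis=0, keepdims=True)    # marginal p(y), as a row
mask = pxy > 0
print(np.sum(pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])))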

3. What does it mean for mutual information to be concave or convex?

Concave and convex describe how I(X;Y) behaves as a function of the underlying probability distributions, not the sign or direction of the relationship between the variables. I(X;Y) is a concave function of the input distribution p(x) when the conditional distribution p(y|x) is held fixed (mixing two input distributions gives at least the mixture of their mutual informations, I(λp1 + (1-λ)p2) ≥ λ I(p1) + (1-λ) I(p2)), and a convex function of p(y|x) when p(x) is held fixed. These properties matter, for example, when maximizing mutual information over p(x) to compute a channel capacity, because a concave objective has no spurious local maxima.

4. How is mutual information used in machine learning?

Mutual information is often used in machine learning for feature selection, as it can help identify the most relevant and informative features for a given dataset. It can also be used as a similarity measure between two datasets, which can be helpful in tasks such as clustering and classification.
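
As a minimal sketch of this use (assuming scikit-learn is installed; the Iris dataset is just a convenient example), mutual_info_classif estimates the mutual information between each feature and the class label:

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)                    # 4 numeric features, 3 classes
scores = mutual_info_classif(X, y, random_state=0)   # one MI estimate per feature
print(scores)                                        # larger score = more informative feature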

5. Are there any limitations to using mutual information?

One limitation of mutual information is that it must be estimated from the full joint distribution, which is difficult with limited data or in high dimensions, and the estimate can be sensitive to binning, noise, and the choice of estimator. It is also symmetric in the two variables, so it says nothing about the direction of any causal relationship between them.
