Master1022
Homework Statement: Prove the chain rule for mutual information
Relevant Equations: Entropy and mutual information
Hi,
I was attempting the following problem and am a bit confused because I don't quite understand the structure/notation of how mutual information and entropy work when there is a conditioning bar, as in (a|b), and multiple variables grouped together, as in (a, b).
Question: Prove the chain rule for mutual information.
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} I(X_j; Y|X_{j-1}, ..., X_1) [/tex]
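For concreteness, the n = 2 case of this identity reads:
[tex] I(X_1, X_2 ; Y) = I(X_1; Y) + I(X_2; Y | X_1) [/tex]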
Attempt:
I started by doing:
[tex] I(X_1, X_2, ... , X_n ; Y) = H(X_1, X_2, ..., X_n) - H(X_1, X_2, ... , X_n | Y) [/tex]
I know the first term on the right-hand side can be expanded using the chain rule for entropy, which is [itex] H(X_1, X_2, ..., X_n) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) [/itex], and thus the expression becomes:
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} H(X_j | X_{j-1}, ... , X_1) - H(X_1, X_2, ... , X_n | Y) [/tex]
Now I am thinking about how to deal with the second term. I think I should be able to use the chain rule for entropy there as well, and then combine both sums to get something useful. However, I am stuck because I am confused by having the conditioning variable Y alongside multiple variables before the conditioning bar... Nonetheless, if I think about it from a Venn diagram standpoint, I get to something like:
[tex] H(X_1, X_2, ... , X_n | Y) = H(X_n | Y, X_{n - 1}, ... , X_1) + H(X_{n - 1}, ... , X_1 | Y) = \sum_{j = 1}^{n} H(X_j | Y, X_{j - 1}, ... , X_1) [/tex]
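If that is right, then subtracting it term by term from the sum above, using the identity [itex] I(X_j; Y | X_{j-1}, ..., X_1) = H(X_j | X_{j-1}, ..., X_1) - H(X_j | Y, X_{j-1}, ..., X_1) [/itex], would give:
[tex] I(X_1, X_2, ... , X_n ; Y) = \sum_{j=1}^{n} \left[ H(X_j | X_{j-1}, ..., X_1) - H(X_j | Y, X_{j-1}, ..., X_1) \right] = \sum_{j=1}^{n} I(X_j; Y | X_{j-1}, ..., X_1) [/tex]
which would be the desired result.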
However, I am not sure whether this is correct. Any help would be greatly appreciated.
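As a further sanity check (not a proof), I also tried verifying the n = 2 case numerically for three binary variables; the random joint distribution and the entropy helper below are just my own throwaway example, not from any library:
[code]
import numpy as np

rng = np.random.default_rng(0)

# Random joint pmf p(x1, x2, y) over three binary variables, indexed (x1, x2, y).
p = rng.random((2, 2, 2))
p /= p.sum()

def H(pmf):
    # Shannon entropy in bits of a pmf given as an array of probabilities.
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

# Marginals needed below.
p_x1 = p.sum(axis=(1, 2))   # p(x1)
p_y = p.sum(axis=(0, 1))    # p(y)
p_x1x2 = p.sum(axis=2)      # p(x1, x2)
p_x1y = p.sum(axis=1)       # p(x1, y)

# Left side: I(X1, X2; Y) = H(X1, X2) + H(Y) - H(X1, X2, Y).
lhs = H(p_x1x2) + H(p_y) - H(p)

# I(X1; Y) = H(X1) + H(Y) - H(X1, Y).
term1 = H(p_x1) + H(p_y) - H(p_x1y)

# I(X2; Y | X1) = H(X2 | X1) - H(X2 | Y, X1)
#              = [H(X1, X2) - H(X1)] - [H(X1, X2, Y) - H(X1, Y)].
term2 = H(p_x1x2) - H(p_x1) - H(p) + H(p_x1y)

print(lhs, term1 + term2)            # the two sides should agree
assert np.isclose(lhs, term1 + term2)
[/code]
Both sides should agree up to floating-point error for any joint pmf.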