pawelch
Hi all,
I am trying to devise a mathematical model for a project I am working on. The description is as follows:
We have a sample space
[tex]
\Omega=\{w_1,w_2,\cdots, w_N\}
[/tex]
It is very large. Suppose further that we have an assumed frequency of occurrence for each [itex] w_i [/itex], stored in a probability vector [itex] \pi [/itex].
In general, suppose we observe occurrences of [itex] w_i[/itex] , stored as a sequence [itex] \{S\}[/itex] . Now, we would like to compare [itex] \{S\} [/itex] against [itex] \pi[/itex] .
One solution would be to apply the Kullback–Leibler divergence. However, the problem is that [itex] |S| < |\Omega| [/itex] (by [itex] |\cdot| [/itex] I mean cardinality), and as a result, for some [itex] w_i \in \Omega [/itex] the corresponding observed frequency [itex] s_i [/itex] is 0 (this stems from the fact that [itex] \{S\} [/itex] has not managed to explore [itex] \Omega [/itex] thoroughly). In this case the divergence contains terms of the form [itex] \pi_i \log \frac{\pi_i}{s_i} [/itex] with [itex] s_i = 0 [/itex], which are infinite.
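To make the zero-frequency problem concrete, here is a minimal Python sketch. The four-element [itex] \pi [/itex], the short sequence `S`, and the helper names `empirical_frequencies` and `kl_divergence` are all made up for illustration, and outcomes are encoded as indices 0..N-1. It only shows how the two directions of the divergence behave when some outcomes are never observed; it is not a proposed solution.

```python
import math
from collections import Counter

def empirical_frequencies(sequence, n_outcomes):
    """Relative frequency of each outcome index 0..n_outcomes-1 in the sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return [counts.get(i, 0) / total for i in range(n_outcomes)]

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i), taking 0 * log(0 / q_i) as 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # convention: a zero-probability term contributes nothing
        if q_i == 0.0:
            return math.inf  # p_i > 0 while q_i = 0 makes the term infinite
        total += p_i * math.log(p_i / q_i)
    return total

pi = [0.4, 0.3, 0.2, 0.1]   # hypothetical assumed frequencies, N = 4
S = [0, 0, 1]               # hypothetical observations, |S| < |Omega|
s = empirical_frequencies(S, len(pi))

d_s_pi = kl_divergence(s, pi)   # finite: unseen outcomes contribute 0
d_pi_s = kl_divergence(pi, s)   # infinite: pi_i > 0 where s_i = 0
```

With these toy numbers, `d_s_pi` stays finite while `d_pi_s` is infinite, so the direction in which the divergence is taken decides whether the unobserved [itex] w_i [/itex] cause trouble.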
In principle, I generate multiple [itex] \{S\} [/itex] of two types, say, [itex] \{1,0\}[/itex]. The underlying idea of my project is that [itex] \{S\}_1 [/itex] of the first type will hit [itex] w_i [/itex] that possess high frequency in [itex] \Omega[/itex], whereas [itex] \{S\}_2 [/itex] of the second type will generate [itex] w_i [/itex] that have low frequency. And it is unlikely that [itex] |S| = |\Omega|[/itex].
Thus, again, I thought I would compare [itex] \{S\}_1 [/itex] against [itex] \pi [/itex] and then [itex] \{S\}_2 [/itex] against [itex] \pi [/itex] and observe the differences. But because of the infinite terms that arise for the elements never observed in [itex] \{S\} [/itex], I could not get it right. So I thought that maybe I could "normalise" (i.e. shrink) [itex] \Omega[/itex], so that the new [itex] \Omega [/itex] contains only the elements that have occurred in [itex] \{S\}[/itex], but I have been told that this is not a good idea either.
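The "shrinking" idea can be sketched as follows. This is only an illustration under assumptions: the toy [itex] \pi [/itex], the two sequences `S1` and `S2`, and the helper names are hypothetical, outcomes are encoded as indices, and every observed outcome is assumed to have [itex] \pi_i > 0 [/itex].

```python
import math
from collections import Counter

def restrict_to_observed(pi, sequence):
    """Keep only outcomes seen in the sequence and renormalise pi over them."""
    counts = Counter(sequence)
    observed = sorted(counts)                      # indices that occurred at least once
    mass = sum(pi[i] for i in observed)            # probability mass pi puts on them
    pi_restricted = [pi[i] / mass for i in observed]
    s_restricted = [counts[i] / len(sequence) for i in observed]
    return observed, pi_restricted, s_restricted

def kl_divergence(p, q):
    """D(p || q) over a common support; assumes q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

pi = [0.4, 0.3, 0.2, 0.1]   # hypothetical assumed frequencies
S1 = [0, 0, 1]              # hypothetical type-1 sequence (high-frequency outcomes)
S2 = [3, 3, 2, 3]           # hypothetical type-2 sequence (low-frequency outcomes)

obs1, pi1, s1 = restrict_to_observed(pi, S1)
obs2, pi2, s2 = restrict_to_observed(pi, S2)

d1 = kl_divergence(s1, pi1)  # computed over the outcomes seen in S1 only
d2 = kl_divergence(s2, pi2)  # computed over the outcomes seen in S2 only
```

Both values are finite, but `d1` and `d2` are computed over different reduced spaces (`obs1` and `obs2`), each with its own renormalised [itex] \pi [/itex]. That may be one reason the approach is discouraged: the two numbers are not measured against the same reference distribution.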
So, the question is: how should I compare both types, [itex] \{S\}_1 [/itex] and [itex] \{S\}_2 [/itex], against [itex] \pi [/itex] if they are of different lengths?
Thank you for any suggestions, and please accept my apologies for the poor mathematical language of this description,
cheers!