Limit of compression (source coding theorem)

AI Thread Summary
The source coding theorem establishes that encoding a message of length N with entropy H requires at least N*H bits, representing a theoretical limit on data compression. This limit primarily applies when only the frequency of symbols is known, without considering conditional probabilities. If certain symbols are likely to follow others, as in the case where symbol x is often followed by y, this can influence the entropy and potentially allow for more efficient compression. However, the discussion clarifies that if conditional probabilities are involved, a different model and method for calculating the entropy rate must be used. Understanding these nuances is crucial for developing effective compression algorithms.
gop
Hi

The source coding theorem says that one needs at least N*H bits to encode a message of N symbols from a source with entropy H bits per symbol. This is supposedly the theoretical limit of data compression.
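(For concreteness, here is a minimal Python sketch of what that bound says, using a made-up three-symbol distribution purely for illustration:)

from math import log2

def entropy(probs):
    # Shannon entropy in bits per symbol: H = -sum(p * log2(p))
    return -sum(p * log2(p) for p in probs if p > 0)

# Illustrative i.i.d. source: three symbols with probabilities 1/2, 1/4, 1/4
probs = [0.5, 0.25, 0.25]
H = entropy(probs)            # 1.5 bits per symbol
N = 1000                      # message length in symbols
print(f"need at least about {N * H:.0f} bits for {N} symbols")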

But is it? Or does it only apply to situations where only the frequency (probability) of a given symbol is known?

For example, if I know that the symbol x is always followed by the symbol y (or that this happens with very high probability), couldn't I use this to construct a compression algorithm that needs fewer bits than N*H?

thx
 
gop said:
For example, If I know that the symbol x is always followed by the symbol y (or that this happens with a very high probability) couldn't I use this to construct a compression algorithm that needs fewer bits than N*H?

If x is (almost) always followed by y, then that lowers the entropy. It's already taken into account.
 
But if I have something like this

p(x,y,z)= (1/3,1/3,1/3)

then I have entropy 1.585. But now I could have p(y|x) = 1/2 or p(y|x) = 1; as long as p(y|z) = 1 - p(y|x), the marginal probabilities p(x,y,z) stay the same, so I have the same entropy. But in the case of p(y|x) = 1 I can only use two symbols, say p(xy,z) = (1/2,1/2), which has entropy 1.
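(The numbers can be checked with a quick sketch, just applying the plain entropy formula to each alphabet:)

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/3, 1/3, 1/3]))   # ~1.585 bits/symbol for {x, y, z}
print(entropy([1/2, 1/2]))        # 1.0 bit/symbol for the merged alphabet {xy, z}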

I guess I'm missing something obvious here but...

thx
 
gop said:
I guess I'm missing something obvious here but...

Yes, your formula doesn't apply unless the choices are made independently.
 
OK, I got it: if I use conditional probabilities, I need to use another model and another way to compute the entropy rate.
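(For anyone who finds this later, here is a minimal sketch of that, assuming a first-order Markov source with made-up transition probabilities matching the example above: x is always followed by y, and the marginals stay uniform. The entropy rate is the stationary-weighted average of the row entropies of the transition matrix, and it is what a compressor exploiting the dependence can approach.)

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Transition matrix P[s][t] = p(next symbol = t | current symbol = s)
P = {
    'x': {'x': 0.0, 'y': 1.0, 'z': 0.0},   # x is always followed by y
    'y': {'x': 0.5, 'y': 0.0, 'z': 0.5},
    'z': {'x': 0.5, 'y': 0.0, 'z': 0.5},
}
pi = {'x': 1/3, 'y': 1/3, 'z': 1/3}        # stationary distribution (check: pi P = pi)

h_rate = sum(pi[s] * entropy(P[s].values()) for s in P)   # bits per symbol
h_iid = entropy(pi.values())                               # i.i.d. formula on the marginals
print(f"i.i.d. entropy: {h_iid:.3f} bits/symbol")          # ~1.585
print(f"entropy rate:   {h_rate:.3f} bits/symbol")         # ~0.667, lower because of the dependence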

thx
 