Paul Uszak
I'm trying to measure how much non-redundant (actual) information my file contains. Some call this the amount of entropy.
Of course there is the standard −Σ p(x) log p(x), but I think that Shannon was only considering it from the point of view of transmitting through a channel. Hence the formula requires a block size (in bits, typically 8). For a large file this calculation is fairly useless, since it ignores short- and long-range correlations between symbols.
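To make the block-size issue concrete, here is a minimal sketch of the standard calculation at a block size of 8 bits, i.e. treating each byte as an independent symbol (for the real file one would read hiss.wav instead of the sample data used here):

```python
import math
from collections import Counter

def shannon_entropy_bytes(data: bytes) -> float:
    """Entropy in bits per byte, treating each byte as an i.i.d. symbol
    (block size = 8 bits). Ignores all inter-symbol correlations."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# For the real file: data = open("hiss.wav", "rb").read()
sample = bytes(range(256)) * 100   # every byte value equally likely
print(shannon_entropy_bytes(sample))   # 8.0 bits/byte
```

Note that this sample prints the maximum 8.0 bits/byte even though the data is a trivially predictable repeating pattern, which is exactly the weakness described above.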
There are binary-tree and Ziv-Lempel methods, but these seem highly academic in nature.
Compressibility is also regarded as a measure of entropy, but there seems to be no lower limit to the degree of compression achievable. For my file hiss.wav:
- original hiss.wav = 5.2 MB
- entropy via the Shannon formula = 4.6 MB
- hiss.zip = 4.6 MB
- hiss.7z = 4.2 MB
- hiss.wav.fp8 = 3.3 MB
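The spread in that table can be reproduced in a few lines: each compressor's output size is an upper bound on the file's entropy, and different compressors give different bounds. A minimal sketch using Python's standard-library codecs (deflate, roughly what zip uses, and LZMA, roughly what 7z uses; the sample bytes stand in for the real file):

```python
import lzma
import zlib

# Stand-in for: data = open("hiss.wav", "rb").read()
data = bytes(range(256)) * 100

# Each compressed size is an upper bound on the entropy of `data`;
# note the two bounds differ, illustrating "no lower limit".
for name, packed in [("deflate (zip)", zlib.compress(data, 9)),
                     ("LZMA (7z)", lzma.compress(data))]:
    print(f"{name}: {len(packed)} bytes (original {len(data)} bytes)")
```

Running this on hiss.wav itself would reproduce the zip/7z rows of the table above; the fp8 row shows that a stronger model pushes the bound lower still.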
Is there some reasonably practicable method of measuring how much entropy exists within hiss.wav?