Huffman coding is currently being replaced with ANS coding

  • Thread starter jarekduda
  • Start date
  • Tags
    Coding
In summary: This equivalence is verified experimentally.In summary, zstd is a fast but inaccurate coding system that was recently replaced with ANS coding. ANS coding is both fast and accurate, and provides a huge speedup over traditional compression methods.
  • #1
jarekduda
82
5
Huffman coding (e.g. A -> 010) is fast but approximates probabilities with powers of 1/2, while e.g. symbol of probability 0.99 carries only ~0.014 bits of information. This inaccuracy was repaired by arithmetic coding, but it is an order magnitude slower (more costly).
Above compromise has been recently ended with ANS coding, which is both fast(cheap) and accurate:
wiki: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
benchmarks: https://sites.google.com/site/powturbo/entropy-coder

It is used in compressors since 2014, like the currently default Apple LZFSE or great and free Facebook Zstd - it is 2-5x faster than gzip and provides much better compression:
https://github.com/facebook/zstd/
7-zip version: https://mcmilk.de/projects/7-Zip-zstd/
https://dl.dropboxusercontent.com/u/12405967/ZSTD.png
 
Last edited by a moderator:
  • Like
Likes Filip Larsen
Technology news on Phys.org
  • #2
Thanks for sharing. Are you using this coding in a project? What benefits are you seeing?
 
  • #3
Benefits from zstd itself are huge: 2-5x faster compression than standard gzip and much better compression ratio - it has a real chance to finally replace the ancient gzip.
For example the Guardian is already officially using it: https://www.theguardian.com/info/de...-compression-iinovations-brotli-and-zstandard
And the family of such new generation compressors (ANS-based) is quickly growing, most are free and open-source:
http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-implementations

I have experience with ANS coding, the basic concept is to encode information into a single natural number x, what is much more convenient for implementations than the standard arithmetic coding: encoding information into a range ( https://en.wikipedia.org/wiki/Arithmetic_coding ).
In numbers, in 2013 state-of-art arithmetic decoder processed ~50MB/s/core, now analogous task is made by ANS with ~1500MB/s ... we got ~30x implementation-only speedup for the basic transformation of information: the bottleneck of data compressors, and compression in modern world is practically default for all data.
benchmarks: https://sites.google.com/site/powturbo/entropy-coder
fastest open-source implementation: https://github.com/jkbonfield/rans_static

ANS uses philosophy that a natural number x contains lg(x) bits of information, what is equivalent to the Zipf law: Pr(x) ~ 1/x.
Symbol of probability p contains lg(1/p) bits of information, so adding this information to information already stored in x makes we should go to number (state) x' such that
lg(x') ~ lg(x) + lg(1/p)
or equivalently: x' ~ x/p.
Like x' = 2x + s in standard binary system, which is optimal for p = 1/2.
 

Related to Huffman coding is currently being replaced with ANS coding

What is Huffman coding?

Huffman coding is a data compression algorithm that assigns shorter codes to frequently used symbols in a data set in order to reduce the overall size of the data.

What is ANS coding?

ANS coding, or Asymmetric Numeral Systems coding, is a more efficient data compression algorithm that uses a variable-length code to represent data and has been shown to outperform Huffman coding in terms of compression ratio.

Why is Huffman coding being replaced with ANS coding?

ANS coding is being considered as a replacement for Huffman coding because it has been shown to be more efficient in terms of compression ratio. This means that it can reduce the size of data files more effectively than Huffman coding.

Is ANS coding completely replacing Huffman coding?

While ANS coding has shown to be more efficient than Huffman coding, it is not yet being completely replaced. Huffman coding is still used in some applications where the data is highly structured and can be compressed more effectively with this method.

Are there any drawbacks to using ANS coding?

One potential drawback of ANS coding is that it requires more computational resources compared to Huffman coding. This means that it may not be suitable for all applications, especially those with limited computing power.

Back
Top