How Can I Efficiently Build a Huffman Tree for Huffman Encoding?

In summary, the conversation discusses the difficulties and solutions of implementing Huffman encoding. The person has trouble efficiently building the tree and initially encounters problems with the large number of initial factors. Through trial and error, they discover that they were not properly grouping values after adding their ranks, resulting in a faster encoding time.
  • #1
ktoz
171
12
Hi

I'm playing around with Huffman encoding and am having a major problem getting my head around how to efficiently build the tree. I Googled "huffman coding" and have a basic grasp of what needs to be done but the actual process of walking through a list of sorted values has turned into a big headache.

Assuming I start with a list of factored values and their rank, in either descending (or ascending) order, the algorithm I came up with takes almost 3 minutes on a dual core 2.16 ghz Macintosh. This is clearly ridiculous as it takes roughly half a second for the rest of my code to read the file and create the factored list of values and their frequencies.

I tried writing to the model http://en.wikipedia.org/wiki/Image:Huffman_tree_2.svg" but due to the large number of initial factors (3000+) I ran into a problem where the sum of the ranks grew so large over subsequent iterations that it didn't even fit in an unsigned long long value without truncation.

Here's a snippet showing a section of the sorted, pre-Huffman data I'm starting with

Code:
87 = ({rank = 87; word = hate; }, {rank = 87; word = GONZALEZ; }); 
88 = ({rank = 88; word = one; }); 
89 = ({rank = 89; word = mainstream; }, {rank = 89; word = "don't"; }); 
90 = (
	{rank = 90; word = like; }, 
	{rank = 90; word = "it's"; }, 
	{rank = 90; word = his; }
); 
91 = ({rank = 91; word = he; }, {rank = 91; word = all; }); 
92 = ({rank = 92; word = legislation; }); 
93 = (
	{rank = 93; word = by; }, 
	{rank = 93; word = more; }, 
	{rank = 93; word = had; }, 
	{rank = 93; word = report; }, 
	{rank = 93; word = at; }
); 
94 = ({rank = 94; word = being; }); 
95 = ({rank = 95; word = them; }); 
96 = ({rank = 96; word = right; }, {rank = 96; word = crimes; });

What this shows is an associative array sorted and grouped by rank of each word in the source file. For example, under the key "93" we see that the five words "by," "more," "had," "report," and "at" all have the same frequency in the document

From this, could someone help me out with a step by step on how to create the huffman tree?

Thanks for any help

Ken
 
Last edited by a moderator:
Technology news on Phys.org
  • #2
Never mind. I figured it out

Turns out I wasn't properly grouping values after adding their ranks. I was just changing their rank without combining symbol pairs into a new node.

Got the total encoding time down to .5 seconds for a 90K file. 5 seconds for a 10 MB file etc... Much better.
 
Last edited:
  • #3


Hi Ken,

Building a Huffman tree can definitely be a tricky process, especially when dealing with a large number of initial factors. Here are some steps that may help guide you:

1. Start by creating a node for each unique word in the source file. These nodes will initially have a frequency of 1.

2. Sort these nodes in ascending order by frequency.

3. Take the two nodes with the lowest frequencies and combine them into a single node. This new node will have a frequency equal to the sum of the two nodes' frequencies, and its left and right children will be the two original nodes.

4. Remove the two original nodes from the list and add the new node.

5. Repeat this process until there is only one node left in the list. This node will be the root of your Huffman tree.

6. To assign codes to each word, start at the root of the tree and assign a 0 to the left child and a 1 to the right child. Continue this process for each node until you reach a leaf node, which will represent a word in the source file.

7. The code for a word will be the series of 0s and 1s that you encounter as you traverse from the root to the leaf node.

I hope this helps! Let me know if you have any further questions or if you need clarification on any of the steps. Good luck with your project!
 

FAQ: How Can I Efficiently Build a Huffman Tree for Huffman Encoding?

What is a huffman tree and why is it important in data compression?

A huffman tree is a type of binary tree used in data compression to efficiently store and transmit data. It is important because it assigns shorter codes to frequently used characters, resulting in a more compact representation of data.

How do you build a huffman tree?

To build a huffman tree, you first need to determine the frequency of each character in the data. Then, you sort the characters in ascending order based on their frequencies. Next, you combine the two least frequent characters into a single node and assign it a weight equal to the sum of their frequencies. This process is repeated until all characters are combined into a single tree.

What is the time complexity of building a huffman tree?

The time complexity of building a huffman tree is O(nlog(n)), where n is the number of characters in the data. This is because the sorting process takes O(nlog(n)) time and the tree construction takes O(n) time.

Can a huffman tree be used for both compression and decompression?

Yes, a huffman tree can be used for both compression and decompression. When compressing data, the huffman tree is used to convert characters into shorter codes. When decompressing data, the huffman tree is used to convert the codes back into characters.

Are there any limitations to using a huffman tree for data compression?

One limitation of using a huffman tree for data compression is that it can only compress data that has repeating patterns or characters. If the data is already compressed or contains random characters, a huffman tree may not result in significant compression. Additionally, huffman trees can become less efficient if the character frequencies change over time.

Similar threads

Replies
1
Views
2K
Replies
2
Views
1K
Replies
3
Views
2K
Replies
18
Views
12K
Replies
6
Views
3K
Replies
12
Views
2K
Back
Top