Main page
   Data encoding
   Communications channel
   Protocols
   Errors
   Error correction

Huffman codes

A Huffman code is an optimal uniquely decodable (UD) code. This means two things:

1) You can decode a message by looking at each incoming character in sequence. At each step either the characters you have seen do not yet correspond to any letter, or they correspond to exactly one.

2) The encoded message is the shortest possible one among all encoding schemes that have property

Binary trees can be used in an interesting way to construct minimal length encodings for messages when the frequency of letters used in the messages is known. A special kind of binary tree, called a Huffman coding tree is used to accomplish this.

The idea behind this code is evident in the table: characters that appear more frequently are assigned shorter codes. In order for a string to be uniquely decodable, there must be no ambiguity in the decoding. Thus, the code for one character must never be the start of a code for another. For instance, assigning "000" to the space character meant that no other code could begin with "000", as you can verify.

Such a code is called a prefix code. It corresponds to a tree. The tree describes the sequence of choices a decoder makes as it processes an encoded message.

The tree for a binary Huffman code is constructed in a straightforward way. The letters with the two smallest frequencies are located, combined as leaves at a node, and then that node is treated as if it were a letter having the combined frequencies of its two leaves. The process is repeated until all letters have been put into the tree.

This algorithm requires one to find the smallest frequency in a potentially long list, to do that repeatedly, and to insert each combined frequency back into the list. An efficient data structure for accomplishing this is a priority queue, which is easily implemented with arrays or lists.

The biggest problem in implementing a Huffman code concerns the frequencies. When you are encoding a single message over and over again, such as a copyright, that's simple. But what if you are putting a long message into a bunch of features? Or embedding data into the features themselves. And what do you do about the frequencies when decoding: you don't have the characters available to count! In that case, a good solution is to use frequencies typical of the messages you expect to write. To this end, is good implemented routines for analyzing a set of text files for their character frequencies. The results are recorded in a special frequency file that the Huffman code creator can use both for reading and writing hidden messages. This frequency file has to be referenced by the user whenever a Huffman code is to be used.


   back

Interactive map; Free Web Hosting - Free PHP Hosting, MYSQL, FTP, Ad-Free and free hosting from Doxfor; canada interactive map; uk job; job in Bristol; Belfast job; jobs Liverpool; jobs Glasgow; jobs Kent