A linear-time algorithm for constructing Huffman codes from a list of symbols ordered by probability. This method first appeared in an obscure conference that I have now lost the reference to. Andre Trudel, a PhD student at Waterloo, independently discovered it several years ago but we didn't bother publishing it because of the obscure prior reference. I suppose this was a mistake and I'll generate a tech. report about it. If anyone knows the reference, I'd appreciate hearing about it. In any event, here is our statement of the algorithm (copyright 1995, Cormack & Trudel): Input: list of in ascending order by priority Intermediate data structure: queue of Algorithm: while input remains or sizeof(queue) > 1 loop select pair of elements from input and/or queue with least probability (remove these elements from their respective data structures) build new treenode with the two above elements as children, and with probability equal to the sum of the children's add new treenode to queue end loop the single element in the queue is the root of the Huffman tree Key observations: - tree nodes are generated in increasing order of probability, therefore the queue, although implemented as a simple linear queue with constant- time operations, is a priority queue - selecting the pair of elements involves inspecting both the input and the queue. This requires constant time. Permission is granted to reproduce this text provided the author's name and address are cited: Gordon Cormack Department of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada gvcormac@plg.uwaterloo.ca http://plg.uwaterloo.ca/~gvcormac From cs.mu.OZ.AU!alistair Tue Apr 4 19:50:10 1995 Received: from mulga.cs.mu.OZ.AU ([128.250.1.22]) by plg.uwaterloo.ca with SMTP id <23548>; Tue, 4 Apr 1995 19:50:03 -0400 Received: from muloora.cs.mu.OZ.AU by mulga.cs.mu.OZ.AU with SMTP (5.83--+1.3.1+0.50); id AA23353 Wed, 5 Apr 1995 09:49:19 +1000 (from alistair@cs.mu.OZ.AU) Message-Id: <9504042349.23353@mulga.cs.mu.OZ.AU> To: gvcormac@plg.uwaterloo.ca Subject: Linear time Huffman codes Date: Tue, 4 Apr 1995 19:49:18 -0400 From: Alistair Moffat Status: RO Gordon, Someone told me that you were interested in linear-time generation of Huffman codes for sorted input probability distributions. As far as I know, the earliest description of this idea is in @inproceedings{lee76, author = "J. Van {Leeuwen}", title = "On the construction of {H}uffman trees", booktitle = "Proc. 3rd International Conference on Automata, Languages, and Programming", address = "Edinburgh University", pages = "382-410", year = 1976, } Recently, Jyrki Katajainen (Copenhagen University) and I showed how this method can be implemented in-place using the n words of the input vector, and O(1) extra space. That is, we take an n-item array A containing sorted symbol frequencies, use O(1) extra space and O(n) time, and overwrite A[i] by the length in bits of the codeword for symbol i. (This codeword length can then be easily turned into the actual codeword in a further O(n)-time pass; for all of our work it is more useful to generate the lengths so that we can calculate a canonical code rather than risk our luck with the codewords produced by Huffmans method, and so we tend to regard the job as being done when we have the lengths). We have also been studying the problem of space-efficient generation of optimal length-limited Huffman codes, and presented a paper on the generation of these two types of prefix codes last week at the Data Compression Conference in Utah. If you are interested, I can send some more information through the post. Cheers,... Alistair Moffat, alistair@cs.mu.oz.au Department of Computer Science, The University of Melbourne Parkville, Victoria 3052, Australia.