16.3 Huffman codes


Explain why, in the proof of Lemma 16.2, if $x.freq = b.freq$, then we must have $a.freq = b.freq = x.freq = y.freq$.

If we have that $x.freq = b.freq$, then we know that $b$ is tied for lowest frequency. In particular, it means that there are at least two things with lowest frequency, so $y.freq = x.freq$. Also, since $x.freq \le a.freq \le b.freq = x.freq$, we must have $a.freq = x.freq$.


Prove that a binary tree that is not full cannot correspond to an optimal prefix code.

Let $T$ be a binary tree that is not full. $T$ represents a binary prefix code for a file composed of characters from alphabet $C$, where $c \in C$, $f(c)$ is th number of occurrences of $c$ in the file. The cost of tree $T$, or the number of bits in the encoding, is $\sum_{c \in C} d_T(c) \cdot f(c)$, where $d_T(c)$ is the depth of character $c$ in tree $T$.

Let $N$ be a node of greatest depth that has exactly one child. If $N$ is the root of $T$, $N$ can be removed and the deepth of each node reduced by one, yielding a tree representing the same alphabet with a lower cost. This mean the original code was not optimal.

Otherwise, let $M$ be the parent of $N$, let $T_1$ be the (possibly non-existent) sibling of $N$, and let $T_2$ be the subtree rooted at the child of $N$. Replace $M$ by $N$, making the children of $N$ the roots of subtrees $T_1$ and $T_2$. If $T_1$ is empty, repeat the process. We have a new prefix code of lower cost, so the original was not optimal.


What is an optimal Huffman code for the following set of frequencies, based on
the first $8$ Fibonacci numbers?

$$a:1 \quad b:1 \quad c:2 \quad d:3 \quad e:5 \quad f:8 \quad g:13 \quad h:21$$

Can you generalize your answer to find the optimal code when the frequencies are the first $n$ Fibonacci numbers?

$$ \begin{array}{c|l} a & 1111111 \\ b & 1111110 \\ c & 111110 \\ d & 11110 \\ e & 1110 \\ f & 110 \\ g & 10 \\ h & 0 \end{array} $$


Prove that we can also express the total cost of a tree for a code as the sum, over all internal nodes, of the combined frequencies of the two children of the node.

Let tree be a full binary tree with $n$ leaves. Apply induction hypothesis on the number of leaves in $T$. When $n = 2$ (the case $n = 1$ is trivially true), there are two leaves $x$ and $y$ with the same parent $z$, then the cost of $T$ is

$$ \begin{aligned} B(T) & = f(x) d_T(x) + f(y) d_T(y) \\ & = f(x) + f(y) & \text{since $d_T(x) = d_T(y) = 1$} \\ & = f(\text{child}_1\text{ of }z) + f(\text{child}_2\text{ of }z). \end{aligned} $$

Thus, the statement of theorem is true. Now suppose $n > 2$ and also suppose that theorem is true for trees on $n - 1$ leaves. Let $c_1$ and $c_2$ are two sibling leaves in $T$ such that they have the same parent $p$. Letting $T'$ be the tree obtained by deleting $c_1$ and $c_2$, by induction we know that

$$ \begin{aligned} B(T) & = \sum_{\text{leaves } l'\in T'} f(l')d_T(l') \\ & = \sum_{\text{internal nodes } i'\in T'} f(\text{child}_1\text{ of }i') + f(\text{child}_2\text{ of }i'). \end{aligned} $$

Using this information, calculates the cost of $T$.

$$ \begin{aligned} B(T) & = \sum_{\text{leaves }l \in T} f(l)d_T(l) \\ & = \sum_{l \ne c_1, c_2} f(l)d_T(l) + f(c_1)d_T(c_1) - 1 + f(c_2)d_T(c_2) - 1 + f(c_1) + f(c_2) \\ & = \sum_{\text{internal nodes }i'\in T'} f(\text{child}_1\text{ of }i') + f(\text{child}_2\text{ of }i') + f(c_1) + f(c_2) \\ & = \sum_{\text{internal nodes }i\in T} f(\text{child}_1\text{ of }i) + f(\text{child}_1\text{ of }i). \end{aligned} $$

Thus the statement is true.


Prove that if we order the characters in an alphabet so that their frequencies are monotonically decreasing, then there exists an optimal code whose codeword lengths are monotonically increasing.

It were a contradiction to have an optimal tree whose frequencies and codewords were monotonically increasing in the strict sense; since, given $f(x_1) > \ldots > f(x_n) \wedge d_T(x_1) > \cdots > d_T(x_n)$, it follows that (where $n$ is odd):

$$ \begin{aligned} & f(x_1)d_T(x_1) + \cdots + f(x_n)d_T(x_n) > f(x_1)d_T(x_n) + \cdots + f(x_n)d_T(x_1) > 0 \\ & f(x_1)(d_T(x_1) - d_T(x_n)) + \cdots + f(x_n)(d_T(x_n) - d_T(x_1)) > 0 \\ & f(x_1)(d_T(x_1) - d_T(x_n)) + \cdots + f(x_{\lfloor \frac{n}{2} - 1 \rfloor})(d_T(x_{\lfloor \frac{n}{2} \rfloor - 1}) - d_T(x_{\lfloor \frac{n}{2} \rfloor + 1})) \\ & > f(x_{\lfloor \frac{n}{2} + 1 \rfloor})(d_T(x_{\lfloor \frac{n}{2} \rfloor - 1}) - d_T(x_{\lfloor \frac{n}{2} \rfloor + 1}))+ \cdots + f(x_n)(d_T(x_1) - d_T(x_n)). \end{aligned} $$

That is, where $i$ and $j$ are the upper and lower median, respectively; and $c_i = d_T(x_i) - d_T(x_{n - i + 1})$:

$$f(x_1)c_1 + \cdots + f(x_i)c_i > f(x_j)c_i + \cdots + f(x_n)c_1$$


$$ \begin{aligned} f(x_i) & > f(x_{n - i + 1}) & 1 \le i \le \Big\lfloor \frac{n}{2} \Big\rfloor. \end{aligned} $$


Suppose we have an optimal prefix code on a set $C = \{0, 1, \ldots, n - 1 \}$ of characters and we wish to transmit this code using as few bits as possible. Show how to represent any optimal prefix code on $C$ using only $2n - 1 + n \lceil \lg n \rceil$ bits. ($\textit{Hint:}$ Use $2n - 1$ bits to specify the structure of the tree, as discovered by a walk of the tree.)

First observe that any full binary tree has exactly $2n - 1$ nodes. We can encode the structure of our full binary tree by performing a preorder traversal of $T$. For each node that we record in the traversal, write a $0$ if it is an internal node and a $1$ if it is a leaf node. Since we know the tree to be full, this uniquely determines its structure.

Next, note that we can encode any character of $C$ in $\lceil \lg n \rceil$ bits. Since there are $n$ characters, we can encode them in order of appearance in our preorder traversal using $n\left\lceil \lg n \right\rceil$ bits.


Generalize Huffman's algorithm to ternary codewords (i.e., codewords using the symbols $0$, $1$, and $2$), and prove that it yields optimal ternary codes.

Instead of grouping together the two with lowest frequency into pairs that have the smallest total frequency, we will group together the three with lowest frequency in order to have a final result that is a ternary tree. The analysis of optimality is almost identical to the binary case. We are placing the symbols of lowest frequency lower down in the final tree and so they will have longer codewords than the more frequently occurring symbols.


Suppose that a data file contains a sequence of $8$-bit characters such that all $256$ characters are about equally common: the maximum character frequency is less than twice the minimum character frequency. Prove that Huffman coding in this case is no more efficient than using an ordinary $8$-bit fixed-length code.

For any $2$ characters, the sum of their frequencies exceeds the frequency of any other character, so initially Huffman coding makes $128$ small trees with $2$ leaves each. At the next stage, no internal node has a label which is more than twice that of any other, so we are in the same setup as before. Continuing in this fashion, Huffman coding builds a complete binary tree of height $\lg 256 = 8$, which is no more efficient than ordinary $8$-bit length codes.


Show that no compression scheme can expect to compress a file of randomly chosen $8$-bit characters by even a single bit. ($\textit{Hint:}$ Compare the number of possible files with the number of possible encoded files.)

If every possible character is equally likely, then, when constructing the Huffman code, we will end up with a complete binary tree of depth $7$. This means that every character, regardless of what it is will be represented using $7$ bits.

This is exactly as many bits as was originally used to represent those characters, so the total length of the file will not decrease at all.