12-2 Radix trees

Given two strings $a = a_0a_1 \ldots a_p$ and $b = b_0b_1 \ldots b_q$, where each $a_i$ and each $b_j$ is in some ordered set of characters, we say that string $a$ is lexicographically less than string $b$ if either

  1. there exists an integer $j$, where $0 \le j \le \min(p, q)$, such that $a_i = b_i$ for all $i = 0, 1, \ldots j - 1$ and $a_j < b_j$, or
  2. $p < q$ and $a_i = b_i$ for all $i = 0, 1, \ldots, p$.

For example, if $a$ and $b$ are bit strings, then $10100 < 10110$ by rule 1 (letting $j = 3$) and $10100 < 101000$ by rule 2. This ordering is similar to that used in English-language dictionaries.

The radix tree data structure shown in Figure 12.5 stores the bit strings $1011, 10, 011, 100$, and $0$. When searching for a key $a = a_0a_1 \ldots a_p$, we go left at a node of depth $i$ if $a_i = 0$ and right if $a_i = 1$. Let $S$ be a set of distinct bit strings whose lengths sum to $n$. Show how to use a radix tree to sort $S$ lexicographically in $\Theta(n)$ time. For the example in Figure 12.5, the output of the sort should be the sequence $0, 011, 10, 100, 1011$.

To sort the strings of $S$, we first insert them into a radix tree, and then use a preorder tree walk to extract them in lexicographically sorted order. The tree walk outputs strings only for nodes that indicate the existence of a string (i.e., those that are lightly shaded in Figure 12.5 of the text).

Correctness: The preorder ordering is the correct order because:

  • Any node's string is a prefix of all its descendants' strings and hence belongs before them in the sorted order (rule 2).
  • A node's left descendants belong before its right descendants because the corresponding strings are identical up to that parent node, and in the next position the left subtree's strings have $0$ whereas the right subtree's strings have $1$ (rule 1).

Time: $\Theta(n)$.

  • Insertion takes $\Theta(n)$ time, since the insertion of each string takes time proportional to its length (traversing a path through the tree whose length is the length of the string), and the sum of all the string lengths is $n$.
  • The preorder tree walk takes $O(n)$ time. It is just like $\text{INORDER-TREE-WALK}$ (it prints the current node and calls itself recursively on the left and right subtrees), so it takes time proportional to the number of nodes in the tree. The number of nodes is at most $1$ plus the sum $(n)$ of the lengths of the binary strings in the tree, because a length-$i$ string corresponds to a path through the root and $i$ other nodes, but a single node may be shared among many string paths.