18-2 Joining and splitting 2-3-4 trees

The join operation takes two dynamic sets $S'$ and $S''$ and an element $x$ such that for any $x' \in S'$ and $x'' \in S''$, we have $x'.key < x.key < x''.key$. It returns a set $S = S' \cup \{x\} \cup S''$. The split operation is like an "inverse" join: given a dynamic set $S$ and an element $x \in S$, it creates a set $S'$ that consists of all elements in set $S$ and an element $x \in S$, it creates a set $S'$ that consists of all elements in $S - \{x\}$ whose keys are less than $x.key$ and a set $S''$ that consists of all elements in $S - \{x\}$ whose keys are greater than $x.key$. In this problem, we investigate how to implement these operations on 2-3-4 trees. We assume for convenience that elements consist only of keys and that all key values are distinct.

a. Show how to maintain, for every node $x$ of a 2-3-4 tree, the height of the subtree rooted at $x$ as an attribute $x.height$. Make sure that your implementation does not affect the asymptotic running times of searching, insertion, and deletion.

b. Show how to implement the join operation. Given two 2-3-4 trees $T'$ and $T''$ and a key $k$, the join operation should run in $O(1 + |h' - h''|)$ time, where $h'$ and $h''$ are the heights of $T'$ and $T''$, respectively.

c. Consider the simple path $p$ from the root of a 2-3-4 tree $T$ to a given key $k$, the set $S'$ of keys in $T$ that are less than $k$, and the set $S''$ of keys in $T$ that are greater than $k$. Show that $p$ breaks $S'$ into a set of trees $\{T'_0, T'_1, \ldots, T'_m\}$ and a set of keys $\{k'_1, k'_2, \ldots, k'_m\}$, where, for $i = 1, 2, \ldots, m$, we have $y < k'_i < z$ for any keys $y \in T'_{i - 1}$ and $z \in T'_i$. What is the relationship between the heights of $T'_{i - 1}$ and $T'_i$? Describe how $p$ breaks $S''$ into sets of trees and keys.

d. Show how to implement the split operation on $T$. Use the join operation to assemble the keys in $S'$ into a single 2-3-4 tree $T'$ and the keys in $S''$ into a single 2-3-4 tree $T''$. The running time of the split operation should be $O(\lg n)$, where $n$ is the number of keys in $T$. ($\textit{Hint:}$ The costs for joining should telescope.)

a. For insertion it will suffice to explain how to update height when we split a node. Suppose node $x$ is split into nodes $y$ and $z$, and the median of $x$ is merged into node $w$. The height of $w$ remains unchanged unless $x$ was the root (in which case $w.height = x.height + 1$).

The height of $y$ or $z$ will often change. We set

$$y.height = \max_i y.c_i .height + 1$$

and

$$z.height = \max_i z.c_i.height + 1.$$

Each update takes $O(t)$. Since a call to $\text{B-TREE-INSERT}$ makes at most $h$ splits where $h$ is the height of the tree, the total time it takes to update heights is $O(th)$, preserving the asymptotic running time of insert. For deletion the situation is even simple. The only time the height changes is when the root has a single node and it is merged with its subtree nodes, leaving an empty root node to be deleted. In this case, we update the height of the new node to be the (old) height of the root minus $1$.

b. Without loss of generality, assume $h' \ge h''$. We essentially wish to merge $T''$ into $T'$ at a node of height $h''$ using node $x$. To do this, find the node at depth $h' - h''$ on the right spine of $T'$. Add $x$ as a key to this node, and $T''$ as the additional child. If it should happen that the node was already full, perform a split operation.

c. Let $x_i$ be the node encountered after $i$ steps on path $p$. Let $l_i$ be the index of the largest key stored in $x_i$ which is less than or equal to $k$. We take $k_i' = x_i.key_{l_i}$ and $T'_{i - 1}$ to be the tree whose root node consists of the keys in $x_i$ which are less than $x_i.key_{l_i}$, and all of their children. In general, $T'_{i - 1}.height \ge T'_i.height$.

For $S''$, we take a similar approach. They keys will be those in nodes passed on $p$ which are immediately greater than $k$, and the trees will be rooted at a node consisting of the larger keys, with the associated subtrees. When we reach the node which contains $k$, we don't assign a key, but we do assign a tree.

d. Let $T_1$ and $T_2$ be empty trees. Consider the path $p$ from the root of $T$ to $k$. Suppose we have reached node $x_i$. We join tree $T'_{i - 1}$ to $T_1$, then insert $k' i$ into $T_1$. We join $T''_{i - 1}$ to $T_2$ and insert $k''_i$ into $T_2$. Once we have encountered the node which contains $k$ at $x_m.key_k$, join $x_m.c_k$ with $T_1$ and $x_m.c_{k + 1}$ with $T_2$.

We will perform at most $2$ join operations and $1$ insert operation for each level of the tree. Using the runtime determined in part (b), and the fact that when we join a tree $T'$ to $T_1$ (or $T''$ to $T_2$ respectively) the height difference is

$$T'.height - T_1.height.$$

Since the heights are nondecreasing of successive tree that are joined, we get a telescoping sum of heights. The first tree has height $h$, where $h$ is the height of $T$, and the last tree has height $0$. Thus, the runtime is

$$O(2(h + h)) = O(\lg n).$$