# 20.3 The van Emde Boas tree

## 20.3-1

Modify vEB trees to support duplicate keys.

To support duplicate keys, for each $u = 2$ vEB tree, instead of storing just a bit in each of the entries of its array, it should store an integer representing how many elements of that value the vEB contains.

## 20.3-2

Modify vEB trees to support keys that have associated satellite data.

For any key which is a minimum on some vEB, we'll need to store its satellite data with the min value since the key doesn't appear in the subtree. The rest of the satellite data will be stored alongside the keys of the vEB trees of size $2$. Explicitly, for each non-summary vEB tree, store a pointer in addition to min. If min is $\text{NIL}$, the pointer should also point to $\text{NIL}$. Otherwise, the pointer should point to the satellite data associated with that minimum. In a size $2$ vEB tree, we'll have two additional pointers, which will each point to the minimum's and maximum's satellite data, or $\text{NIL}$ if these don't exist. In the case where $\min = \max$, the pointers will point to the same data.

## 20.3-3

Write pseudocode for a procedure that creates an empty van Emde Boas tree.

We define the procedure for any $u$ that is a power of $2$. If $u = 2$, then, just slap that fact together with an array of length $2$ that contains $0$ in both entries.

If $u = 2k > 2$, then, we create an empty vEB tree called Summary with $u = 2^{\lceil k / 2 \rceil}$. We also make an array called cluster of length $2^{\lceil k / 2 \rceil}$ with each entry initialized to an empty vEB tree with $u = 2^{\lfloor k / 2 \rfloor}$. Lastly, we create a min and max element, both initialized to $\text{NIL}$.

## 20.3-4

What happens if you call $\text{VEB-TREE-INSERT}$ with an element that is already in the vEB tree? What happens if you call $\text{VEB-TREE-DELETE}$ with an element that is not in the vEB tree? Explain why the procedures exhibit the behavior that they do. Show how to modify vEB trees and their operations so that we can check in constant time whether an element is present.

Suppose that $x$ is already in $V$ and we call $\text{INSERT}$. Then we can't satisfy lines 1, 3, 6, or 10, so we will enter the else case on line 9 every time, causing an infinite loop. Now suppose we call $\text{DELETE}$ when $x$ isn't in $V$ . If there is only a single element in $V$, lines 1 through 3 will delete it, regardless of what element it is. To enter the elseif of line 4, $x$ can't be equal to $0$ or $1$ and the vEB tree must be of size $2$. In this case, we delete the max element, regardless of what it is. Since the recursive call always puts us in this case, we always delete an element we shouldn't. To avoid these issue, keep and updated auxiliary array $A$ with $u$ elements. Set $A[i] = 0$ if $i$ is not in the tree, and $1$ if it is. Since we can perform constant time updates to this array, it won't affect the runtime of any of our operations. When inserting $x$, check first to be sure $A[x] = 0$. If it's not, simply return. If it is, set $A[x] = 1$ and proceed with insert as usual. When deleting $x$, check if $A[x] = 1$. If it isn't, simply return. If it is, set $A[x] = 0$ and proceed with delete as usual.

## 20.3-5

Suppose that instead of $\sqrt[\uparrow]u$ clusters, each with universe size $\sqrt[\downarrow]u$, we constructed vEB trees to have $u^{1 / k}$ clusters, each with universe size $u^{1 - 1 / k}$, where $k > 1$ is a constant. If we were to modify the operations appropriately, what would be their running times? For the purpose of analysis, assume that $u^{1 / k}$ and $u^{1 - 1 / k}$ are always integers.

Similar to the analysis of $\text{(20.4)}$, we will analyze

$$T(u) \le T(u^{1 - 1 / k}) + T(u^{1 / k}) + O(1).$$

This is a good choice for analysis because for many operations we first check the summary vEB tree, which will have size $u^{1 / k}$ (the second term). And then possible have to check a vEB tree somewhere in cluster, which will have size $u^{1 - 1/k}$ (the first term). We let $T(2^m) = S(m)$, so the equation becomes

$$S(m) \le S(m(1 - 1/k)) + S(m/k) + O(1).$$

If $k > 2$ the first term dominates, so by master theorem, we'll have that $S(m)$ is $O(\lg m)$, this means that T will be $O(\lg(\lg u))$ just as in the original case where we took squareroots.

## 20.3-6

Creating a vEB tree with universe size $u$ requires $O(u)$ time. Suppose we wish to explicitly account for that time. What is the smallest number of operations $n$ for which the amortized time of each operation in a vEB tree is $O(\lg\lg u)$?

Set $n = u / \lg\lg u$. Then performing $n$ operations takes $c(u + n\lg\lg u)$ time for some constant $c$. Using the aggregate amortized analysis, we divide by $n$ to see that the amortized cost of each operations is $c(\lg\lg u + \lg\lg u) = O(\lg\lg u)$ per operation. Thus we need $n \ge u/ \lg \lg u$.