17-2 Making binary search dynamic

Binary search of a sorted array takes logarithmic search time, but the time to insert a new element is linear in the size of the array. We can improve the time for insertion by keeping several sorted arrays.

Specifically, suppose that we wish to support $\text{SEARCH}$ and $\text{INSERT}$ on a set of $n$ elements. Let $k = \lceil \lg(n + 1) \rceil$, and let the binary representation of $n$ be $\langle n_{k - 1}, n_{k - 2}, \ldots, n_0 \rangle$. We have $k$ sorted arrays $A_0, A_1, \ldots, A_{k - 1}$, where for $i = 0, 1, \ldots, k - 1$, the length of array $A_i$ is $2^i$. Each array is either full or empty, depending on whether $n_i = 1$ or $n_i = 0$, respectively. The total number of elements held in all $k$ arrays is therefore $\sum_{i = 0}^{k - 1} n_i 2^i = n$. Although each individual array is sorted, elements in different arrays bear no particular relationship to each other.

a. Describe how to perform the $\text{SEARCH}$ operation for this data structure. Analyze its worst-case running time.

b. Describe how to perform the $\text{INSERT}$ operation. Analyze its worst-case and amortized running times.

c. Discuss how to implement $\text{DELETE}$.

a. We linearly go through the lists and binary search each one since we don't know the relationship between one list an another. In the worst case, every list is actually used. Since list $i$ has length $2^i$ and it's sorted, we can search it in $O(i)$ time. Since $i$ varies from $0$ to $O(\lg n)$, the runtime of SEARCH is $O(\lg^2 n)$.

b. To insert, we put the new element into $A_0$ and update the lists accordingly. In the worst case, we must combine lists $A_0, A_1, \dots, A_{m − 1}$ into list Am. Since merging two sorted lists can be done linearly in the total length of the lists, the time this takes is $O(2^m)$. In the worst case, this takes time $O(n)$ since $m$ could equal $k$.

We'll use the accounting method to analyse the amoritized cost. Assign a cost of $\lg n$ to each insertion. Thus, each item carries $\lg n$ credit to pay for its later merges as additional items are inserted. Since an individual item can only be merged into a larger list and there are only $\lg n$ lists, the credit pays for all future costs the item might incur. Thus, the amortized cost is $O(\lg n)$.

c. Find the smallest $m$ such that $n_m \ne 0$ in the binary representation of $n$. If the item to be deleted is not in list $A_m$, remove it from its list and swap in an item from Am, arbitrarily. This can be done in $O(\lg n)$ time since we may need to search list $A_k$ to find the element to be deleted. Now simply break list $A_m$ into lists $A_0, A_1, \dots, A_{m − 1}$ by index. Since the lists are already sorted, the runtime comes entirely from making the splits, which takes $O(m)$ time. In the worst case, this is $O(\lg n)$.