# 17-2 Making binary search dynamic

Binary search of a sorted array takes logarithmic search time, but the time to insert a new element is linear in the size of the array. We can improve the time for insertion by keeping several sorted arrays.

Specifically, suppose that we wish to support $\text{SEARCH}$ and $\text{INSERT}$ on a set of $n$ elements. Let $k = \lceil \lg(n + 1) \rceil$, and let the binary representation of $n$ be $\langle n_{k - 1}, n_{k - 2}, \ldots, n_0 \rangle$. We have $k$ sorted arrays $A_0, A_1, \ldots, A_{k - 1}$, where for $i = 0, 1, \ldots, k - 1$, the length of array $A_i$ is $2^i$. Each array is either full or empty, depending on whether $n_i = 1$ or $n_i = 0$, respectively. The total number of elements held in all $k$ arrays is therefore $\sum_{i = 0}^{k - 1} n_i 2^i = n$. Although each individual array is sorted, elements in different arrays bear no particular relationship to each other.

a. Describe how to perform the $\text{SEARCH}$ operation for this data structure. Analyze its worst-case running time.

b. Describe how to perform the $\text{INSERT}$ operation. Analyze its worst-case and amortized running times.

c. Discuss how to implement $\text{DELETE}$.

a. The $\text{SEARCH}$ operation can be performed by searching each of the individually sorted arrays. Since all the individual arrays are sorted, searching one of them using a binary search algorithm takes $O(\lg m)$ time, where $m$ is the size of the array. In an unsuccessful search, the time is $\Theta(\lg m)$. In the worst case, we may assume that all the arrays $A_0, A_1, \ldots, A_{k - 1}$ are full, $k = \lceil \lg(n + 1) \rceil$, and we perform an unsuccessful search. The total time taken is

\begin{aligned} T(n) & = \Theta(\lg 2^{k - 1} + \lg 2^{k - 2} + \cdots + \lg 2^1 + \lg 2^0) \\ & = \Theta((k - 1) + (k - 2) + \cdots + 1 + 0) \\ & = \Theta(k(k - 1) / 2) \\ & = \Theta(\lceil \lg(n + 1)\rceil(\lceil \lg(n + 1) \rceil) - 1) / 2) \\ & = \Theta(\lg^2 n). \end{aligned}

Thus, the worst-case running time is $\Theta(\lg^2 n)$.

b. We create a new sorted array of size $1$ containing the new element to be inserted. If array $A_0$ (which has size $1$) is empty, then we replace $A_0$ with the new sorted array. Otherwise, we merge sort the two arrays into another sorted array of size $2$. If $A_1$ is empty, then we replace $A_1$ with the new array; otherwise we merge sort the arrays as before and continue. Since array $A_i$ is of size $2^i$, if we merge sort two arrays of size $2^i$ each, we obtain one of size $2^{i + 1}$, which is the size of $A_{i + 1}$. Thus, this method will result in another list of arrays in the same structure that we had before.

Let us analyze its worst-case running time. We will assume that merge sort takes $2m$ time to merge two sorted lists of size $m$ each. If all the arrays $A_0, A_1, \ldots, A_{k - 2}$ are full, then the running time to fill array $A_{k - 1}$ would be

\begin{aligned} T(n) & = 2(2^0 + 2^1 + \cdots + 2^{k - 2}) \\ & = 2(2^{k - 1} - 1) \\ & = 2^k - 2 \\ & = \Theta(n). \end{aligned}

Therefore, the worst-case time to insert an element into this data structure is $\Theta(n)$.

However, let us now analyze the amortized running time. Using the aggregate method, we compute the total cost of a sequence of $n$ inserts, starting with the empty data structure. Let $r$ be the position of the rightmost $0$ in the binary representation $\langle n_{k - 1}, n_{k - 2}, \ldots, n_0 \rangle$ of $n$, so that $n_j = 1$ for $j = 0, 1, \ldots, r - 1$. The cost of an insertion when $n$ items have already been inserted is

$$\sum_{j = 0}^{r - 1} 2 \cdot 2^j = O(2^r).$$

Furthermore, $r = 0$ half the time, $r = 1$ a quarter of the time, and so on. There are at most $\lceil n / 2^r \rceil$ insertions for each value of $r$. The total cost of the $n$ operations is therefore bounded by

$$O\Bigg(\sum_{r = 0}^{\lceil \lg(n + 1) \rceil} \bigg(\bigg\lceil \frac{n}{2^r} \bigg\rceil\bigg) 2^r \Bigg) = O(n\lg n).$$

The amortized cost per $\text{INSERT}$ operation, therefore is $O(\lg n)$.

We can also use the accounting method to analyze the running time. We can charge $k$ to insert an element. $\$1$pays for the insertion, and we put$\$(k - 1)$ on the inserted item to pay for it being involved in merges later on. Each time it is merged, it moves to a higher-indexed array, i.e., from $A_i$ to $A_{i + 1}$. It can move to a higher-indexed array at most $k - 1$ times, and so the $\$(k - 1)$on the item suffices to pay for all the times it will ever be involved in merges. Since$k = \Theta(\lg n)$, we have an amortized cost of$\Theta(\lg n)$per insertion. c.$\text{DELETE}(x)$will be implemented as follows: 1. Find the smallest$j$for which the array$A_j$with$2^j$elements is full. Let$y$be the last element of$A_j$. 2. Let$x$be in the array$A_i$. If necessary, find which array this is by using the search procedure. 3. Remove$x$from$A_i$and put$y$into$A_i$. Then move$y$to its correct place in$A_i$. 4. Divide$A_j$(which now has$2^j - 1$elements left): The first element goes into array$A_0$, the next 2 elements go into array$A_1$, the next 4 elements go into array$A_2$, and so forth. Mark array$A_j$as empty. The new arrays are created already sorted. The cost of$\text{DELETE}$is$\Theta(n)$in the worst case, where$i = k - 1$and$j = k - 2: \Theta(\lg n)$to find$A_j$,$\Theta(\lg^2 n)$to find$A_i$,$\Theta(2^i) = \Theta(n)$to put$y$in its correct place in array$A_i$, and$\Theta(2^j) = \Theta(n)$to divide array$A_j$. The following sequence of$n$operations, where$n / 3$is a power of$2$, yields an amortized cost that is no better: perform$n / 3\text{INSERT}$operations, followed by$n / 3$pairs of$\text{DELETE}$and$\text{INSERT}$. It costs$O(n\lg n)$to do the first$n / 3\text{INSERT}$operations. This creates a single full array. Each subsequent$\text{DELETE}$/$\text{INSERT}$pair costs$\Theta(n)$for the$\text{DELETE}$to divide the full array and another$\Theta(n)$for the$\text{INSERT}$to recombine it. The total is then$\Theta(n^2)$, or$\Theta(n)\$ per operation.