9-3 Small order statistics

We showed that the worst-case number $T(n)$ of comparisons used by $\text{SELECT}$ to select the $i$th order statistic from $n$ numbers satisfies $T(n) = \Theta(n)$, but the constant hidden by the $\Theta$-notation is rather large. When $i$ is small relative to $n$, we can implement a different procedure that uses $\text{SELECT}$ as a subroutine but makes fewer comparisons in the worst case.

a. Describe an algorithm that uses $U_i(n)$ comparisons to find the $i$th smallest of $n$ elements, where

$$ U_i(n) = \begin{cases} T(n) & \text{if $i \ge n / 2$}, \\ \lfloor n / 2 \rfloor + U_i(\lceil n / 2 \rceil) + T(2i) & \text{otherwise}. \end{cases} $$

($\textit{Hint:}$ Begin with $\lfloor n / 2 \rfloor$ disjoint pairwise comparisons, and recurse on the set containing the smaller element from each pair.)

b. Show that, if $i < n / 2$, then $U_i(n) = n + O(T(2i)\lg(n / i))$.

c. Show that if $i$ is a constant less than $n / 2$, then $U_i(n) = n + O(\lg n)$.

d. Show that if $i = n / k$ for $k \ge 2$, then $U_i(n) = n + O(T(2n / k)\lg k)$.

a. Our algorithm relies on a particular property of $\text{SELECT}$: that not only does it return the $i$th smallest element, but that it also partitions the input array so that the first $i$ positions contain the $i$ smallest elements (though not necessarily in sorted order). To see that $\text{SELECT}$ has this property, observe that there are only two ways in which returns a value: when $n = 1$, and when immediately after partitioning in step 4, it finds that there are exactly $i$ elements on the low side of the partition.

Taking the hint from the book, here is our modified algorithm to select the $i$th smallest element of $n$ elements. Whenever it is called with $i \ge n / 2$, it just calls $\text{SELECT}$ and returns its result; in this case, $U_i(n) = T(n)$.

When $i < n / 2$, our modified algorithm works as follows. Assume that the input is in a subarray $A[p + 1..p + n]$, and let $m = \lfloor n / 2 \rfloor$. In the initial call, $p = 1$.

  1. Divide the input as follows. If $n$ is even, divide the input into two parts: $A[p + 1..p + m]$ and $A[p + m + 1..p + n]$. If $n$ is odd, divide the input into three parts: $A[p + 1..p + m]$, $A[p + m + 1..p + n - 1]$, and $A[p + n]$ as a leftover piece.
  2. Compare $A[p + i]$ and $A[p + i + m]$ for $i = 1, 2, \ldots, m$, putting the smaller of the the two elements into $A[p + i + m]$ and the larger into $A[p + i]$.
  3. Recursively find the $i$th smallest element in $A[p + m + 1..p + n]$, but with an additional action performed by the partitioning procedure: whenever it exchanges $A[j]$ and $A[k]$ (where $p + m + 1 \le j$, $k \le p + 2m$), it also exchanges $A[j - m]$ and $A[k - m]$. The idea is that after recursively finding the $i$th smallest element in $A[p + m + 1..p + n]$, the subarray $A[p + m + 1..p + m + i]$ contains the $i$ smallest elements that had been in $A[p + m + 1..p + n]$ and the subarray $A[p + 1..p + i]$ contains their larger counterparts, as found in step 1. The $i$th smallest element of $A[p + 1..p + n]$ must be either one of the $i$ smallest, as placed into $A[p + m + 1..p + m + i]$, or it must be one of the larger counterparts, as placed into $A[p + 1..p + i]$.
  4. Collect the subarrays $A[p + 1..p + i]$ and $A[p + m + 1..p + m + i]$ into a single array $B[1..2i]$, call $\text{SELECT}$ to find the $i$th smallest element of $B$, and return the result of this call to $\text{SELECT}$.

The number of comparisons in each step is as follows:

  1. No comparisons.
  2. $m = \lfloor n / 2 \lfloor$ comparisons.
  3. Since we recurse on $A[p + m + 1..p + n]$, which has $dn / 2e$ elements, the number of comparisons is $U_i(\lceil n / 2 \rceil)$.
  4. Since we call $\text{SELECT}$ on an array with $2i$ elements, the number of comparisons is $T(2i)$.

Thus, when $i < n / 2$, the total number of comparisons is $\lfloor n / 2 \rfloor + U_i(\lceil n / 2 \rceil) + T(2i)$.

b. We show by substitution that if $i < n / 2$, then $U_i(n) = n + O(T(2i)\lg(n / i))$. In particular, we show that

$$ \begin{aligned} U_i(n) & \le n + cT (2i)\lg(n / i) - d(\lg\lg n)T(2i) \\ & = n + cT(2i)\lg n - cT(2i)\lg i - d(\lg\lg n)T(2i) \end{aligned} $$

for some positive constant $c$, some positive constant $d$ to be chosen later, and $n \ge 4$. We have

$$ \begin{aligned} U_i(n) & = \lfloor n / 2 \rfloor + U_i(\lceil n / 2 \rceil) + T(2i) \\ & \le \lfloor n / 2 \rfloor + \lceil n / 2 \rceil + cT(2i)\lg\lceil n / 2 \rceil - cT(2i)\lg i - d(\lg\lg\lceil n / 2 \rceil)T(2i) \\ & = n + cT(2i)\lg\lceil n / 2 \rceil - cT(2i)\lg i - d(\lg\lg\lceil n / 2 \rceil)T(2i) \\ & \le n + cT(2i)\lg(n / 2 + 1) - cT(2i)\lg i - d(\lg\lg(n / 2))T(2i) \\ & = n + cT(2i)\lg(n / 2 + 1) - cT(2i)\lg i - d(\lg(\lg n - 1))T(2i) \\ & \le n + cT(2i)\lg n - cT(2i)\lg i - d(\lg\lg n)T(2i). \end{aligned} $$

If

$$cT(2i)\lg(n)2 + 1) - d(\lg(\lg n - 1))T(2i) \le cT(2i)\lg n - d(\lg\lg n)T(2i),$$

simple algebraic manipulations gives the following sequence of equivalent conditions:

$$ \begin{aligned} cT(2i)\lg(n / 2 + 1) - d(\lg(\lg n - 1))T(2i) & \le cT(2i)\lg n - d(\lg\lg n)T(2i) \\ c\lg(n / 2 + 1) - d(\lg(\lg n - 1)) & \le c\lg n - d(\lg\lg n) \\ c(\lg(n / 2 + 1) - \lg n) & \le d(\lg(\lg n - 1) - \lg\lg n) \\ c\Bigg(\lg\frac{n / 2 + 1}{n}\Bigg) & \le d\lg\frac{\lg n - 1}{\lg n} \\ c\Bigg(\lg\Bigg(\frac{1}{2} + \frac{1}{n}\Bigg)\Bigg) & \le d\lg\frac{\lg n - 1}{\lg n}. \end{aligned} $$

Observe that $1 / 2 + 1 / n$ decreases as $n$ increases, but $(\lg n - 1) / \lg n$ increases as $n$ increases. When $n = 4$, we have $1 / 2 + 1 / n = 3 / 4$ and $(\lg n - 1) = \lg n = 1 / 2$. Thus, we just need to choose $d$ such that $c\lg(3 / 4) \le d\lg(1 / 2)$ or, equivalently, $c\lg(3 / 4) \le -d$. Multiplying both sides by $-1$, we get $d \le -c\lg(3 / 4) = c\lg(4 / 3)$. Thus, any value of $d$ that is at most $c\lg(4 / 3)$ suffices.

c. When $i$ is a constant, $T(2i) = O(1)$ and $\lg(n / i) = \lg n - \lg i = O(\lg n)$.

Thus, when $i$ is a constant less than $n / 2$, we have that

$$ \begin{aligned} U_i(n) & = n + O(T(2i)\lg(n / i)) \\ & = n + O(O(1) \cdot O(\lg n)) \\ & = n + O(\lg n). \end{aligned} $$

d. Suppose that $i = n / k$ for $k \ge 2$. Then $i \le n / 2$. If $k > 2$, then $i < n / 2$, and we have

$$ \begin{aligned} U_i(n) & = n + O(T(2i)\lg(n / i)) \\ & = n + O(T(2n / k)\lg(n/(n / k)) \\ & = n + O(T(2n / k)\lg k). \end{aligned} $$

If $k = 2$, then $n = 2i$ and $\lg k = 1$. We have

$$ \begin{aligned} U_i(n) & = T(n) \\ & = n + (T(n) - n) \\ & \le n + (T(2i) - n) \\ & = n + (T(2n / k) - n) \\ & = n + (T(2n / k)\lg k - n) \\ & = n + O(T(2n / k)\lg k). \end{aligned} $$