9.3 Selection in worst-case linear time

9.3-1

In the algorithm $\text{SELECT}$, the input elements are divided into groups of $5$. Will the algorithm work in linear time if they are divided into groups of $7$? Argue that $\text{SELECT}$ does not run in linear time if groups of $3$ are used.

It will still work if they are divided into groups of $7$, because we will still know that the median of medians is less than at least $4$ elements from half of the $\lceil n / 7 \rceil$ groups, so, it is greater than roughly $4n / 14$ of the elements.

Similarly, it is less than roughly $4n / 14$ of the elements. So, we are never calling it recursively on more than $10n / 14$ elements. $T(n) \le T(n / 7) + T(10n / 14) + O(n)$. So, we can show by substitution this is linear.

We guess $T(n) < cn$ for $n < k$. Then, for $m \ge k$,

$$ \begin{aligned} T(m) & \le T(m / 7) + T(10m / 14) + O(m) \\ & \le cm(1 / 7 + 10 / 14) + O(m), \end{aligned} $$

therefore, as long as we have that the constant hidden in the big-Oh notation is less than $c / 7$, we have the desired result.

Suppose now that we use groups of size $3$ instead. So, For similar reasons, we have that the recurrence we are able to get is $T(n) = T(\lceil n / 3 \rceil) + T(4n / 6) + O(n) \ge T(n / 3) + T(2n / 3) + O(n)$ So, we will show it is $\ge cn \lg n$.

$$ \begin{aligned} T(m) & \ge c(m / 3)\lg (m / 3) + c(2m / 3) \lg (2m / 3) + O(m) \\ & \ge cm\lg m + O(m), \end{aligned} $$

therefore, we have that it grows more quickly than linear.

9.3-2

Analyze $\text{SELECT}$ to show that if $n \ge 140$, then at least $\lceil n / 4 \rceil$ elements are greater than the median-of-medians $x$ and at least $\lceil n / 4 \rceil$ elements are less than $x$.

$$ \begin{aligned} \frac{3n}{10} - 6 & \ge \lceil \frac{n}{4} \rceil \\ \frac{3n}{10} - 6 & \ge \frac{n}{4} + 1 \\ 12n - 240 & \ge 10n + 40 \\ n & \ge 140. \end{aligned} $$

9.3-3

Show how quicksort can be made to run in $O(n\lg n)$ time in the worst case, assuming that all elements are distinct.

We can modify quicksort to run in worst case $n\lg n$ time by choosing our pivot element to be the exact median by using quick select. Then, we are guaranteed that our pivot will be good, and the time taken to find the median is on the same order of the rest of the partitioning.

9.3-4 $\star$

Suppose that an algorithm uses only comparisons to find the $i$th smallest element in a set of $n$ elements. Show that it can also find the $i - 1$ smaller elements and $n - i$ larger elements without performing additional comparisons.

Create a graph with $n$ vertices and draw a directed edge from vertex $i$ to vertex $j$ if the $i$th and $j$th elements of the array are compared in the algorithm and we discover that $A[i] \ge A[j]$. Observe that $A[i]$ is one of the $i - 1$ smaller elements if there exists a path from $x$ to $i$ in the graph, and $A[i]$ is one of the $n - i$ larger elements if there exists a path from $i$ to $x$ in the graph. Every vertex $i$ must either lie on a path to or from $x$ because otherwise the algorithm can't distinguish between $i \le x$ and $i \ge x$. Moreover, if a vertex $i$ lies on both a path to $x$ and a path from $x$, then it must be such that $x \le A[i] \le x$,so $x = A[i]$. In this case, we can break ties arbitrarily.

9.3-5

Suppose that you have a "black-box" worst-case linear-time median subroutine. Give a simple, linear-time algorithm that solves the selection problem for an arbitrary order statistic.

To use it, just find the median, partition the array based on that median.

If $i$ is less than half the length of the original array, recurse on the first half.
If $i$ is half the length of the array, return the element coming from the median finding black box.
If $i$ is more than half the length of the array, subtract half the length of the array, and then recurse on the second half of the array.

9.3-6

The $k$th quantiles of an $n$-element set are the $k - 1$ order statistics that divide the sorted set into $k$ equal-sized sets (to within $1$). Give an $O(n\lg k)$-time algorithm to list the $k$th quantiles of a set.

Pre-calculate the positions of the quantiles in $O(k)$, we use the $O(n)$ select algorithm to find the $\lfloor k / 2 \rfloor$th position, after that the elements are divided into two sets by the pivot the $\lfloor k / 2 \rfloor$th position, we do it recursively in the two sets to find other positions. Since the maximum depth is $\lceil \lg k \rceil$, the total running time is $O(n\lg k)$.

PARTITION(A, p, r)
    x = A[r]
    i = p
    for k = p to r
        if A[k] < x
            i = i + 1
            swap A[i] with A[k]
    i = i + 1
    swap a[i] with a[r]
    return i

RANDOMIZED-PARTITION(A, p, r)
    x = RANDOM(p, r)
    swap A[x] with A[r]
    return PARTITION(A, p, r)

RANDOMIZED-SELECT(A, p, r, i)
    while true
        if p == r
            return p, A[p]
        q = RANDOMIZED-PARTITION(A, p, r)
        k = q - p + 1
        if i == k
            return q, A[q]
        if i < k
            r = q
        else
            p = q + 1
            i = i - k

k-QUANTITLES-SUB(A, p, r, pos, f, e, quantiles)
    if f + 1 > e
        return
    mid = (f + e) / 2
    q, val = RANDOMIZED-SELECT(A, p, r, pos[mid])
    quantiles[mid] = val
    k = q - p + 1
    for i = mid + 1 to e
        pos[i] = pos[i] - k
    k-QUANTILES-SUB(A, q + 1, r, pos, mid + 1, e, quantiles)

k-QUANTITLES(A, k)
    num = A.size() / k
    mod = A.size() % k
    pos = num[1..k]
    for i = 1 to mod
        pos[i] = pos[i] + 1
    for i = 1 to k
        pos[i] = pos[i] + pos[i - 1]
    quantiles = [1..k]
    k-QUANTITLES-SUB(A, 0, A.length, pos, 0, pos.size(), quantiles)
    return quantiles

9.3-7

Describe an $O(n)$-time algorithm that, given a set $S$ of $n$ distinct numbers and a positive integer $k \le n$, determines the $k$ numbers in $S$ that are closest to the median of $S$.

Find the median in $O(n)$; create a new array, each element is the absolute value of the original value subtract the median; find the $k$th smallest number in $O(n)$, then the desired values are the elements whose absolute difference with the median is less than or equal to the $k$th smallest number in the new array.

9.3-8

Let $X[1..n]$ and $Y[1..n]$ be two arrays, each containing $n$ numbers already in sorted order. Give an $O(\lg n)$-time algorithm to find the median of all $2n$ elements in arrays $X$ and $Y$.

Without loss of generality, assume $n$ is a power of $2$.

MEDIAN(X, Y, n)
    if n == 1
        return min(X[1], Y[1])
    if X[n / 2] < Y[n / 2]
        return MEDIAN(X[n / 2 + 1..n], Y[1..n / 2], n / 2)
    return MEDIAN(X[1..n / 2], Y[n / 2 + 1..n], n / 2)

9.3-9

Professor Olay is consulting for an oil company, which is planning a large pipeline running east to west through an oil field of $n$ wells. The company wants to connect a spur pipeline from each well directly to the main pipeline along a shortest route (either north or south), as shown in Figure 9.2. Given the $x$- and $y$-coordinates of the wells, how should the professor pick the optimal location of the main pipeline, which would be the one that minimizes the total length of the spurs? Show how to determine the optimal location in linear time.

If $n$ is odd, we pick the $y$ coordinate of the main pipeline to be equal to the median of all the $y$ coordinates of the wells.
If $n$ is even, we pick the $y$ coordinate of the pipeline to be anything between the $y$ coordinates of the wells with $y$-coordinates which have order statistics $\lfloor (n + 1) / 2 \rfloor$ and the $\lceil (n + 1) / 2 \rceil$. These can all be found in linear time using the algorithm from this section.