Skip to content

7.4 Analysis of quicksort

7.4-1

Show that in the recurrence

$$ \begin{aligned} T(n) & = \max\limits_{0 \le q \le n - 1} (T(q) + T(n - q - 1)) + \Theta(n), \\ T(n) & = \Omega(n^2). \end{aligned} $$

We guess $T(n) \ge cn^2 - 2n$,

$$ \begin{aligned} T(n) & = \max_{0 \le q \le n - 1} (T(q) + T(n - q - 1)) + \Theta(n) \\ & \ge \max_{0 \le q \le n - 1} (cq^2 - 2q + c(n - q - 1)^2 - 2n - 2q -1) + \Theta(n) \\ & \ge c\max_{0 \le q \le n - 1} (q^2 + (n - q - 1)^2 - (2n + 4q + 1) / c) + \Theta(n) \\ & \ge cn^2 - c(2n - 1) + \Theta(n) \\ & \ge cn^2 - 2cn + 2c & (c \le 1) \\ & \ge cn^2 - 2n. \end{aligned} $$

7.4-2

Show that quicksort's best-case running time is $\Omega(n\lg n)$.

To show that quicksort's best-case running time is $\Omega(n\lg n)$, we use a technique similar to the one used in Section 7.4.1 to show that its worst-case running time is $O(n^2)$.

Let $T(n)$ be the best-case time for the procedure $\text{QUICKSORT}$ on an input of size $n$. We have the recurrence

$$T(n) = \min\limits_{1 \le q \le n - 1} (T(q) + T(n - q - 1)) + \Theta(n).$$

We guess that $T(n) \ge cn\lg n$ for some constant $c$. Substituting this guess into the recurrence, we obtain

$$ \begin{aligned} T(n) & \ge \min_{1 \le q \le n - 1} (cq\lg q + c(n - q - 1)\lg(n - q - 1)) + \Theta(n) \\ & = c \cdot \min_{1 \le q \le n - 1} (q\lg q + (n - q - 1)\lg(n - q - 1)) + \Theta(n). \end{aligned} $$

As we'll show below, the expression $q\lg q + (n - q - 1)\lg(n - q - 1)$ achieves a minimum over the range $1 \le q \le n - 1$ when $q = n - q - 1$, or $q = (n - 1) / 2$, since the first derivative of the expression with respect to $q$ is $0$ when $q = (n - 1) / 2$ and the second derivative of the expression is positive. (It doesn't matter that $q$ is not an integer when $n$ is even, since we're just trying to determine the minimum value of a function, knowing that when we constrain $q$ to integer values, the function's value will be no lower.)

Choosing $q = (n - 1) / 2$ gives us the bound

$$ \begin{aligned} \min_{1 \le q \le n - 1} (q\lg q + (n - q - 1)\lg(n - q - 1)) & \ge \frac{n - 1}{2}\lg\frac{n - 1}{2} + \Big(n - \frac{n - 1}{2} - 1\Big)\lg\Big(n - \frac{n - 1}{2} - 1\Big) \\ & = (n - 1)\lg\frac{n - 1}{2}. \end{aligned} $$

Continuing with our bounding of $T(n)$, we obtain, for $n \ge 2$,

$$ \begin{aligned} T(n) & = c(n - 1)\lg\frac{n - 1}{2} + \Theta(n) \\ & = c(n - 1)\lg(n - 1) - c(n - 1) + \Theta(n) \\ & = cn\lg(n - 1) - c\lg(n - 1) - c(n - 1) + \Theta(n) \\ & \ge cn\lg(n / 2) - c\lg(n - 1) - c(n - 1) + \Theta(n) & \text{(since $n \ge 2$)} \\ & = cn\lg n - cn - c\lg(n - 1) - cn + c + \Theta(n) \\ & = cn\lg n - (2cn + c\lg(n - 1) - c) + \Theta(n) \\ & \ge cn\lg n, \end{aligned} $$

since we can pick the constant $c$ small enough so that the $\Theta(n)$ term dominates the quantity $2cn + c\lg(n - 1) - c$. Thus, the best-case running time of quicksort is $\Omega(n\lg n)$.

Letting $f(q) = q\lg q + (n - q - 1)\lg(n - q - 1)$, we now show how to find the minimum value of this function in the range $1 \le q \le n - 1$. We need to find the value of $q$ for which the derivative of $f$ with respect to $q$ is $0$. We rewrite this function as

$$f(q) = \frac{q\ln q + (n - q - 1)\ln(n - q - 1)}{\ln 2},$$

and so

$$ \begin{aligned} f'(q) & = \frac{d}{dq}\Big(\frac{q\ln q + (n - q - 1)\ln(n - q - 1)}{\ln 2}\Big) \\ & = \frac{\ln q + 1 - \ln(n - q - 1) - 1}{\ln 2} \\ & = \frac{\ln q - \ln(n - q - 1)}{\ln 2}. \end{aligned} $$

The derivative $f'(q)$ is $0$ when $q = n - q - 1$, or when $q = (n - 1) / 2$. To verify that $q = (n - 1) / 2$ is indeed a minimum (not a maximum or an inflection point), we need to check that the second derivative of $f$ is positive at $q = (n - 1) / 2$:

$$ \begin{aligned} f''(q) & = \frac{d}{dq}\Big(\frac{\ln q - \ln(n - q - 1)}{\ln 2}\Big) \\ & = \frac{1}{\ln 2}\Big(\frac{1}{q} + \frac{1}{n - q - 1}\Big) \\ f''\Big(\frac{n - 1}{2}\Big) & = \frac{1}{\ln 2}\Big(\frac{2}{n - 1} + \frac{2}{n - 1}\Big) \\ & = \frac{1}{\ln 2} \cdot \frac{4}{n - 1} \\ & > 0. & \text{(since $n \ge 2$)} \end{aligned} $$

7.4-3

Show that the expression $q^2 + (n - q - 1)^2$ achieves a maximum over $q = 0, 1, \ldots, n - 1$ when $q = 0$ and $q = n - 1$.

$$ \begin{aligned} f(q) & = q^2 + (n - q - 1)^2 \\ f'(q) & = 2q - 2(n - q - 1) = 4q - 2n + 2 \\ f''(q) & = 4. \\ \end{aligned} $$

$f'(q) = 0$ when $q = \frac{1}{2}n - \frac{1}{4}$. $f'(q)$ is also continious. $\forall q: f''(q) > 0$, which means that $f'(q)$ is negative left of $f'(q) = 0$ and positive right of it, which means that this is a local minima. In this case, $f(q)$ is decreasing in the beginning of the interval and increasing in the end, which means that those two points are the only candidates for a maximum in the interval.

$$ \begin{aligned} f(0) & = (n - 1)^2 \\ f(n - 1) & = (n - 1)^2 + 0^2. \end{aligned} $$

7.4-4

Show that $\text{RANDOMIZED-QUICKSORT}$'s expected running time is $\Omega(n\lg n)$.

We use the same reasoning for the expected number of comparisons, we just take in in a different direction.

$$ \begin{aligned} \text E[X] & = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^n \frac{2}{j - i + 1} \\ & = \sum_{i = 1}^{n - 1} \sum_{k = 1}^{n - i} \frac{2}{k + 1} & (k \ge 1) \\ & \ge \sum_{i = 1}^{n - 1} \sum_{k = 1}^{n - i} \frac{2}{2k} \\ & \ge \sum_{i = 1}^{n - 1} \Omega(\lg n) \\ & = \Omega(n\lg n). \end{aligned} $$

Using the master method, we get the solution $\Theta(n\lg n)$.

7.4-5

We can improve the running time of quicksort in practice by taking advantage of the fast running time of insertion sort when its input is "nearly" sorted. Upon calling quicksort on a subarray with fewer than $k$ elements, let it simply return without sorting the subarray. After the top-level call to quicksort returns, run insertion sort on the entire array to finish the sorting process. Argue that this sorting algorithm runs in $O(nk + n\lg(n / k))$ expected time. How should we pick $k$, both in theory and practice?

In the quicksort part of the proposed algorithm, the recursion stops at level $\lg(n / k)$, which makes the expected running time $O(n\lg(n / k))$. However, this leaves $n / k$ non-sorted, non - intersecting subarrays of (maximum) length $k$.

Because of the nature of the insertion sort algorithm, it will first sort fully one such subarray before consider the next one. Thus, it has the same complexity as sorting each of those arrays, that is $\frac{n}{k}O(k^2) = O(nk)$.

In theory, if we ignore the constant factors, we need to solve

$$ \begin{aligned} & n\lg n \ge nk + n\lg{n / k} \\ \Rightarrow & \lg n \ge k + \lg n - \lg k \\ \Rightarrow & \lg k \ge k. \end{aligned} $$

Which is not possible.

If we add the constant factors, we get

$$ \begin{aligned} & c_qn\lg n \ge c_ink + c_qn\lg(n / k) \\ \Rightarrow & c_q\lg n \ge c_ik + c_q\lg n - c_q\lg k \\ \Rightarrow & \lg k \ge \frac{c_i}{c_q}k. \end{aligned} $$

Which indicates that there might be a good candidate. Furthermore, the lower-order terms should be taken into consideration too.

In practice, $k$ should be chosed by experiment.

7.4-6 $\star$

Consider modifying the $\text{PARTITION}$ procedure by randomly picking three elements from array $A$ and partitioning about their median (the middle value of the three elements). Approximate the probability of getting at worst an $\alpha$-to-$(1 - \alpha)$ split, as a function of $\alpha$ in the range $0 < \alpha < 1$.

First, for simplicity's sake, let's assume that we can pick the same element twice. Let's also assume that $0 < \alpha \le 1 / 2$.

In order to get such a split, two out of three elements need need to be in the smallest $\alpha n$ elements. The probability of having one is $\alpha n / n = \alpha$. The probability of having exactly two is $\alpha^2 - \alpha^3$. There are three ways in which two elements can be in the smallest $\alpha n$ and one way in which all three can be in the smallest $\alpha n$ so the probability of getting such a median is $3\alpha^2 - 2\alpha^3$. We will get the same split if the median is in the largest $\alpha n$. Since the two events are mutually exclusive, the probability is

$$\Pr\{\text{OK split}\} = 6\alpha^2 - 4\alpha^3 = 2\alpha^2(3 - 2\alpha).$$