Skip to content

8.4 Bucket sort

8.4-1

Using Figure 8.4 as a model, illustrate the operation of $\text{BUCKET-SORT}$ on the array $A = \langle .79, .13, .16, .64, .39, .20, .89, .53, .71, .42 \rangle$.

$$ \begin{array}{cl} R & \\ \hline 0 & \\ 1 & .13 .16 \\ 2 & .20 \\ 3 & .39 \\ 4 & .42 \\ 5 & .53 \\ 6 & .64 \\ 7 & \\ 8 & .79 .71 \\ 9 & .89 \\ \end{array} $$

$$A = \langle.13, .16, .20, .39, .42, .53, .64, .71, .79, .89 \rangle.$$

8.4-2

Explain why the worst-case running time for bucket sort is $\Theta(n^2)$. What simple change to the algorithm preserves its linear average-case running time and makes its worst-case running time $O(n\lg n)$?

The worst-case running time for the bucket-sort algorithm occurs when the assumption of uniformly distributed input does not hold. If, for example, all the input ends up in the first bucket, then in the insertion sort phase it needs to sort all the input, which takes $O(n^2)$.

A simple change that will preserve the linear expected running time and make the worst-case running time $O(n\lg n)$ is to use a worst-case $O(n\lg n)$-time algorithm, such as merge sort, instead of insertion sort when sorting the buckets.

8.4-3

Let $X$ be a random variable that is equal to the number of heads in two flips of a fair coin. What is $\text E[X^2]$? What is $\text E^2[X]$?

$$ \begin{aligned} \text E[X] & = 2 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{4} = 1 \\ \text E[X^2] & = 4 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{4} = 1.5 \\ \text E^2[X] & = \text E[X] \cdot \text E[X] = 1 \cdot 1 = 1. \end{aligned} $$

8.4-4 $\star$

We are given $n$ points in the unit circle, $p_i = (x_i, y_i)$, such that $0 < x_i^2 + y_i^2 \le 1$ for $i = 1, 2, \ldots, n$. Suppose that the points are uniformly distributed; that is, the probability of finding a point in any region of the circle is proportional to the area of that region. Design an algorithm with an average-case running time of $\Theta(n)$ to sort the $n$ points by their distances $d_i = \sqrt{x_i^2 + y_i^2}$ from the origin. ($\textit{Hint:}$ Design the bucket sizes in $\text{BUCKET-SORT}$ to reflect the uniform distribution of the points in the unit circle.)

Bucket sort by radius,

$$ \begin{aligned} \pi r_i^2 & = \frac{i}{n} \cdot \pi 1^2 \\ r_i & = \sqrt{\frac{i}{n}}. \end{aligned} $$

8.4-5 $\star$

A probability distribution function $P(x)$ for a random variable $X$ is defined by $P(x) = \Pr\{X \le x\}$. Suppose that we draw a list of $n$ random variables $X_1, X_2, \ldots, X_n$ from a continuous probability distribution function $P$ that is computable in $O(1)$ time. Give an algorithm that sorts these numbers in linear average-case time.

Bucket sort by $p_i$, so we have $n$ buckets: $[p_0, p_1), [p_1, p_2), \dots, [p_{n - 1}, p_n)$. Note that not all buckets are the same size, which is ok as to ensure linear run time, the inputs should on average be uniformly distributed amongst all buckets, of which the intervals defined with $p_i$ will do so.

$p_i$ is defined as follows:

$$P(p_i) = \frac{i}{n}.$$