11-2 Slot-size bound for chaining

Suppose that we have a hash table with $n$ slots, with collisions resolved by chaining, and suppose that $n$ keys are inserted into the table. Each key is equally likely to be hashed to each slot. Let $M$ be the maximum number of keys in any slot after all the keys have been inserted. Your mission is to prove an $O(\lg n / \lg\lg n)$ upper bound on $\text E[M]$, the expected value of $M$.

a. Argue that the probability $Q_k$ that exactly $k$ keys hash to a particular slot is given by

$$Q_k = \bigg(\frac{1}{n} \bigg)^k \bigg(1 - \frac{1}{n} \bigg)^{n - k} \binom{n}{k}.$$

b. Let $P_k$ be the probability that $M = k$, that is, the probability that the slot containing the most keys contains $k$ keys. Show that $P_k \le n Q_k$.

c. Use Stirling's approximation, equation $\text{(3.18)}$, to show that $Q_k < e^k / k^k$.

d. Show that there exists a constant $c > 1$ such that $Q_{k_0} < 1 / n^3$ for $k_0 = c\lg n / \lg\lg n$. Conclude that $P_k < 1 / n^2$ for $k \ge k_0 = c\lg n / \lg\lg n$.

e. Argue that

$$\text E[M] \le \Pr\bigg\{M > \frac{c\lg n}{\lg\lg n}\bigg\} \cdot n + \Pr\bigg\{M \le \frac{c\lg n}{\lg\lg n}\bigg\} \cdot \frac{c\lg n}{\lg\lg n}.$$

Conclude that $\text E[M] = O(\lg n / \lg\lg n)$.

a. A particular key is hashed to a particular slot with probability $1 / n$. Suppose we select a specific set of $k$ keys. The probability that these $k$ keys are inserted into the slot in question and that all other keys are inserted elsewhere is

$$\Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n - k}.$$

Since there are $\binom{n}{k}$ ways to choose our $k$ keys, we get

$$Q_k = \Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n - k} \binom{n}{k}.$$

b. For $i = 1, 2, \ldots, n$, let $X_i$ be a random variable denoting the number of keys that hash to slot $i$, and let $A_i$ be the event that $X_i = k$, i.e., that exactly $k$ keys hash to slot $i$. From part (a), we have $\Pr\{A\} = Q_k$. Then,

$$ \begin{aligned} P_k & = \Pr\{M = k\} \\ & = \Pr\Big\{\Big(\max_{1 \le i \le n} X_i\Big) = k\Big\} \\ & = \Pr\{\text{there exists $i$ such that $X_i = k$ and that $X_i\le k$ for $i = 1, 2, \ldots, n$}\} \\ & \le \Pr\{\text{there exists $i$ such that $X_i = k$}\} \\ & = \Pr\{A_1 \cup A_2 \cup \cdots \cup A_n\} \\ & \le \Pr\{A_1\} + \Pr\{A_2\} + \cdots + \Pr\{A_n\} \qquad \text{(by inequality (C.19))} \\ & = nQ_k. \end{aligned} $$

c. We start by showing two facts. First, $1 - 1 / n < 1$, which implies $(1 - 1 / n)^{n - k} < 1$. Second, $n! / (n - k)! = n \cdot (n - 1) \cdot (n - 2) \cdots (n - k + 1) < n^k$. Using these facts, along with the simplification $k! > (k / e)^k$ of equation $\text{(3.18)}$, we have

$$ \begin{aligned} Q_k & = \Big(\frac{1}{n}\Big)^k \Big(1 - \frac{1}{n}\Big)^{n - k} \frac{n!}{k!(n - k)!} \\ & < \frac{n!}{n^k k! (n - k)!} & ((1 - 1 / n)^{n - k} < 1) \\ & < \frac{1}{k!} & (n! / (n - k)! < n^k) \\ & < \frac{e^k}{k^k}. & (k! > (k / e)^k) \end{aligned} $$

d. Notice that when $n = 2$, $\lg\lg n = 0$, so to be precise, we need to assume that $n \ge 3$.

In part (c), we showed that $Q_k < e^k / k^k$ for any $k$; in particular, this inequality holds for $k_0$. Thus, it suffices to show that $e^{k_0} / k_0^{k_0} < 1 / n^3$ or, equivalently, that $n^3 < k_0^{k_0} / e^{k_0}$.

Taking logarithms of both sides gives an equivalent condition:

$$ \begin{aligned} 3\lg n & < k_0(\lg k_0 - \lg e) \\ & = \frac{c\lg n}{\lg\lg n}(\lg c + \lg\lg n - \lg\lg\lg n - \lg e). \end{aligned} $$

Dividing both sides by $\lg n$ gives the condition

$$ \begin{aligned} 3 & < \frac{c}{\lg\lg n} (\lg c + \lg\lg n - \lg\lg\lg n - \lg e) \\ & = c \Big(1 + \frac{\lg c - \lg e}{\lg\lg n} - \frac{\lg\lg\lg n}{\lg\lg n}\Big). \end{aligned} $$

Let $x$ be the last expression in parentheses:

$$x = \Big(1 + \frac{\lg c - \lg e}{\lg\lg n} - \frac{\lg\lg\lg n}{\lg\lg n}\Big).$$

We need to show that there exists a constant $c > 1$ such that $3 < cx$.

Noting that $\lim_{n \to \infty} x = 1$, we see that there exists $n_0$ such that $x \ge 1 / 2$ for all $n \ge n_0$. Thus, any constant $c > 6$ works for $n \ge n_0$.

We handle smaller values of $n$—in particular, $3 \le n < n_0$—as follows. Since $n$ is constrained to be an integer, there are a finite number of n in the range $3 \le n < n_0$. We can evaluate the expression $x$ for each such value of $n$ and determine a value of $c$ for which $3 < cx$ for all values of $n$. The final value of $c$ that we use is the larger of

  • $6$, which works for all $n \ge n_0$, and
  • $\max_{3 \le n \le n_0}\{c: 3 < cx\}$, i.e., the largest value of $c$ that we chose for the range $3 \le n < n_0$.

Thus, we have shown that $Q_{k_0} < 1 / n^3$, as desired.

To see that $P_k < 1 / n^2$ for $k \ge k_0$, we observe that by part (b), $P_k \le nQ_k$ for all $k$. Choosing $k = k_0$ gives $P_{k_0} \le nQ_{k_0} < n \cdot (1 / n^3) = 1 / n^2$. For $k > k_0$, we will show that we can pick the constant $c$ such that $Q_k < 1 / n^3$ for all $k \ge k_0$, and thus conclude that $P_k < 1 / n^2$ for all $k \ge k_0$.

To pick $c$ as required, we let $c$ be large enough that $k_0 > 3 > e$. Then $e / k < 1$ for all $k \ge k_0$, and so $e^k / k^k$ decreases as $k$ increases. Thus,

$$ \begin{aligned} Q_k & < e^k / k^k \\ & \le e^{k_0} / k^{k_0} \\ & < 1 / n^3 \end{aligned} $$

for $k \ge k_0$.

e. The expectation of $M$ is

$$ \begin{aligned} \text E[M] & = \sum_{k = 0}^n k \cdot \Pr\{M = k\} \\ & = \sum_{k = 0}^{k_0} k \cdot \Pr\{M = k\} + \sum_{k = k_0 + 1}^n k \cdot \Pr\{M = k\} \\ & \le \sum_{k = 0}^{k_0} k_0 \cdot \Pr\{M = k\} + \sum_{k = k_0 + 1}^n n \cdot \Pr\{M = k\} \\ & \le k_0 \sum_{k = 0}^{k_0} \Pr\{M = k\} + n \sum_{k = k_0 + 1}^n \Pr\{M = k\} \\ & = k_0 \cdot \Pr\{M \le k_0\} + n \cdot \Pr\{M > k_0\}, \end{aligned} $$

which is what we needed to show, since $k_0 = c \lg n / \lg\lg n$.

To show that $\text E[M] = O(\lg n / \lg\lg n)$, note that $\Pr\{M \le k_0\} \le 1$ and

$$ \begin{aligned} \Pr\{M > k_0\} & = \sum_{k = k_0 + 1}^n \Pr\{M = k\} \\ & = \sum_{k = k_0 + 1}^n P_k \\ & < \sum_{k = k_0 + 1}^n 1 / n^2 & \text{(by part (d))} \\ & < n \cdot (1 / n^2) \\ & = 1 / n. \end{aligned} $$

We conclude that

$$ \begin{aligned} \text E[M] & \le k_0 \cdot 1 + n \cdot (1 / n) \\ & = k_0 + 1 \\ & = O(\lg n / \lg\lg n). \end{aligned} $$