11-1 Longest-probe bound for hashing

Suppose that we use an open-addressed hash table of size $m$ to store $n \le m / 2$ items.

a. Assuming uniform hashing, show that for $i = 1, 2, \ldots, n$, the probability is at most $2^{-k}$ that the $i$th insertion requires strictly more than $k$ probes.

b. Show that for $i = 1, 2, \ldots, n$, the probability is $O(1 / n^2)$ that the $i$th insertion requires more than $2\lg n$ probes.

Let the random variable $X_i$ denote the number of probes required by the $i$th insertion. You have shown in part (b) that $\Pr\{X_i > 2\lg n\} = O(1 / n^2)$. Let the random variable $X = \max_{1 \le i \le n} X_i$ denote the maximum number of probes required by any of the $n$ insertions.

c. Show that $\Pr\{X > 2\lg n\} = O(1 / n)$.

d. Show that the expected length $\text E[X]$ of the longest probe sequence is $O(\lg n)$.

a. Since we assume uniform hashing, we can use the same observation as is used in Corollary 11.7: that inserting a key entails an unsuccessful search followed by placing the key into the first empty slot found. As in the proof of Theorem 11.6, if we let $X$ be the random variable denoting the number of probes in an unsuccessful search, then $\Pr\{X \ge i\} \le \alpha^{i - 1}$. Since $n \le m / 2$, we have $\alpha \le 1 / 2$. Letting $i = k + 1$, we have $\Pr\{X > k\} = \Pr\{X \ge k + 1\} \le (1 / 2)^{(k + 1) - 1} = 2^{-k}$.

b. Substituting $k = 2\lg n$ into the statement of part (a) yields that the probability that the $i$th insertion requires more than $k = 2\lg n$ probes is at most $2^{-2\lg n} = (2^{\lg n})^{-2} = n^{-2} = 1 / n^2$.

We must deal with the possibility that $2\lg n$ is not an integer, however. Then the event that the $i$th insertion requires more than $2\lg n$ probes is the same as the event that the $i$th insertion requires more than $\lfloor 2\lg n \rfloor$ probes. Since $\lfloor 2\lg n \rfloor > 2\lg n - 1$, we have that the probability of this event is at most $2^{-\lfloor 2\lg n \rfloor} < 2^{-(2\lg n - 1)} = 2 / n^2 = O(1 / n^2)$.

c. Let the event $A$ be $X > 2\lg n$, and for $i = 1, 2, \ldots, n$, let the event $A_i$ be $X_i > 2\lg n$. In part (b), we showed that $\Pr\{A_i\} = O(1 / n^2)$ for $i = 1, 2, \ldots, n$. From how we defined these events, $A = A_1 \cup A_2 \cup \cdots \cup A_n$. Using Boole's inequality, $\text{(C.19)}$, we have

$$ \begin{aligned} \Pr\{A\} & \le \Pr\{A_1\} + \Pr\{A_1\} + \cdots + \Pr\{A_n\} \\ & \le n \cdot O(1 / n^2) \\ & = O(1 / n). \end{aligned} $$

d. We use the definition of expectation and break the sum into two parts:

$$ \begin{aligned} \text E[X] & = \sum_{k = 1}^n k \cdot \Pr\{X = k\} \\ & = \sum_{k = 1}^{\lceil 2\lg n \rceil} k \cdot \Pr\{X = k\} + \sum_{\lceil 2\lg n \rceil + 1}^n k \cdot \Pr\{X = k\} \\ & \le \sum_{k = 1}^{\lceil 2\lg n \rceil} \lceil 2\lg n \rceil \cdot \Pr\{X = k\} + \sum_{\lceil 2\lg n \rceil + 1}^n n \cdot \Pr\{X = k\} \\ & = \lceil 2\lg n \rceil \sum_{k = 1}^{\lceil 2\lg n \rceil} \Pr\{X = k\} + n \sum_{\lceil 2\lg n \rceil + 1}^n \Pr\{X = k\}. \end{aligned} $$

Since $X$ takes on exactly one value, we have that $\sum_{k = 1}^{\lceil 2\lg n \rceil} \Pr\{X = k\} = \Pr\{X \le \lceil 2\lg n \rceil\} \le 1$ and $\sum_{k = \lceil 2\lg n \rceil + 1}^n \Pr\{X = k\} \le \Pr\{X > 2\lg n\} = O(1 / n)$, by part (c). Therefore,

$$ \begin{aligned} \text E[X] & \le \lceil 2\lg n \rceil \cdot 1 + n \cdot O(1 / n) \\ & = \lceil 2\lg n \rceil + O(1) \\ & = O(\lg n). \end{aligned} $$