11.2 Hash tables

11.2-1

Suppose we use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of $\{\{k, l\}: k \ne l \text{ and } h(k) = h(l)\}$?

For each pair of keys $k$, $l$, where $k \ne l$, define the indicator random variable $X_{kl} = \text I\{h(k) = h(l)\}$. Since we assume simple uniform hashing, $\Pr\{X_{kl} = 1\} = \Pr\{h(k) = h(l)\} = 1 / m$, and so $\text E[X_{kl}] = 1 / m$.

Now define the random variable $Y$ to be the total number of collisions, so that $Y = \sum_{k \ne l} X_{kl}$. The expected number of collisions is

\begin{aligned} \text E[Y] & = \text E\bigg[\sum_{k \ne l} X_{kl}\bigg] \\ & = \sum_{k \ne l} \text E[X_{kl}] & \text{(linearity of expectation)} \\ & = \binom{n}{2}\frac{1}{m} \\ & = \frac{n(n - 1)}{2} \cdot \frac{1}{m} \\ & = \frac{n(n - 1)}{2m}. \end{aligned}

11.2-2

Demonstrate what happens when we insert the keys $5, 28, 19, 15, 20, 33, 12, 17, 10$ into a hash table with collisions resolved by chaining. Let the table have $9$ slots, and let the hash function be $h(k) = k \mod 9$.

Let us number our slots $0, 1, \dots, 8$.

Then our resulting hash table will look like following:

$$\begin{array}{c|l} h(k) & \text{keys} \\ \hline 0 \mod 9 & \\ 1 \mod 9 & 28 \to 19 \to 10 \\ 2 \mod 9 & 20 \\ 3 \mod 9 & 12 \\ 4 \mod 9 & \\ 5 \mod 9 & 5 \\ 6 \mod 9 & 15 \to 33 \\ 7 \mod 9 & \\ 8 \mod 9 & 17 \end{array}$$

11.2-3

Professor Marley hypothesizes that he can obtain substantial performance gains by modifying the chaining scheme to keep each list in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?

• Successful searches: no difference, $\Theta(1 + \alpha)$.
• Unsuccessful searches: faster but still $\Theta(1 + \alpha)$.
• Insertions: same as successful searches, $\Theta(1 + \alpha)$.
• Deletions: same as successful searches, $\Theta(1 + \alpha)$.

11.2-4

Suggest how to allocate and deallocate storage for elements within the hash table itself by linking all unused slots into a free list. Assume that one slot can store a flag and either one element plus a pointer or two pointers. All dictionary and free-list operations should run in $O(1)$ expected time. Does the free list need to be doubly linked, or does a singly linked free list suffice?

The flag in each slot will indicate whether the slot is free.

• A free slot is in the free list, a doubly linked list of all free slots in the table. The slot thus contains two pointers.
• A used slot contains an element and a pointer (possibly $\text{NIL}$) to the next element that hashes to this slot. (Of course, that pointer points to another slot in the table.)

Operations

• Insertion:

• If the element hashes to a free slot, just remove the slot from the free list and store the element there (with a $\text{NIL}$ pointer). The free list must be doubly linked in order for this deletion to run in $O(1)$ time.
• If the element hashes to a used slot $j$, check whether the element $x$ already there "belongs" there (its key also hashes to slot $j$).

• If so, add the new element to the chain of elements in this slot. To do so, allocate a free slot (e.g., take the head of the free list) for the new element and put this new slot at the head of the list pointed to by the hashed-to slot ($j$).
• If not, $E$ is part of another slot's chain. Move it to a new slot by allocating one from the free list, copying the old slot's ($j$'s) contents (element $x$ and pointer) to the new slot, and updating the pointer in the slot that pointed to $j$ to point to the new slot. Then insert the new element in the now-empty slot as usual.

To update the pointer to $j$, it is necessary to find it by searching the chain of elements starting in the slot $x$ hashes to.

• Deletion: Let $j$ be the slot the element $x$ to be deleted hashes to.

• If $x$ is the only element in $j$ ($j$ doesn't point to any other entries), just free the slot, returning it to the head of the free list.
• If $x$ is in $j$ but there's a pointer to a chain of other elements, move the first pointed-to entry to slot $j$ and free the slot it was in.
• If $x$ is found by following a pointer from $j$, just free $x$'s slot and splice it out of the chain (i.e., update the slot that pointed to $x$ to point to $x$'s successor).
• Searching: Check the slot the key hashes to, and if that is not the desired element, follow the chain of pointers from the slot. All the operations take expected $O(1)$ times for the same reason they do with the version in the book: The expected time to search the chains is $O(1 + \alpha)$ regardless of where the chains are stored, and the fact that all the elements are stored in the table means that $\alpha \le 1$. If the free list were singly linked, then operations that involved removing an arbitrary slot from the free list would not run in $O(1)$ time.

11.2-5

Suppose that we are storing a set of $n$ keys into a hash table of size $m$. Show that if the keys are drawn from a universe $U$ with $|U| > nm$, then $U$ has a subset of size $n$ consisting of keys that all hash to the same slot, so that the worst-case searching time for hashing with chaining is $\Theta(n)$.

Suppose the $m - 1$ slots contains at most $n - 1$ elements, then the remaining slot should have

$$|U| - (m - 1)(n - 1) > nm - (m - 1)(n - 1) = n + m - 1 \ge n$$

elements, thus $U$ has a subset of size $n$.

11.2-6

Suppose we have stored $n$ keys in a hash table of size $m$, with collisions resolved by chaining, and that we know the length of each chain, including the length $L$ of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time $O(L \cdot (1 + 1 / \alpha))$.

We can view the hash table as if it had $m$ rows and $L$ columns; each row stores one chain. The array has $mL$ entries storing $n$ keys, and $mL - n$ empty values. The procedure picks array positions at random until it finds a key, which it returns. The probability of success on one draw is $n / mL$, so $mL / n = L / \alpha$ trials are needed. Each trial takes time $O(1)$, since the individual chain sizes are known. The chain for the last draw needs to be scanned to find the desired element, however, costing $O(L)$.