21.4 Analysis of union by rank with path compression


Prove Lemma 21.4.

The lemma states:

For all nodes $x$, we have $x.rank \le x.p.rank$, with strict inequality if $x \ne x.p$. The value of $x.rank$ is initially $0$ and increases through time until $x \ne x.p$; from then on, $x.rank$ does not change. The value of $x.p.rank$ monotonically increases over time.

The initial value of $x.rank$ is $0$, as it is initialized in line 2 of the $\text{MAKE-SET}(x)$ procedure. When we run $\text{LINK}(x, y)$, whichever one has the larger rank is placed as the parent of the other, and if there is a tie, the parent's rank is incremented. This means that after any $\text{LINK}(y, x)$, the two nodes being linked satisfy this strict inequality of ranks.

Also, if we have that $x \ne x.p$, then, we have that $x$ is not its own set representative, so, any linking together of sets that would occur would not involve $x$, but that's the only way for ranks to increase, so, we have that $x.rank$ must remain constant after that point.


Prove that every node has rank at most $\lfloor \lg n \rfloor$.

We'll prove the claim by strong induction on the number of nodes. If $n = 1$, then that node has rank equal to $0 = \lfloor \lg 1 \rfloor$. Now suppose that the claim holds for $1, 2, \ldots, n$ nodes.

Given $n + 1$ nodes, suppose we perform a $\text{UNION}$ operation on two disjoint sets with $a$ and $b$ nodes respectively, where $a, b \le n$. Then the root of the first set has rank at most $\lfloor \lg a \rfloor$ and the root of the second set has rank at most $\lfloor \lg b\rfloor$.

If the ranks are unequal, then the $\text{UNION}$ operation preserves rank and we are done, so suppose the ranks are equal. Then the rank of the union increases by $1$, and the resulting set has rank $\lfloor\lg a\rfloor + 1 \le\lfloor\lg(n + 1) / 2\rfloor + 1 = \lfloor\lg(n + 1)\rfloor$.


In light of Exercise 21.4-2, how many bits are necessary to store $x.rank$ for each node $x$?

Since their value is at most $\lfloor \lg n \rfloor$, we can represent them using $\Theta(\lg(\lg(n)))$ bits, and may need to use that many bits to represent a number that can take that many values.


Using Exercise 21.4-2, give a simple proof that operations on a disjoint-set forest with union by rank but without path compression run in $O(m\lg n)$ time.

$\text{MAKE-SET}$ takes constant time and both $\text{FIND-SET}$ and $\text{UNION}$ are bounded by the largest rank among all the sets. Exercise 21.4-2 bounds this from about by $\lceil \lg n \rceil$, so the actual cost of each operation is $O(\lg n)$. Therefore the actual cost of $m$ operations is $O(m\lg n)$.


Professor Dante reasons that because node ranks increase strictly along a simple path to the root, node levels must monotonically increase along the path. In other words, if $x.rank > 0$ and $x.p$ is not a root, then $\text{level}(x) \le \text{level}(x.p)$. Is the professor correct?

Professor Dante is not correct.

Suppose that we had that $x.p.rank > A_2(x.rank)$ but that $x.p.p.rank = 1 + x.p.rank$, then we would have that $\text{level}(x.p) = 0$, but $\text{level}(x) \ge 2$. So, we don't have that $\text{level}(x) \le \text{level}(x.p)$ even though we have that the ranks are monotonically increasing as we go up in the tree. Put another way, even though the ranks are monotonically increasing, the rate at which they are increasing (roughly captured by the level values) doesn't have to be increasing.

21.4-6 $\star$

Consider the function $\alpha'(n) = \min \{k: A_k(1) \ge \lg(n + 1)\}$. Show that $\alpha'(n) \le 3$ for all practical values of $n$ and, using Exercise 21.4-2, show how to modify the potential-function argument to prove that we can perform a sequence of $m$ $\text{MAKE-SET}$, $\text{UNION}$, and $\text{FIND-SET}$ operations, $n$ of which are $\text{MAKE-SET}$ operations, on a disjoint-set forest with union by rank and path compression in worst-case time $O(m \alpha'(n))$.

First, observe that by a change of variables, $\alpha'(2^{n − 1}) = \alpha(n)$. Earlier in the section we saw that $\alpha(n) \le 3$ for $0 \le n \le 2047$. This means that $\alpha'(n) \le 2$ for $0 \le n \le 2^{2046}$, which is larger than the estimated number of atoms in the observable universe.

To prove the improved bound $O(m\alpha'(n))$ on the operations, the general structure will be essentially the same as that given in the section.

First, modify bound 21.2 by observing that $A_{\alpha'(n)}(x.rank) \ge A_{\alpha'(n)}(1) \ge \lg(n + 1) > x.p.rank$ which implies $\text{level}(x) \le \alpha'(n)$.

Next, redefine the potential replacing $\alpha(n)$ by $\alpha'(n)$. Lemma 21.8 now goes through just as before. All subsequent lemmas rely on these previous observations, and their proofs go through exactly as in the section, yielding the bound.