Skip to content

21.4 Analysis of union by rank with path compression


Prove Lemma 21.4.

The lemma states:

For all nodes $x$, we have $x.rank \le x.p.rank$, with strict inequality if $x \ne x.p$. The value of $x.rank$ is initially $0$ and increases through time until $x \ne x.p$; from then on, $x.rank$ does not change. The value of $x.p.rank$ monotonically increases over time.

The initial value of $x.rank$ is $0$, as it is initialized in line 2 of the $\text{MAKE-SET}(x)$ procedure. When we run $\text{LINK}(x, y)$, whichever one has the larger rank is placed as the parent of the other, and if there is a tie, the parent's rank is incremented. This means that after any $\text{LINK}(y, x)$, the two nodes being linked satisfy this strict inequality of ranks.

Also, if we have that $x \ne x.p$, then, we have that $x$ is not its own set representative, so, any linking together of sets that would occur would not involve $x$, but that's the only way for ranks to increase, so, we have that $x.rank$ must remain constant after that point.


Prove that every node has rank at most $\lfloor \lg n \rfloor$.

We'll prove the claim by strong induction on the number of nodes. If $n = 1$, then that node has rank equal to $0 = \lfloor \lg 1 \rfloor$. Now suppose that the claim holds for $1, 2, \ldots, n$ nodes.

Given $n + 1$ nodes, suppose we perform a $\text{UNION}$ operation on two disjoint sets with $a$ and $b$ nodes respectively, where $a, b \le n$. Then the root of the first set has rank at most $\lfloor \lg a \rfloor$ and the root of the second set has rank at most $\lfloor \lg b\rfloor$.

If the ranks are unequal, then the $\text{UNION}$ operation preserves rank and we are done, so suppose the ranks are equal. Then the rank of the union increases by $1$, and the resulting set has rank $\lfloor\lg a\rfloor + 1 \le\lfloor\lg(n + 1) / 2\rfloor + 1 = \lfloor\lg(n + 1)\rfloor$.


In light of Exercise 21.4-2, how many bits are necessary to store $x.rank$ for each node $x$?

Since their value is at most $\lfloor \lg n \rfloor$, we can represent them using $\Theta(\lg(\lg(n)))$ bits, and may need to use that many bits to represent a number that can take that many values.


Using Exercise 21.4-2, give a simple proof that operations on a disjoint-set forest with union by rank but without path compression run in $O(m\lg n)$ time.

Clearly, each $\text{MAKE-SET}$ and $\text{LINK}$ operation takes $O(1)$ time. Because the rank of a node is an upper bound on its height, each find path has length $O(\lg n)$, which in turn implies that each $\text{FIND-SET}$ takes $O(\lg n)$ time. Thus, any sequence of $m$ $\text{MAKE-SET}$, $\text{LINK}$, and $\text{FIND-SET}$ operations on $n$ elements takes $O(m\lg n)$ time. It is easy to prove an analogue of Lemma 21.7 to show that if we convert a sequence of $m'$ $\text{MAKE-SET}$, $\text{UNION}$, and $\text{FIND-SET}$ operations into a sequence of $m$ $\text{MAKE-SET}$, $\text{LINK}$, and $\text{FIND-SET}$ operations that take $O(m\lg n)$ time, then the sequence of $m'$ $\text{MAKE-SET}$, $\text{UNION}$, and $\text{FIND-SET}$ operations takes $O(m'\lg n)$ time.


Professor Dante reasons that because node ranks increase strictly along a simple path to the root, node levels must monotonically increase along the path. In other words, if $x.rank > 0$ and $x.p$ is not a root, then $\text{level}(x) \le \text{level}(x.p)$. Is the professor correct?

Professor Dante is mistaken. Take the following scenario. Let $n = 16$, and make $16$ separate singleton sets using $\text{MAKE-SET}$. Then do $8$ $\text{UNION}$ operations to link the sets into $8$ pairs, where each pair has a root with rank $0$ and a child with rank $1$. Now do $4$ $\text{UNION}$s to link pairs of these trees, so that there are $4$ trees, each with a root of rank $2$, children of the root of ranks $1$ and $0$, and a node of rank $0$ that is the child of the rank-$1$ node. Now link pairs of these trees together, so that there are two resulting trees, each with a root of rank $3$ and each containing a path from a leaf to the root with ranks $0$, $1$, and $3$. Finally, link these two trees together, so that there is a path from a leaf to the root with ranks $0$, $1$, $3$, and $4$. Let $x$ and $y$ be the nodes on this path with ranks $1$ and $3$, respectively. Since $A_1(1) = 3$, $\text{level}(x) = 1$, and since $A_0(3) = 4$, $\text{level}(y) = 0$. Yet $y$ follows $x$ on the find path.

21.4-6 $\star$

Consider the function $\alpha'(n) = \min \{k: A_k(1) \ge \lg(n + 1)\}$. Show that $\alpha'(n) \le 3$ for all practical values of $n$ and, using Exercise 21.4-2, show how to modify the potential-function argument to prove that we can perform a sequence of $m$ $\text{MAKE-SET}$, $\text{UNION}$, and $\text{FIND-SET}$ operations, $n$ of which are $\text{MAKE-SET}$ operations, on a disjoint-set forest with union by rank and path compression in worst-case time $O(m \alpha'(n))$.

First, $\alpha'(2^{2047} - 1) = \min\{k: A_k(1) \ge 2047\} = 3$, and $2^{2047} - 1 \gg 10^{80}$.

Second, we need that $0 \le \text{level}(x) \le \alpha'(n)$ for all nonroots $x$ with $x.rank \ge 1$. With this definition of $\alpha'(n)$, we have

$$A_{\alpha'(n)}(x.rank) \ge A_{\alpha'(n)}(1) \ge \lg(n + 1) > \lg n \ge x.p.rank.$$

The rest of the proof goes through with $\alpha'(n)$ replacing $\alpha(n)$.