21-2 Depth determination

In the depth-determination problem, we maintain a forest $\mathcal F = \{T_i\}$ of rooted trees under three operations:

$\text{MAKE-TREE}(v)$ creates a tree whose only node is $v$.

$\text{FIND-DEPTH}(v)$ returns the depth of node $v$ within its tree.

$\text{GRAFT}(r, v)$ makes node $r$, which is assumed to be the root of a tree, become the child of node $v$, which is assumed to be in a different tree than $r$ but may or may not itself be a root.

a. Suppose that we use a tree representation similar to a disjoint-set forest: $v.p$ is the parent of node $v$, except that $v.p = v$ if $v$ is a root. Suppose further that we implement $\text{GRAFT}(r, v)$ by setting $r.p = v$ and $\text{FIND-DEPTH}(v)$ by following the find path up to the root, returning a count of all nodes other than $v$ encountered. Show that the worst-case running time of a sequence of $m$ $\text{MAKE-TREE}$, $\text{FIND-DEPTH}$, and $\text{GRAFT}$ operations is $\Theta(m^2)$.

By using the union-by-rank and path-compression heuristics, we can reduce the worst-case running time. We use the disjoint-set forest $\mathcal S = \{S_i\}$, where each set $S_i$ (which is itself a tree) corresponds to a tree $T_i$ in the forest $\mathcal F$. The tree structure within a set $S_i$, however, does not necessarily correspond to that of $T_i$. In fact, the implementation of $S_i$ does not record the exact parent-child relationships but nevertheless allows us to determine any node's depth in $T_i$.

The key idea is to maintain in each node $v$ a "pseudodistance" $v.d$, which is defined so that the sum of the pseudodistances along the simple path from $v$ to the root of its set $S_i$ equals the depth of $v$ in $T_i$. That is, if the simple path from $v$ to its root in $S_i$ is $v_0, v_1, \ldots, v_k$, where $v_0 = v$ and $v_k$ is $S_i$'s root, then the depth of $v$ in $T_i$ is $\sum_{j = 0}^k v_j.d$.

b. Give an implementation of $\text{MAKE-TREE}$.

c. Show how to modify $\text{FIND-SET}$ to implement $\text{FIND-DEPTH}$. Your implementation should perform path compression, and its running time should be linear in the length of the find path. Make sure that your implementation updates pseudodistances correctly.

d. Show how to implement $\text{GRAFT}(r, v)$, which combines the sets containing $r$ and $v$, by modifying the $\text{UNION}$ and $\text{LINK}$ procedures. Make sure that your implementation updates pseudodistances correctly. Note that the root of a set $S_i$ is not necessarily the root of the corresponding tree $T_i$.

e. Give a tight bound on the worst-case running time of a sequence of $m$ $\text{MAKE-TREE}$, $\text{FIND-DEPTH}$, and $\text{GRAFT}$ operations, $n$ of which are $\text{MAKE-TREE}$ operations.

a. Denote the number of nodes by $n$, and let $n = (m + 1) / 3$, so that $m = 3n - 1$. First, perform the $n$ operations $\text{MAKE-TREE}(v_1)$, $\text{MAKE-TREE}(v_2)$, $\ldots$, $\text{MAKE-TREE}(v_n)$. Then perform the sequence of $n - 1$ $\text{GRAFT}$ operations $\text{GRAFT}(v_1, v_2)$, $\text{GRAFT}(v_2, v_3)$, $\ldots$, $\text{GRAFT}(v_n - 1, v_n)$; this sequence produces a single disjoint-set tree that is a linear chain of $n$ nodes with $v_n$ at the root and $v_1$ as the only leaf. Then perform $\text{FIND-DEPTH}(v_1)$ repeatedly, $n$ times. The total number of operations is $n + (n - 1) + n = 3n - 1 = m$.

Each $\text{MAKE-TREE}$ and $\text{GRAFT}$ operation takes $O(1)$ time. Each $\text{FIND-DEPTH}$ operation has to follow an $n$-node find path, and so each of the $n$ $\text{FIND-DEPTH}$ operations takes $\Theta(n)$ time. The total time is $n \cdot \Theta(n) + (2n - 1) \cdot O(1) = \Theta(n^2) = \Theta(m^2)$.

b. $\text{MAKE-TREE}$ is like $\text{MAKE-SET}$, except that it also sets the $d$ value to $0$:

1
2
3
4
MAKE-TREE(v)
    v.p = v
    v.rank = 0
    v.d = 0

It is correct to set $v.d$ to $0$, because the depth of the node in the single-node disjoint-set tree is $0$, and the sum of the depths on the find path for $v$ consists only of $v.d$.

c. $\text{FIND-DEPTH}$ will call a procedure $\text{FIND-ROOT}$:

1
2
3
4
5
6
FIND-ROOT(v)
    if v.p != v.p.p
        y = v.p
        v.p = FIND-ROOT(y)
        v.d = v.d + y.d
    return v.p
1
2
3
4
5
FIND-DEPTH(v)
    FIND-ROOT(v)    // no need to save the return value
    if v == v.p
        return v.d
    else return v.d + v.p.d

$\text{FIND-ROOT}$ performs path compression and updates pseudodistances along the find path from $v$. It is similar to $\text{FIND-SET}$ on page 571, but with three changes. First, when $v$ is either the root or a child of a root (one of these conditions holds if and only if $v.p = v.p.p$) in the disjoint-set forest, we don't have to recurse; instead, we just return $v.p$. Second, when we do recurse, we save the pointer $v.p$ into a new variable $y$. Third, when we recurse, we update $v.d$ by adding into it the $d$ values of all nodes on the find path that are no longer proper ancestors of $v$ after path compression; these nodes are precisely the proper ancestors of $v$ other than the root. Thus, as long as $v$ does not start out the $\text{FIND-ROOT}$ call as either the root or a child of the root, we add $y.d$ into $v.d$. Note that $y.d$ has been updated prior to updating $v.d$, if $y$ is also neither the root nor a child of the root.

$\text{FIND-DEPTH}$ first calls $\text{FIND-ROOT}$ to perform path compression and update pseudodistances. Afterward, the find path from $v$ consists of either just $v$ (if $v$ is a root) or just $v$ and $v.p$ (if $v$ is not a root, in which case it is a child of the root after path compression). In the former case, the depth of $v$ is just $v.d$, and in the latter case, the depth is $v.d + v.p.d$.

d. Our procedure for $\text{GRAFT}$ is a combination of $\text{UNION}$ and $\text{LINK}$:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
GRAPT(r, v)
    r' = FIND-ROOT(r)
    v' = FIND-ROOT(v)
    z = FIND-DEPTH(v)
    if r'.rank > v'.rank
        v'.p = r'
        r'.d = r'.d + z + 1
        v'.d = v'.d - r'.d
    else r'.p = v'
        r'.d = r'.d + z + 1 - v'.d
        if r'.rank == v'.rank
            v'.rank = v'.rank + 1

This procedure works as follows. First, we call $\text{FIND-ROOT}$ on $r$ and $v$ in order to find the roots $r'$ and $v'$, respectively, of their trees in the disjoint-set forest. As we saw in part (c), these $\text{FIND-ROOT}$ calls also perform path compression and update pseudodistances on the find paths from $r$ and $v$. We then call $\text{FIND-DEPTH}(v)$, saving the depth of $v$ in the variable $z$. (Since we have just compressed $v$'s find path, this call of $\text{FIND-DEPTH}$ takes $\text{O(1)}$ time.) Next, we emulate the action of $\text{LINK}$, by making the root ($r'$ or $v'$) of smaller rank a child of the root of larger rank; in case of a tie, we make $r'$ a child of $v'$.

If $v'$ has the smaller rank, then all nodes in $r$'s tree will have their depths increased by the depth of $v$ plus $1$ (because $r$ is to become a child of $v$). Altering the psuedodistance of the root of a disjoint-set tree changes the computed depth of all nodes in that tree, and so adding $z + 1$ to $r'.d$ accomplishes this update for all nodes in $r$'s disjoint-set tree. Since $v'$ will become a child of $r'$ in the disjoint-set forest, we have just increased the computed depth of all nodes in the disjoint-set tree rooted at $v'$ by $r'.d$. These computed depths should not have changed, however. Thus, we subtract off $r'.d$ from $v'.d$, so that the sum $v'.d + r'.d$ after making $v'$ a child of $r'$ equals $v'.d$ before making $v'$ a child of $r'$.

On the other hand, if $r'$ has the smaller rank, or if the ranks are equal, then $r'$ becomes a child of $v'$ in the disjoint-set forest. In this case, $v'$ remains a root in the disjoint-set forest afterward, and we can leave $v'.d$ alone. We have to update $r'.d$, however, so that after making $r'$ a child of $v'$, the depth of each node in $r$'s disjoint-set tree is increased by $z + 1$. We add $z + 1$ to $r'.d$, but we also subtract out $v'.d$, since we have just made $r'$ a child of $v'$. Finally, if the ranks of $r'$ and $v'$ are equal, we increment the rank of $v'$, as is done in the $\text{LINK}$ procedure.

e. The asymptotic running times of $\text{MAKE-TREE}$, $\text{FIND-DEPTH}$, and $\text{GRAFT}$ are equivalent to those of $\text{MAKE-SET}$, $\text{FIND-SET}$, and $\text{UNION}$, respectively. Thus, a sequence of $m$ operations, $n$ of which are $\text{MAKE-TREE}$ operations, takes $\Theta(m\alpha(n))$ time in the worst case.