8-4 Water jugs

Suppose that you are given $n$ red and $n$ blue water jugs, all of different shapes and sizes. All red jugs hold different amounts of water, as do the blue ones. Moreover, for every red jug, there is a blue jug that holds the same amount of water, and vice versa.

Your task is to find a grouping of the jugs into pairs of red and blue jugs that hold the same amount of water. To do so, you may perform the following operation: pick a pair of jugs in which one is red and one is blue, fill the red jug with water, and then pour the water into the blue jug. This operation will tell you whether the red or the blue jug can hold more water, or that they have the same volume. Assume that such a comparison takes one time unit. Your goal is to find an algorithm that makes a minimum number of comparisons to determine the grouping. Remember that you may not directly compare two red jugs or two blue jugs.

a. Describe a deterministic algorithm that uses $\Theta(n^2)$ comparisons to group the jugs into pairs.

b. Prove a lower bound of $\Omega(n\lg n)$ for the number of comparisons that an algorithm solving this problem must make.

c. Give a randomized algorithm whose expected number of comparisons is $O(n\lg n)$, and prove that this bound is correct. What is the worst-case number of comparisons for your algorithm?

a. Compare each red jug with each blue jug. Since there are $n$ red jugs and $n$ blue jugs, that will take $\Theta(n^2)$ comparisons in the worst case.

b. To solve the problem, an algorithm has to perform a series of comparisons until it has enough information to determine the matching. We can view the computation of the algorithm in terms of a decision tree. Every internal node is labeled with two jugs (one red, one blue) which we compare, and has three outgoing edges (red jug smaller, same size, or larger than the blue jug). The leaves are labeled with a unique matching of jugs.

The height of the decision tree is equal to the worst-case number of comparisons the algorithm has to make to determine the matching. To bound that size, let us first compute the number of possible matchings for n red and n blue jugs.

If we label the red jugs from $1$ to $n$ and we also label the blue jugs from $1$ to $n$ before starting the comparisons, every outcome of the algorithm can be represented as a set

$$\text{\{$i, \pi(i): 1 \le i \le n$ and $\pi$ is a permutation on \{$1, \ldots, n$\}\}},$$

which contains the pairs of red jugs (first component) and blue jugs (second component) that are matched up. Since every permutation $\pi$ corresponds to a different outcome, there must be exactly $n!$ different results.

Now we can bound the height $h$ of our decision tree. Every tree with a branching factor of $3$ (every inner node has at most three children) has at most $3^h$ leaves. Since the decison tree must have at least $n!$ children, it follows that

$$3^h \ge n! \ge (n / e)^n \Rightarrow h \ge n\log_3 n - n\log_3 e = \Omega(n\lg n).$$

So any algorithm solving the problem must use $\Omega(n\lg n)$ comparisons.

c. Assume that the red jugs are labeled with numbers $1, 2, \ldots, n$ and so are the blue jugs. The numbers are arbitrary and do not correspond to the volumes of jugs, but are just used to refer to the jugs in the algorithm description. Moreover, the output of the algorithm will consist of $n$ distinct pairs $(i, j)$, where the red jug $i$ and the blue jug $j$ have the same volume.

The procedure $\text{MATCH-JUGS}$ takes as input two sets representing jugs to be matched: $R \subseteq \{1, \ldots, n\}$, representing red jugs, and $B \subseteq \{1, \ldots, n\}$, representing blue jugs. We will call the procedure only with inputs that can be matched; one necessary condition is that $|R| = |B|$.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
MATCH-JUGS(R, B)
    if |R| == 0         // sets are empty
        return
    if |R| == 1         // sets contain just one jug each
        let R = {r} and B = {b}
        output "(r, b)"
        return
    else r = a randomly chosen jug in R
        compare r to every jug of B
        B[<] = the set of jugs in B that are smaller than r
        B[>] = the set of jugs in B that are larger than r
        b = the one jug in B with the same size as r
        compare b to every jug of R - {r}
        R[<] = the set of jugs in R that are smaller than b
        R[>] = the set of jugs in R that are larger than b
        output "(r, b)"
        MATCH-JUGS(R[<], B[<])
        MATCH-JUGS(R[>], B[>])

Correctness can be seen as follows (remember that $|R| = |B|$ in each call). Once we pick $r$ randomly from $R$, there will be a matching among the jugs in volume smaller than $r$ (which are in the sets $R_<$ and $B_<$), and likewise between the jugs larger than $r$ (which are in $R_>$ and $B_>$). Termination is also easy to see: since $|R_<| + |R_>| < |R|$ in every recursive step, the size of the first parameter reduces with every recursive call. It eventually must reach $0$ or $1$, in which case the recursion terminates.

What about the running time? The analysis of the expected number of comparisons is similar to that of the quicksort algorithm in Section 7.4.2. Let us order the jugs as $r_1, \ldots, r_n$ and $b_1, \ldots,b_n$ where $r_i < r_{i + 1}$ and $b_i < b_{i + 1}$ for $i = 1, \ldots, n$, and $r_i = b_i$. Our analysis uses indicator random variables

$$X_{ij} = \text I\{\text{red jug $r_i$ is compared to blue jug $b_j$}\}.$$

As in quicksort, a given pair $r_i$ and $b_j$ is compared at most once. When we compare $r_i$ to every jug in $B$, jug $r_i$ will not be put in either $R_<$ or $R_>$. When we compare $b_i$ to every jug in $R - \{r_i\}$, jug $b_i$ is not put into either $B_<$ or $B_>$. The total number of comparisons is

$$X = \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^n X_{ij}.$$

To calculate the expected value of $X$, we follow the quicksort analysis to arrive at

$$\text E[X] = \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^n \Pr\{r_i \text{ is compared to } b_j\}.$$

As in the quicksort analysis, once we choose a jug $r_k$ such that $r_i < r_k < b_j$, we will put $r_i$ in $R_<$ and $b_j$ in $R_>$, and so $r_i$ and $b_j$ will never be compared again. Let us denote $R_{ij} = \{r_i, \ldots, r_j\}$. Then jugs $r_i$ and $b_j$ will be compared if and only if the first jug in $R_{ij}$ to be chosen is either $r_i$ or $r_j$.

Still following the quicksort analysis, until a jug from $R_{ij}$ is chosen, the entire set $R_{ij}$ is together. Any jug in $R_{ij}$ is equally likely to be first one chosen. Since $|R_{ij}| = j - i + 1$, the probability of any given jug being the first one chosen in $R_{ij}$ is $1 / (j - i + 1)$. The remainder of the analysis is the same as the quicksort analysis, and we arrive at the solution of $O(n\lg n)$ comparisons.

Just like in quicksort, in the worst case we always choose the largest (or smallest) jug to partition the sets, which reduces the set sizes by only $1$. The running time then obeys the recurrence $T(n) = T(n - 1) + \Theta(n)$, and the number of comparisons we make in the worst case is $T(n) = \Theta(n^2)$.