26-2 Minimum path cover

A path cover of a directed graph $G = (V, E)$ is a set $P$ of vertex-disjoint paths such that every vertex in $V$ is included in exactly one path in $P$. Paths may start and end anywhere, and they may be of any length, including $0$. A minimum path cover of $G$ is a path cover containing the fewest possible paths.

a. Give an efficient algorithm to find a minimum path cover of a directed acyclic graph $G = (V, E)$. ($\textit{Hint:}$ Assuming that $V = \{1, 2, \ldots, n\}$, construct the graph $G' = (V', E')$, where

$$ \begin{aligned} V' & = \{x_0, x_1, \ldots, x_n\} \cup \{y_0, y_1, \ldots, y_n\}, \\ E' & = \{(x_0, x_i): i \in V\} \cup \{(y_i, y_0): i \in V\} \cup \{(x_i, y_j): (i, j) \in E\}, \end{aligned} $$

and run a maximum-flow algorithm.)

b. Does your algorithm work for directed graphs that contain cycles? Explain.

a. The idea is to use a maximum-flow algorithm to find a maximum bipartite matching that selects the edges to use in a minimum path cover. We must show how to formulate the max-flow problem and how to construct the path cover from the resulting matching, and we must prove that the algorithm indeed finds a minimum path cover.

Define $G'$ as suggested, with directed edges. Make $G'$ into a flow network with source $x_0$ and sink $y_0$ by defining all edge capacities to be $1$. $G'$ is the flow network corresponding to a bipartite graph $G''$ in which $L = \{x_1, \ldots, x_n\}$, $R = \{y_1, \ldots, y_n\}$, and the edges are the (undirected version of the) subset of $E'$ that doesn't involve $x_0$ or $y_0$ .

The relationship of $G$ to the bipartite graph $G''$ is that every vertex $i$ in $G$ is represented by two vertices, $x_i$ and $y_i$, in $G''$. Edge $(i, j)$ in $G$ corresponds to edge $(x_i, y_j)$ in $G''$. That is, an edge $(x_i, y_j)$ in $G''$ means that an edge in $G$ leaves $i$ and enters $j$. Vertex $x_i$ tells us about edges leaving $i$, and $y_i$ tells us about edges entering $i$.

The edges in a bipartite matching in $G''$ can be used in a path cover of $G$, for the following reasons:

  • In a bipartite matching, no vertex is used more than once. In a bipartite matching in $G''$, since no $x_i$ is used more than once, at most one edge in the matching leaves any vertex $i$ in $G$. Similarly, since no $y_j$ is used more than once, at most one edge in the matching enters any vertex $j$ in $G$.
  • In a path cover, since no vertex appears in more than one path, at most one path edge enters each vertex and at most one path edge leaves each vertex.

We can construct a path cover $P$ from any bipartite matching $M$ (not just a maximum matching) by moving from some $x_i$ to its matching $y_j$ (if any), then from $x_j$ to its matching $y_k$, and so on, as follows:

  1. Start a new path containing a vertex $i$ that has not yet been placed in a path.
  2. If $x_i$ is unmatched, the path can't go any farther; just add it to $P$.
  3. If $x_i$ is matched to some $y_j$, add $j$ to the current path. If $j$ has already been placed in a path (i.e., though we've just entered $j$ by processing $y_j$, we've already built a path that leaves $j$ by processing $x_j$), combine this path with that one and go back to step 1. Otherwise go to step 2 to process $x_j$.

This algorithm constructs a path cover, for the following reasons:

  • Every vertex is put into some path, because we keep picking an unused vertex from which to start a path until there are no unused vertices.
  • No vertex is put into two paths, because every $x_i$ is matched to at most one $y_j$, and vice versa. That is, at most one candidate edge leaves each vertex, and at most one candidate edge enters each vertex. When building a path, we start or enter a vertex and then leave it, building a single path. If we ever enter a vertex that was left earlier, it must have been the start of another path, since there are no cycles, and we combine those paths so that the vertex is entered and left on a single path.

Every edge in $M$ is used in some path because we visit every $x_i$, and we incorporate the single edge, if any, from each visited $x_i$. Thus, there is a one-to-one correspondence between edges in the matching and edges in the constructed path cover.

We now show that the path cover $P$ constructed above has the fewest possible paths when the matching is maximum.

Let $f$ be the flow corresonding to the bipartite matching $M$.

$$ \begin{aligned} |V| & = \sum_{p \in P} \text{(\# vertices in $p$)} & \text{(every vertex is on exactly 1 path)} \\ & = \sum_{p \in P} \text{(1 + \# edges in $p$)} \\ & = \sum_{p \in P} 1 + \sum_{p \in P} \text{(\# edges in $p$)} \\ & = |P| + |M| & \text{(by 1-to-1 correspondence)} \\ & = |P| + |f|. & \text{(by Lemma 26.9)} \end{aligned} $$

Thus, for the fixed set $V$ in our graph $G$, $|P|$ (the number of paths) is minimized when the flow $f$ is maximized.

The overall algorithm is as follows:

  • Use $\text{FORD-FULKERSON}$ to find a maximum flow in $G'$ and hence a maximum bipartite matching $M$ in $G''$.
  • Construct the path cover as described above.

Time

$O(VE)$ total:

  • $O(V + E)$ to set up $G'$,
  • $O(VE)$ to find the maximum bipartite matching,
  • $O(E)$ to trace the paths, because each edge $\in M$ is traversed only once and there are $O(E)$ edges in $M$.

b. The algorithm does not work if there are cycles.

Consider a graph $G$ with $4$ vertices, consisting of a directed triangle and an edge pointing to the triangle:

$$E = \{(1, 2), (2, 3), (3, 1), (4, 1)\}.$$

$G$ can be covered with a single path: $4 \to 1 \to 2 \to 3$, but our algorithm might find only a $2$-path cover.

In the bipartite graph $G'$, the edges $(x_i, y_j)$ are

$$(x_1, y_2), (x_2, y_3), (x_3, y_1), (x_4, y_1).$$

There are $4$ edges from an $x_i$ to a $y_j$, but $2$ of them lead to $y_1$, so a maximum bipartite matching can have only $3$ edges (and the maximum flow in $G'$ has value $3$). In fact, there are $2$ possible maximum matchings. It is always possible to match $(x_1, y_2)$ and $(x_2, y_3)$, and then either $(x_3, y_1)$ or $(x_4, y_1)$ can be chosen, but not both.

The maximum flow found by one of our max-flow algorithms could find the flow corresponding to either of these matchings, since both are maximal. If it finds the matching with edge $(x_3, x_1)$, then the matching would not contain $(x_4, x_1)$; given that matching, our path algorithm is forced to produce $2$ paths, one of which contains just the vertex $4$.