# 16.1 An activity-selection problem

## 16.1-1

Give a dynamic-programming algorithm for the activity-selection problem, based on recurrence $\text{(16.2)}$. Have your algorithm compute the sizes $c[i, j]$ as defined above and also produce the maximum-size subset of mutually compatible activities.

Assume that the inputs have been sorted as in equation $\text{(16.1)}$. Compare the running time of your solution to the running time of $\text{GREEDY-ACTIVITY-SELECTOR}$.

The tricky part is determining which activities are in the set $S_{ij}$. If activity $k$ is in $S_{ij}$, then we must have $i < k < j$, which means that $j - 1 \ge 2$, but we must also have that $f_i \le s_k$ and $f_k \le s_j$. If we start $k$ at $j - 1$ and decrement $k$, we can stop once $k$ reaches $i$, but we can also stop once we find that $f_k \le f_i$, since then activities $i + 1$ through $k$ cannot be compatible with activity $i$.

We create two fictitious activities, $a_0$ with $f_0 = 0$ and $a_{n + 1}$ with $s_{n + 1} = \infty$. We are interested in a maximum-size set $A_{0, n + 1}$ of mutually compatible activities in $S_{0, n + 1}$. We'll use tables $c[0..n + 1, 0..n + 1]$, as in recurrence $\text{(16.2)}$ (so that $c[i, j] = |A_{ij}|$), and $act[0..n + 1, 0..n + 1]$, where $act[i, j]$ is the activity $k$ that we choose to put into $A_{ij}$.

We fill the tables in according to increasing difference $j - 1$, which we denote by $l$ in the pseudocode. Since $S_{ij} = \emptyset$; if $j - i < 2$, we initialize $c[i, i] = 0$ for all $i$ and $c[i, i + 1] = 0$ for $0 \le i \le n$. As in $\text{RECURSIVE-ACTIVITY-SELECTOR}$ and $\text{GREEDY-ACTIVITY-SELECTOR}$, the start and finish times are given as arrays $s$ and $f$, where we assume that the arrays already include the two fictitious activities and that the activities are sorted by monotonically increasing finish time.

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 DYNAMIC-ACTIVITY-SELECTOR(s, f, n) let c[0..n + 1, 0..n + 1] and act[0..n + 1, 0..n + 1] be new tables for i = 0 to n c[i, i] = 0 c[i, i + 1] = 0 c[n + 1, n + 1] = 0 for l = 2 to n + 1 for i = 0 to n - l + 1 j = i + l c[i, j] = 0 k = j - 1 while f[i] < f[k] if f[i] ≤ s[k] and f[k] ≤ s[j] and c[i, k] + c[k, j] + 1 > c[i, j] c[i, j] = c[i, k] + c[k, j] + 1 act[i, j] = k k = k - 1 print "A maximum size set of mutually compatible activities has size" c[0, n + 1] print "The set contains" PRINT-ACTIVITIES(c, act, 0, n + 1)
 1 2 3 4 5 6 PRINT-ACTIVITIES(c, act, i, j) if c[i, j] > 0 k = act[i, j] print k PRINT-ACTIVITIES(c, act, i, k) PRINT-ACTIVITIES(c, act, k, j)

The $\text{PRINT-ACTIVITIES}$ procedure recursively prints the set of activities placed into the optimal solution $A_{ij}$. It first prints the activity $k$ that achieved the maximum value of $c[i, j]$, and then it recurses to print the activities in $A_{ik}$ and $A_{kj}$. The recursion bottoms out when $c[i, j] = 0$, so that $A_{ij} = \emptyset$.

Whereas $\text{GREEDY-ACTIVITY-SELECTOR}$ runs in $\Theta(n)$ time, the $\text{DYNAMIC-ACTIVITY-SELECTOR}$ procedure runs in $O(n^3)$ time.

## 16.1-2

Suppose that instead of always selecting the first activity to finish, we instead select the last activity to start that is compatible with all previously selected activities. Describe how this approach is a greedy algorithm, and prove that it yields an optimal solution.

The proposed approach—selecting the last activity to start that is compatible with all previously selected activities—is really the greedy algorithm but starting from the end rather than the beginning.

Another way to look at it is as follows. We are given a set $S = \{a_1, a_2, \ldots, a_n\}$ of activities, where $a_i = [s_i, f_i)$, and we propose to find an optimal solution by selecting the last activity to start that is compatible with all previously selected activities. Instead, let us create a set $S = \{a_1', a_2', \ldots, a_n'\}$, where $a_i' = [f_i, s_i)$. That is, $a_i'$ is $a_i$ in reverse. Clearly, a subset of $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\} \subseteq S$ is mutually compatible if and only if the corresponding subset $\{a_{i_1}', a_{i_2}', \ldots, a_{i_k}'\} \subseteq S'$ is also mutually compatible. Thus, an optimal solution for $S$ maps directly to an optimal solution for $S'$ and vice versa.

The proposed approach of selecting the last activity to start that is compatible with all previously selected activities, when run on $S$, gives the same answer as the greedy algorithm from the text—selecting the first activity to finish that is compatible with all previously selected activities—when run on $S'$. The solution that the proposed approach finds for $S$ corresponds to the solution that the text's greedy algorithm finds for $S'$, and so it is optimal.

## 16.1-3

Not just any greedy approach to the activity-selection problem produces a maximum-size set of mutually compatible activities. Give an example to show that the approach of selecting the activity of least duration from among those that are compatible with previously selected activities does not work. Do the same for the approaches of always selecting the compatible activity that overlaps the fewest other remaining activities and always selecting the compatible remaining activity with the earliest start time.

• For the approach of selecting the activity of least duration from those that are compatible with previously selected activities:

$$\begin{array}{l|ccc} i & 1 & 2 & 3 \\ \hline s_i & 0 & 2 & 3 \\ f_i & 3 & 4 & 6 \\ \text{duration} & 3 & 2 & 3 \end{array}$$

This approach selects just $\{a_2\}$, but the optimal solution selects $\{a_1, a_3\}$.

• For the approach of always selecting the compatible activity that overlaps the fewest other remaining activities:

$$\begin{array}{l|ccc} i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline s_i & 0 & 1 & 1 & 1 & 2 & 3 & 4 & 5 & 5 & 5 & 6 \\ f_i & 2 & 3 & 3 & 3 & 4 & 5 & 6 & 7 & 7 & 7 & 8 \\ \text{\# of overlapping activities} & 3 & 4 & 4 & 4 & 4 & 2 & 4 & 4 & 4 & 4 & 3 \end{array}$$

This approach frst selects $a_6$, and after that choice it can select only two other activities (one of $a_1$, $a_2$, $a_3$, $a_4$ and one of $a_8$, $a_9$, $a_{10}$, $a_{11}$). An optimal solution is $\{a_1, a_5, a_7, a_{11}\}$.

• For the approach of always selecting the compatible remaining activity with the earliest start time, just add one more activity with the interval $[0, 14)$ to the example in Section 16.1. It will be the first activity selected, and no other activities are compatible with it.

## 16.1-4

Suppose that we have a set of activities to schedule among a large number of lecture halls, where any activity can take place in any lecture hall. We wish to schedule all the activities using as few lecture halls as possible. Give an efficient greedy algorithm to determine which activity should use which lecture hall.

(This problem is also known as the interval-graph coloring problem. We can create an interval graph whose vertices are the given activities and whose edges connect incompatible activities. The smallest number of colors required to color every vertex so that no two adjacent vertices have the same color corresponds to finding the fewest lecture halls needed to schedule all of the given activities.)

Let $S$ be the set of $n$ activities.

The "obvious" solution of using $\text{GREEDY-ACTIVITY-SELECTOR}$ to find a maximum-size set $S_1$ of compatible activities from $S$ for the first lecture hall, then using it again to find a maximum-size set $S_2$ of compatible activities from $S - S_1$ for the second hall, (and so on until all the activities are assigned), requires $\Theta(n^2)$ time in the worst case.

There is a better algorithm, however, whose asymptotic time is just the time needed to sort the activities by time—$O(n\lg n)$ time for arbitrary times, or possibly as fast as $O(n)$ if the times are small integers.

The general idea is to go through the activities in order of start time, assigning each to any hall that is available at that time. To do this, move through the set of events consisting of activities starting and activities finishing, in order of event time. Maintain two lists of lecture halls: Halls that are busy at the current event time $t$ (because they have been assigned an activity $i$ that started at $s_i \le t$ but won't finish until $f_i > t$) and halls that are free at time $t$. (As in the activity- selection problem in Section 16.1, we are assuming that activity time intervals are half open—i.e., that if $s_i \ge f_j$, then activities $i$ and $j$ are compatible.) When $t$ is the start time of some activity, assign that activity to a free hall and move the hall from the free list to the busy list. When $t$ is the finish time of some activity, move the activity's hall from the busy list to the free list. (The activity is certainly in some hall, because the event times are processed in order and the activity must have started before its finish time $t$, hence must have been assigned to a hall.)

To avoid using more halls than necessary, always pick a hall that has already had an activity assigned to it, if possible, before picking a never-used hall. (This can be done by always working at the front of the free-halls list—putting freed halls onto the front of the list and taking halls from the front of the list—so that a new hall doesn't come to the front and get chosen if there are previously-used halls.)

This guarantees that the algorithm uses as few lecture halls as possible: The algorithm will terminate with a schedule requiring $m \le n$ lecture halls. Let activity $i$ be the first activity scheduled in lecture hall $m$. The reason that $i$ was put in the $m$th lecture hall is that the first $m - 1$ lecture halls were busy at time $s_i$ . So at this time there are $m$ activities occurring simultaneously. Therefore any schedule must use at least $m$ lecture halls, so the schedule returned by the algorithm is optimal.

Run time:

• Sort the $2n$ activity-starts/activity-ends events. (In the sorted order, an activity-ending event should precede an activity-starting event that is at the same time.) $O(n\lg n)$ time for arbitrary times, possibly $O(n)$ if the times are restricted (e.g., to small integers).
• Process the events in $O(n)$ time: Scan the $2n$ events, doing $O(1)$ work for each (moving a hall from one list to the other and possibly associating an activity with it).

Total: $O(n + \text{time to sort)}$

For this specific problem, time to sort could range in between $O(n)$ to $O(n\lg n)$. Therefore, Time complexity could be $O(n)$ or $O(n\lg n)$ depending on the implementation of the sorting algorithm. Counting Sort can sort the array in $O(n)$ time whereas merge sort requires $O(n\lg n)$ time.

[The idea of this algorithm is related to the rectangle-overlap algorithm in Exercise 14.3-7.]

## 16.1-5

Consider a modification to the activity-selection problem in which each activity $a_i$ has, in addition to a start and finish time, a value $v_i$. The objective is no longer to maximize the number of activities scheduled, but instead to maximize the total value of the activities scheduled. That is, we wish to choose a set $A$ of compatible activities such that $\sum_{a_k \in A} v_k$ is maximized. Give a polynomial-time algorithm for this problem.

We can no longer use the greedy algorithm to solve this problem. However, as we show, the problem still has an optimal substructure which allows us to formulate a dynamic programming solution. The analysis here follows closely the analysis of Section 16.1 in the book. We define the value of a set of compatible events as the sum of values of events in that set. Let $S_{ij}$ be defined as in Section 16.1. An optimal solution to $S_{ij}$ is a subset of mutually compatible events of $S_{ij}$ that has maximum value. Let $A_{ij}$ be an optimal solution to $S_{ij}$. Suppose $A_{ij}$ includes an event $a_k$. Let $A_{ik}$ and $A_{kj}$ be defined as in Section 16.1. Thus, we have $A_{ij} = A_{ik} \cup \{a_k\} \cup A_{kj}$, and so the value of maximum-value set $A_{ij}$ is equal to the value of $A_{ik}$ plus the value of $A_{kj}$ plus $v_k$.

The usual cut-and-paste argument shows that the optimal solution $A_{ij}$ must also include optimal solutions to the two subproblems for $S_{ik}$ and $S_{kj}$. If we could find a set $A_{kj}'$ of mutually compatible activities in $S_{kj}$ where the value of $A_{kj}'$ is greater than the value of $A_{kj}$, then we could use $A_{kj}'$, rather than $A_{kj}$, in a solution to the subproblem for $S_{ij}$. We would have constructed a set of mutually compatible activities with greater value than that of $A_{ij}$, which contradicts the assumption that $A_{ij}$ is an optimal solution. A symmetric argument applies to the activities in $S_{ik}$.

Let us denote the value of an optimal solution for the set $S_{ij}$ by $val[i, j]$. Then, we would have the recurrence

$$val[i, j] = val[i, k] + val[k, j] + v_k.$$

Of course, since we do not know that an optimal solution for the set $S_{ij}$ includes activity $a_k$ , we would have to examine all activities in $S_{ij}$ to find which one to choose, so that

$$val[i, j] = \begin{cases} 0 & \text{if S_{ij} = \emptyset}, \\ \max\limits_{a_k\in S_{ij}} {val[i, k] + val[k, j] + v_k} & \text{if S_{ij} \ne \emptyset}. \end{cases}$$

While implementing the recurrence, the tricky part is determining which activities are in the set $S_{ij}$ . If activity $k$ is in $S_{ij}$ , then we must have $i < k < j$, which means that $j - i \ge 2$, but we must also have that $f_i \le s_k$ and $f_k \le s_j$. If we start $k$ at $j - 1$ and decrement $k$, we can stop once $k$ reaches $i$, but we can also stop once we find that $f_k \le f_i$, since then activities $i + 1$ through $k$ cannot be compatible with activity $i$.

We create two fictitious activities, $a_0$ with $f_0 = 0$ and $a_{n + 1}$ with $s_{n + 1} = \infty$. We are interested in a maximum-size set $A_{0, n + 1}$ of mutually compatible activities in $S_{0, n + 1}$. We'll use tables $val[0..n + 1, 0..n + 1]$, as in the recurrence, and $act[0..n + 1, 0..n + 1]$, where $act[i, j]$ is the activity $k$ that we choose to put into $A_{ij}$ .

We fill the tables in according to increasing difference $j - i$, which we denote by $l$ in the pseudocode. Since $S_{ij} = \emptyset$ if $j - i < 2$, we initialize $val[i, i] = 0$ for all $i$ and $val[i, i + 1] = 0$ for $0\le i\le n$. As in $\text{RECURSIVE-ACTIVITY-SELECTOR}$ and $\text{GREEDY-ACTIVITY-SELECTOR}$, the start and finish times are given as arrays $s$ and $f$, where we assume that the arrays already include the two fictitious activities and that the activities are sorted by monotonically increasing finish time. The array $v$ specifies the value of each activity.

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 MAX-VALUE-ACTIVITY-SELECTOR(s, f, v, n) let val[0..n + 1, 0..n + 1] and act[0..n + 1, 0..n + 1] be new tables for i = 0 to n val[i, i] = 0 val[i, i + 1] = 0 val[n + 1, n + 1] = 0 for l = 2 to n + 1 for i = 0 to n - l + 1 j = i + l val[i, j] = 0 k = j - 1 while f[i] < f[k] if f[i] ≤ s[k] and f[k] ≤ s[j] and val[i, k] + val[k, j] + v[k] > val[i, j] val[i, j] = val[i, k] + val[k, j] + v[k] act[i, j] = k k = k - 1 print "A maximum-value set of mutually compatible activities has value" val[0, n + 1] print "The set contains" PRINT-ACTIVITIES(val, act, 0, n + 1)
 1 2 3 4 5 6 PRINT-ACTIVITIES(val, act, i, j) if val[i, j] > 0 k = act[i, j] print k PRINT-ACTIVITIES(val, act, i, k) PRINT-ACTIVITIES(val, act, k, j)

The $\text{PRINT-ACTIVITIES}$ procedure recursively prints the set of activities placed into the optimal solution $A_{ij}$ . It first prints the activity $k$ that achieved the maximum value of $val[i, j]$, and then it recurses to print the activities in $A_{ik}$ and $A_{kj}$. The recursion bottoms out when $val[i, j] = 0$, so that $A_{ij} = \emptyset$.

Whereas $\text{GREEDY-ACTIVITY-SELECTOR}$ runs in $\Theta(n)$ time, the $\text{MAX-VALUE-ACTIVITY-SELECTOR}$ procedure runs in $O(n^3)$ time.