14-1 Point of maximum overlap

Suppose that we wish to keep track of a point of maximum overlap in a set of intervals—a point with the largest number of intervals in the set that overlap it.

a. Show that there will always be a point of maximum overlap that is an endpoint of one of the segments.

b. Design a data structure that efficiently supports the operations $\text{INTERVAL-INSERT}$, $\text{INTERVAL-DELETE}$, and $\text{FIND-POM}$, which returns a point of maximum overlap. ($\textit{Hint:}$ Keep a red-black tree of all the endpoints. Associate a value of $+1$ with each left endpoint, and associate a value of $-1$ with each right endpoint. Augment each node of the tree with some extra information to maintain the point of maximum overlap.)

a. Assume for the purpose of contradiction that there is no point of maximum overlap in an endpoint of a segment. The maximum overlap point $p$ is in the interior of $m$ segments. Actually, $p$ is in the interior of the intersection of those $m$ segments. Now look at one of the endpoints $p'$ of the intersection of the $m$ segments. Point $p'$ has the same overlap as $p$ because it is in the same intersection of $m$ segments, and so $p'$ is also a point of maximum overlap. Moreover, $p'$ is in the endpoint of a segment (otherwise the intersection would not end there), which contradicts our assumption that there is no point of maximum overlap in an endpoint of a segment. Thus, there is always a point of maximum overlap which is an endpoint of one of the segments.

b. Keep a balanced binary search tree of the endpoints. That is, to insert an interval, we insert its endpoints separately. With each left endpoint $e$, associate a value $p(e) = +1$ (increasing the overlap by $1$). With each right endpoint $e$ associate a value $p(e) = -1$ (decreasing the overlap by $1$). When multiple endpoints have the same value, insert all the left endpoints with that value before inserting any of the right endpoints with that value.

Here's some intuition. Let $e_1, e_2, \ldots, e_n$ be the sorted sequence of endpoints corresponding to our intervals. Let $s(i, j)$ denote the sum $p(e_i) + p(e_{i + 1}) + \cdots + p(e_j)$ for $1 \le i \le j \le n$. We wish to find an $i$ maximizing $s(1, i)$.

For each node $x$ in the tree, let $l(x)$ and $r(x)$ be the indices in the sorted order of the leftmost and rightmost endpoints, respectively, in the subtree rooted at $x$. Then the subtree rooted at $x$ contains the endpoints $e_{l(x)}, e_{l(x) + 1}, \ldots, e_{r(x)}$.

Each node $x$ stores three new attributes. We store $x.v = s(l(x), r(x))$, the sum of the values of all nodes in the subtree rooted at $x$. We also store $x.m$, the maximum value obtained by the expression $s(l(x), i)$ for any $i$ in $\{l(x), l(x) + 1, \ldots, r(x)\}$. Finally, we store $x.o$ as the value of $i$ for which $x.m$ achieves its maximum. For the sentinel, we define $T.nil.v = T.nil.m = 0$.

We can compute these attributes in a bottom-up fashion to satisfy the requirements of Theorem 14.1:

$$x.v = x.left.v + p(x) + x.right.v,$$

$$ x.m = \max \begin{cases} x.left.m & \text{(max is in $x$'s left subtree)}, \\ x.left.v + p(x) & \text{(max is at $x$)}, \\ x.left.v + p(x) + x.right.m & \text{(max is in $x$'s right subtree)}. \end{cases} $$

Computing $x.v$ is straightforward. Computing $x.m$ bears further explanation. Recall that it is the maximum value of the sum of the $p$ values for the nodes in the subtree rooted at $x$, starting at the node for $e_{l(x)}$, which is the leftmost endpoint in $x$'s subtree, and ending at any node for $e_i$ in $x$'s subtree. The endpoint $e_i$ that maximizes this sum—let's call it $e_{i^*}$—corresponds to either a node in $x$'s left subtree, $x$ itself, or a node in $x$'s right subtree. If $e_{i^*}$ corresponds to a node in $x$'s left subtree, then $x.left.m$ represents a sum starting at the node for $e_{l(x)}$ and ending at a node in $x$'s left subtree, and hence $x.m = x.left.m$. If $e_i$ corresponds to $x$ itself, then $x.m$ represents the sum of all $p$ values in $x$'s left subtree, plus $p(x)$, so that $x.m = x.left.v + p(x)$. Finally, if $e_{i^*}$ corresponds to a node in $x$'s right subtree, then $x.m$ represents the sum of all $p$ values in $x$'s left subtree, plus $p(x)$, plus the sum of some subset of $p$ values in $x$'s right subtree. Moreover, the values taken from $x$'s right subtree must start from the leftmost endpoint stored in the right subtree. To maximize this sum, we need to maximize the sum from the right subtree, and that value is precisely $x.right.m$. Hence, in this case, $x.m = x.left.v + p(x) + x.right.m$.

Once we understand how to compute $x.m$, it is straightforward to compute $x.o$ from the information in $x$ and its two children. Thus, we can implement the operations as follows:

  • $\text{INTERVAL-INSERT}$: insert two nodes, one for each endpoint of the interval.
  • $\text{FIND-POM}$: return the interval whose endpoint is represented by $T.root.o$.

(Note that because we are building a binary search tree of all the endpoints and then determining $T.root.o$, we have no need to delete any nodes from the tree.)

Because of how we have defined the new attributes, Theorem 14.1 says that each operation runs in $O(\lg n)$ time. In fact, $\text{FIND-POM}$ takes only $O(1)$ time.