# 16-5 Off-line caching

Modern computers use a cache to store a small amount of data in a fast memory. Even though a program may access large amounts of data, by storing a small subset of the main memory in the cache—a small but faster memory—overall access time can greatly decrease. When a computer program executes, it makes a sequence $\langle r_1, r_2, \ldots, r_n \rangle$ of $n$ memory requests, where each request is for a particular data element. For example, a program that accesses 4 distinct elements $\{a, b, c, d\}$ might make the sequence of requests $\langle d, b, d, b, d, a, c, d, b, a, c, b \rangle$. Let $k$ be the size of the cache. When the cache contains $k$ elements and the program requests the $(k + 1)$st element, the system must decide, for this and each subsequent request, which $k$ elements to keep in the cache. More precisely, for each request $r_i$, the cache-management algorithm checks whether element $r_i$ is already in the cache. If it is, then we have a cache hit; otherwise, we have a cache miss. Upon a cache miss, the system retrieves $r_i$ from the main memory, and the cache-management algorithm must decide whether to keep $r_i$ in the cache. If it decides to keep $r_i$ and the cache already holds $k$ elements, then it must evict one element to make room for $r_i$ . The cache-management algorithm evicts data with the goal of minimizing the number of cache misses over the entire sequence of requests.

Typically, caching is an on-line problem. That is, we have to make decisions about which data to keep in the cache without knowing the future requests. Here, however, we consider the off-line version of this problem, in which we are given in advance the entire sequence of $n$ requests and the cache size $k$, and we wish to minimize the total number of cache misses.

We can solve this off-line problem by a greedy strategy called furthest-in-future, which chooses to evict the item in the cache whose next access in the request sequence comes furthest in the future.

a. Write pseudocode for a cache manager that uses the furthest-in-future strategy. The input should be a sequence $\langle r_1, r_2, \ldots, r_n \rangle$ of requests and a cache size $k$, and the output should be a sequence of decisions about which data element (if any) to evict upon each request. What is the running time of your algorithm?

b. Show that the off-line caching problem exhibits optimal substructure.

c. Prove that furthest-in-future produces the minimum possible number of cache misses.

a. The procedure $\text{CACHE-MANAGER}$ is a generic driver: it initializes the cache by calling $\text{INITIALIZE-CACHE}$ and then calls $\text{ACCESS}$ on each requested data element in turn. The inputs are a sequence $R = \langle r_1, r_2, \ldots, r_n \rangle$ of memory requests and a cache size $k$.

```
CACHE-MANAGER(R, k)
    INITIALIZE-CACHE(R, k)
    for i = 1 to n
        ACCESS(r[i])
```

The running time of $\text{CACHE-MANAGER}$ of course depends heavily on how $\text{ACCESS}$ is implemented. We have several choices for implementing the greedy strategy outlined in the problem. A straightforward implementation works as follows: when processing request $r_i$, scan through requests $r_{i + 1}, \ldots, r_n$ to find, among $r_i$ and the at most $k$ elements currently in the cache, the element whose next access comes furthest in the future, and evict that element. Each request entails $O(k)$ scans, each taking $O(n)$ time, and there are $n$ requests, so the running time of this straightforward approach is $O(kn^2)$.
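As an illustration, here is a minimal Python sketch of this straightforward approach. The function name, the representation of the cache as a set, and the tie-breaking among equally distant elements are our own choices; if the requested element itself has the furthest next access, it is simply not cached.

```python
def furthest_in_future(requests, k):
    """Simulate the furthest-in-future policy by rescanning the
    remaining requests on every miss (the O(k n^2) approach)."""
    cache = set()
    misses = 0
    for i, r in enumerate(requests):
        if r in cache:
            continue                     # cache hit
        misses += 1
        if len(cache) < k:
            cache.add(r)                 # empty slot available
            continue

        def next_use(x):
            # Index of x's next access after position i, or infinity.
            for j in range(i + 1, len(requests)):
                if requests[j] == x:
                    return j
            return float('inf')

        # Among the cached elements and r itself, leave out the one
        # whose next access is furthest in the future.
        victim = max(cache | {r}, key=next_use)
        if victim != r:
            cache.remove(victim)
            cache.add(r)
    return misses
```

On the example sequence $\langle d, b, d, b, d, a, c, d, b, a, c, b \rangle$ from the problem statement, `furthest_in_future("dbdbdacdbacb", 2)` reports 6 misses.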

Instead, we describe an asymptotically faster algorithm, which uses a red-black tree to check whether a given element is currently in the cache, a max-priority queue to retrieve the data element with the furthest access time, and a hash table (resolving collisions by chaining) to map data elements to integer indices. We assume that the data elements can be linearly ordered, so that it makes sense to put them into a red-black tree and a max-priority queue. The following procedure $\text{INITIALIZE-CACHE}$ creates and initializes some global data structures that are used by $\text{ACCESS}$.

```
INITIALIZE-CACHE(R, k)
    let T be a new red-black tree
    let P be a new max-priority queue
    let H be a new hash table
    ind = 1
    for i = 1 to n
        j = HASH-SEARCH(r[i])
        if j == NIL
            HASH-INSERT(r[i], ind)
            let S[ind] be a new linked list
            j = ind
            ind = ind + 1
        append i to S[j]
```

In the above procedure, here is the meaning of various variables:

• The red-black tree $T$ has at most $k$ nodes and holds the distinct data elements that are currently in the cache. We assume that the red-black tree procedures are modified to keep track of the number of nodes currently in the tree, and that the procedure $\text{TREE-SIZE}$ returns this value. Because red-black tree $T$ has at most $k$ nodes, we can insert into, delete from, or search in it in $O(\lg k)$ worst-case time.
• The max-priority queue $P$ contains one element for each data element currently in the cache. Each element of $P$ has two attributes: $key$ is the data element's next access time, and $value$ is the data element itself; $key$ serves as the key and $value$ as satellite data in the priority queue. Like the red-black tree $T$, the max-priority queue contains only elements currently in the cache. We need to maintain $T$ and $P$ separately, however, because $T$ is keyed on the data elements while $P$ is keyed on access times. Using a max-heap to implement $P$, we can extract the maximum element or insert a new element in $O(\lg k)$ time, and we can find the maximum element in $\Theta(1)$ time.
• The hash table $H$ is a dictionary or map from each data element to a unique integer. This integer indexes the linked lists described next. We assume that the $\text{HASH-INSERT}$ procedure uses the table-expansion technique of Section 17.4.1 to keep the hash table's load factor at most some constant $\alpha$. In this way, the amortized cost per insertion is $\Theta(1)$, and under the assumption of simple uniform hashing, Theorems 11.1 and 11.2 give an average-case search time of $\Theta(1)$.
• For every distinct data element $r_i$, we create a linked list $S_{ind}$ (where $ind$ is obtained through the hash table) holding the indices in the input array where $r_i$ occurs. For example, if the input sequence is $\langle d, b, d, b, d, a, c, d, b, a, c, b \rangle$, then we create four linked lists: $S_1$ for $a$, $S_2$ for $b$, $S_3$ for $c$, and $S_4$ for $d$. $S_1$ holds the indices where $a$ is accessed, and so $S_1 = \langle 6, 10 \rangle$. Similarly, $S_2 = \langle 2, 4, 9, 12 \rangle$, $S_3 = \langle 7, 11 \rangle$ and $S_4 = \langle 1, 3, 5, 8 \rangle$.

For each data element $r_i$, we first check whether there is already a linked list associated with $r_i$ and create a new linked list if not. We retrieve the linked list associated with $r_i$ and append $i$ to it, indicating that an access to $r_i$ occurs at access $i$.
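This preprocessing step can be sketched in Python as follows (the function name is ours; indices are 1-based to match the text, and an ordinary dict plays the role of the hash table $H$):

```python
from collections import defaultdict

def build_occurrence_lists(requests):
    """Map each distinct data element to the (1-based) list of indices
    at which it is requested, mirroring the lists S_ind above."""
    S = defaultdict(list)
    for i, r in enumerate(requests, start=1):
        S[r].append(i)          # append each access index i to r's list
    return dict(S)
```

For the example sequence, `build_occurrence_lists("dbdbdacdbacb")` yields the lists given above, e.g. `[6, 10]` for `a` and `[1, 3, 5, 8]` for `d`.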

```
ACCESS(r[i])
    // Compute the next access time for r[i].
    ind = HASH-SEARCH(r[i])
    time = ∞
    delete the head of S[ind]
    if S[ind] is not empty
        time = head of S[ind]
    // Check to see whether r[i] is currently in the cache.
    if TREE-SEARCH(T.root, r[i]) != NIL
        // Refresh r[i]'s key, so that P always holds true next access times.
        INCREASE-KEY(P, the element of P whose value is r[i], time)
        print "cache hit"
    elseif TREE-SIZE(T) < k
        // Insert in an empty slot in the cache.
        let z be a new node for T
        z.key = r[i]
        RB-INSERT(T, z)
        let event be a new object for P
        event.key = time
        event.value = r[i]
        INSERT(P, event)
        print "cache miss, inserted" r[i] "in empty slot"
    else
        event = MAXIMUM(P)
        if event.key ≤ time
            // r[i] has the furthest next access time.
            print "cache miss, no data element evicted"
        else
            // Evict the element with the furthest next access time.
            print "cache miss, evict data element" event.value
            event = EXTRACT-MAX(P)
            RB-DELETE(T, TREE-SEARCH(T.root, event.value))
            event.key = time
            event.value = r[i]
            INSERT(P, event)
            let z be a new node for T
            z.key = r[i]
            RB-INSERT(T, z)
```

The procedure $\text{ACCESS}$ takes an input $r_i$ and decides which element, if any, to evict from the cache. The first if condition properly sets time to the next access time of $r_i$: the head of the linked list associated with $r_i$ contains $i$; we remove this element from the list, so that the new head (if any) contains the index of $r_i$'s next access. Then, we check whether $r_i$ is already present in the cache. If $r_i$ is not present, we check whether we can store $r_i$ in an empty slot. If there are no empty slots, we may have to evict the element with the furthest next access time. We retrieve the element with the furthest next access time from the max-priority queue and compare its next access time with that of $r_i$. If $r_i$'s next access comes sooner, we evict that element from the cache (deleting it from both the tree and the priority queue) and insert $r_i$ into the tree and priority queue; otherwise $r_i$ itself is the element whose next access is furthest in the future, and we do not cache it.

Under the assumption of simple uniform hashing, the average-case running time of $\text{ACCESS}$ is $O(\lg k)$, since it performs a constant number of operations on the red-black tree, priority queue, and hash table. Thus, the average-case running time of $\text{CACHE-MANAGER}$ is $O(n\lg k)$.
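For concreteness, here is a compact Python sketch of the faster approach. It substitutes a hash set for the red-black tree $T$ and simulates the max-priority queue $P$ with Python's min-heap `heapq` on negated keys; since `heapq` has no increase-key operation, stale heap entries are skipped lazily instead, so the per-operation cost here is $O(\lg n)$ amortized rather than $O(\lg k)$. The function name and the lazy-deletion technique are our own choices, and we assume the data elements are linearly ordered, as in the text.

```python
import heapq

def belady_fast(requests, k):
    """Furthest-in-future using precomputed next-occurrence times and a
    max-priority queue simulated with heapq on negated keys."""
    n = len(requests)
    # next_occ[i] = index of the next request for requests[i], or n ("infinity").
    next_occ = [n] * n
    last = {}
    for i in range(n - 1, -1, -1):
        next_occ[i] = last.get(requests[i], n)
        last[requests[i]] = i
    cache = set()        # plays the role of the red-black tree T
    heap = []            # entries (-next_use, element); simulates the max-queue P
    current = {}         # element -> its true current next-use time
    misses = 0
    for i, r in enumerate(requests):
        t = next_occ[i]
        if r in cache:                        # cache hit: refresh r's key lazily
            current[r] = t
            heapq.heappush(heap, (-t, r))
            continue
        misses += 1
        if len(cache) < k:                    # miss, empty slot available
            cache.add(r)
            current[r] = t
            heapq.heappush(heap, (-t, r))
            continue
        # Discard stale entries until the top reflects a cached element's
        # true next-use time.
        while True:
            key, v = heap[0]
            if v in cache and current[v] == -key:
                break
            heapq.heappop(heap)
        if -key <= t:
            continue                          # r itself is furthest: don't cache it
        heapq.heappop(heap)                   # evict v, the furthest-in-future element
        cache.remove(v)
        del current[v]
        cache.add(r)
        current[r] = t
        heapq.heappush(heap, (-t, r))
    return misses
```

On the example sequence with $k = 2$ this reports the same 6 misses as the straightforward simulation.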

b. To show that the problem exhibits optimal substructure, we define the subproblem $(C, i)$ as the contents of the cache just before the $i$th request, where $C$ is a subset of the set of input data elements containing at most $k$ of them. A solution to $(C, i)$ is a sequence of decisions that specifies which element to evict (if any) for each request $i, i + 1, \ldots, n$. An optimal solution to $(C, i)$ is a solution that minimizes the number of cache misses.

Let $S$ be an optimal solution to $(C, i)$, and let $S'$ be the subsolution of $S$ for requests $i + 1, i + 2, \ldots, n$. If a cache hit occurs on the $i$th request, then the cache remains unchanged, so let $C' = C$. If a cache miss occurs, then the $i$th request changes the contents of the cache to $C'$ (possibly with $C' = C$ if no element was evicted). We claim that $S'$ is an optimal solution to $(C', i + 1)$. Why? If $S'$ were not an optimal solution to $(C', i + 1)$, then there would exist another solution $S''$ to $(C', i + 1)$ that makes fewer cache misses than $S'$. Combining $S''$ with the decision of $S$ at the $i$th request yields another solution to $(C, i)$ that makes fewer cache misses than $S$, contradicting our assumption that $S$ is an optimal solution to $(C, i)$.

Suppose the $i$th request results in a cache miss. Let $P_C$ be the set of all cache states that can be reached from $C$ through a single decision of the cache manager. The set $P_C$ contains up to $k + 1$ states: $k$ of them arising from different elements of the cache being evicted and one arising from the decision of evicting no element. For example, if $C = \{r_1, r_2, r_3\}$ and the requested data element is $r_4$, then

$$P_C = \{\{r_1, r_2, r_3\}, \{r_1, r_2, r_4\}, \{r_1, r_3, r_4\}, \{r_2, r_3, r_4\}\}.$$

Let $miss(C, i)$ denote the minimum number of cache misses for $(C, i)$. We can state a recurrence for $miss(C, i)$ as

$$miss(C, i) = \begin{cases} 0 & \text{if } i = n \text{ and } r_n \in C, \\ 1 & \text{if } i = n \text{ and } r_n \notin C, \\ miss(C, i + 1) & \text{if } i < n \text{ and } r_i \in C, \\ 1 + \min\limits_{C' \in P_C} \{miss(C', i + 1)\} & \text{if } i < n \text{ and } r_i \notin C. \end{cases}$$

Thus, we conclude that the problem exhibits optimal substructure.
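The recurrence can be evaluated directly by memoized recursion over the subproblems $(C, i)$. The sketch below does so in Python (function name and encoding are ours; the number of cache states is exponential in general, so this is only a correctness check on tiny instances, with $i$ running 0-based past the last request rather than stopping at $i = n$ as in the 1-indexed recurrence above):

```python
from functools import lru_cache

def min_misses(requests, k):
    """Evaluate the recurrence for miss(C, i) by memoized recursion,
    where C is a frozenset of cached elements."""
    requests = tuple(requests)
    n = len(requests)

    @lru_cache(maxsize=None)
    def miss(C, i):
        if i == n:                   # past the last request: no more misses
            return 0
        r = requests[i]
        if r in C:                   # cache hit
            return miss(C, i + 1)
        if len(C) < k:
            # With free space, inserting r is never worse than skipping it.
            return 1 + miss(C | {r}, i + 1)
        # Cache full: either keep C unchanged or evict one element for r;
        # these are exactly the k + 1 states of P_C.
        candidates = [C] + [(C - {x}) | {r} for x in C]
        return 1 + min(miss(Cp, i + 1) for Cp in candidates)

    return miss(frozenset(), 0)
```

On the example sequence with $k = 2$, the recurrence gives 6 misses, matching what the furthest-in-future strategy achieves.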

c. To prove that the furthest-in-future strategy yields an optimal solution, we show that the problem exhibits the greedy-choice property. Combined with the optimal-substructure property from part (b), the greedy-choice property will prove that furthest-in-future produces the minimum possible number of cache misses.

We use the definitions of subproblem, solution, and optimal solution from part (b). Since we will be comparing different solutions, define $C_{Ai}$ as the state of the cache for solution $A$ just before the $i$th request. The following theorem is the key.

Theorem (Greedy-choice property)

Let $A$ be some optimal solution to $(C, i)$. Let $b$ be the element of $C_{Ai} \cup \{r_i\}$ whose next access at the time of the $i$th request is furthest in the future, say at time $m$. Then we can construct another solution $A'$ to $(C, i)$ that has the following properties:

1. On the $i$th request, $A'$ evicts $b$.
2. For $i + 1 \le j \le m$, the caches $C_{Aj}$ and $C_{A' j}$ differ by at most one element. If they differ, then $b$ is always the element of $C_{Aj}$ that is not in $C_{A' j}$. Equivalently, if $C_{Aj}$ and $C_{A' j}$ differ, we can write $C_{Aj} = D_j \cup \{b\}$ and $C_{A' j} = D_j \cup \{x\}$, where $D_j$ is a size-($k - 1$) set and $x \ne b$ is some data element.
3. For requests $i, \ldots, m - 1$, if $A$ has a cache hit, then $A'$ has a cache hit.
4. $C_{Aj} = C_{A' j}$ for $j > m$.
5. For requests $i, \ldots, m$, the number of cache misses produced by $A'$ is at most the number of cache misses produced by $A$.

Proof

If $A$ evicts $b$ at request $i$, then the proof of the theorem is trivial. Therefore, suppose $A$ evicts data element $a$ on request $i$, where $a \ne b$. We will prove the theorem by constructing $A'$ inductively for each request.

1. At request $i$, $A'$ evicts $b$ instead of $a$.
2. We proceed by induction on $j$, where $i + 1 \le j \le m$. The construction for property 1 establishes the base case, because $C_{A, i + 1}$ and $C_{A', i + 1}$ differ by just one element and $b$ is the element of $C_{A, i + 1}$ that is not in $C_{A', i + 1}$ (so $x = a$ initially).

For the induction step, suppose property 2 holds for some request $j$, where $i + 1 \le j < m$. If $A$ does not evict any element, or evicts an element of $D_j$, then construct $A'$ to make the same decision on request $j$ as $A$ makes. If $A$ evicts $b$ on request $j$, then construct $A'$ to evict $x$ instead and to keep the same element that $A$ keeps, namely $r_j$. This construction preserves property 2 for request $j + 1$. Note that this construction might sometimes put duplicate elements into the cache; this situation is easily handled by substituting a dummy element for $x$.

3. Suppose $A$ has a cache hit on request $j$, where $i \le j \le m - 1$. Then $r_j \ne b$, since $b$ is not requested again until request $m$, and so $r_j \in D_j$. Thus $r_j \in C_{A' j}$, and $A'$ has a cache hit, too.

4. By property 2, the cache $C_{Am}$ differs from $C_{A' m}$ by at most one element, with $b$ being the element of $C_{Am}$ that might not be in $C_{A' m}$. If $C_{Am} = C_{A' m}$, then construct $A'$ to make the same decision on request $m$ as $A$ does, and the caches remain equal. Otherwise, $C_{Am} \ne C_{A' m}$ and $b \in C_{Am}$. Since the $m$th request is for element $b$ and $b \in C_{Am}$, $A$ has a cache hit and evicts nothing, while $A'$ has a cache miss; construct $A'$ to evict $x$ and bring in $b$. Either way, $C_{A, m + 1} = C_{A', m + 1}$, and from the $(m + 1)$st request on, $A'$ simply makes the same decisions as $A$.
5. By property 3, for requests $i, \ldots, m - 1$, whenever $A$ has a cache hit, $A'$ also has a cache hit, so we need concern ourselves with only the $m$th request. If $A$ has a cache miss on the $m$th request, we are done. Otherwise, $A$ has a cache hit on the $m$th request, and we show that some request $j$, where $i + 1 \le j \le m - 1$, results in a cache miss for $A$ but a cache hit for $A'$. Because $A$ evicts data element $a$ on request $i$, our construction of $A'$ gives $C_{A', i + 1} = D_{i + 1} \cup \{a\}$. The $m$th request is for data element $b$, and because $A$ has a cache hit on it, $b \in C_{Am}$; since none of requests $i + 1, \ldots, m - 1$ is for $b$, $A$ could not have evicted $b$ and brought it back, and so $A$ did not evict $b$ on any of requests $i, \ldots, m - 1$. By our construction, $A'$ evicts the differing element $x$ only when $A$ evicts $b$; since $A$ never evicts $b$ here, $A'$ retains $a$ in its cache. Moreover, by the choice of $b$, the next access to $a$ comes sooner than request $m$, so a request for $a$ occurs at least once among requests $i + 1, \ldots, m - 1$. Consider the first such request: $A$, which evicted $a$ and has not reloaded it, has a cache miss, while $A'$ has a cache hit.

The above theorem and the optimal-substructure property proved in part (b) imply that furthest-in-future produces the minimum number of cache misses.