32.4 The Knuth-Morris-Pratt algorithm

32.4-1

Compute the prefix function $\pi$ for the pattern $\text{ababbabbabbababbabb}$.

$$\pi = \{ 0, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8 \}.$$

32.4-2

Give an upper bound on the size of $\pi^*[q]$ as a function of $q$. Give an example to show that your bound is tight.

$|\pi^*[q]| < q$.

32.4-3

Explain how to determine the occurrences of pattern $P$ in the text $T$ by examining the $\pi$ function for the string $PT$ (the string of length $m + n$ that is the concatenation of $P$ and $T$).

$\{ q \mid \pi[q] = m \text{ and } q \ge 2m \}$.

32.4-4

Use an aggregate analysis to show that the running time of $\text{KMP-MATCHER}$ is $\Theta$.

The number of $q = q + 1$ is at most $n$.

32.4-5

Use a potential function to show that the running time of $\text{KMP-MATCHER}$ is $\Theta(n)$.

$\Phi = p.$

32.4-6

Show how to improve $\text{KMP-MATCHER}$ by replacing the occurrence of $\pi$ in line 7 (but not line 12) by $\pi'$, where $\pi'$ is defined recursively for $q = 1, 2, \ldots, m - 1$ by the equation

$$ \pi'[q] = \begin{cases} 0 & \text{ if } \pi[q] = 0, \\ \pi'[\pi[q]] & \text{ if } \pi[q] \ne 0 \text{ and } P[\pi[q] + 1] = P[q + 1] \\ \pi[q] & \text{ if } \pi[q] \ne 0 \text{ and } P[\pi[q] + 1] \ne P[q + 1]. \end{cases} $$

Explain why the modified algorithm is correct, and explain in what sense this change constitutes an improvement.

If $P[q + 1] \ne T[i]$, then if $P[\pi[q] + q] = P[q + 1] \ne T[i]$, there is no need to compare $P[\pi[q] + q]$ with $T[i]$.

32.4-7

Give a linear-time algorithm to determine whether a text $T$ is a cyclic rotation of another string $T'$. For example, $\text{arc}$ and $\text{car}$ are cyclic rotations of each other.

Find $T'$ in $TT$.

32.4-8 $\star$

Give an $O(m|\Sigma|)$-time algorithm for computing the transition function $\delta$ for the string-matching automaton corresponding to a given pattern $P$. (Hint: Prove that $\delta(q, a) = \delta(\pi[q], a)$ if $q = m$ or $P[q + 1] \ne a$.)

Compute the prefix function $m$ times.