5-1 Probabilstic counting

With a $b$-bit counter, we can ordinarily only count up to $2^b - 1$. With R. Morris's probabilistic counting, we can count up to a much larger value at the expense of some loss of precision.

We let a counter value of $i$ represent that a count of $n_i$ for $i = 0, 1, \ldots, 2^b - 1$, where the $n_i$ form an increasing sequence of nonnegative values. We assume that the initial value of the counter is $0$, representing a count of $n_0 = 0$. The $\text{INCREMENT}$ operation works on a counter containing the value $i$ in a probabilistic manner. If $i = 2^b - 1$, then the operation reports an overflow error. Otherwise, the $\text{INCREMENT}$ operation increases the counter by $1$ with probability $1 / (n_{i + 1} - n_i)$, and it leaves the counter unchanged with probability $1 - 1 / (n_{i + 1} - n_i)$.

If we select $n_i = i$ for all $i \ge 0$, then the counter is an ordinary one. More interesting situations arise if we select, say, $n_i = 2^{i - 1}$ for $i > 0$ or $n_i = F_i$ (the $i$th Fibonacci number—see Section 3.2).

For this problem, assume that $n_{2^b - 1}$ is large enough that the probability of an overflow error is negligible.

a. Show that the expected value represented by the counter after $n$ $\text{INCREMENT}$ operations have been performed is exactly $n$.

b. The analysis of the variance of the count represented by the counter depends on the sequence of the $n_i$. Let us consider a simple case: $n_i = 100i$ for all $i \ge 0$. Estimate the variance in the value represented by the register after $n$ $\text{INCREMENT}$ operations have been performed.

a. To determine the expected value represented by the counter after $n$ $\text{INCREMENT}$ operations, we define some random variables:

  • For $j = 1, 2, \ldots, n$, let $X_j$ denote the increase in the value represented by the counter due to the $j$th $\text{INCREMENT}$ operation.
  • Let $V_n$ be the value represented by the counter after $n$ $\text{INCREMENT}$ operations.

Then $V_n = X_1 + X_2 + \cdots + X_n$. We want to compute $\text E[V_n]$. By linearity of expection,

$$ \begin{aligned} \text E[V_n] & = \text E[X_1 + X_2 + \cdots + X_n] \\ & = \text E[X_1] + \text E[X_2] + \cdots + \text E[X_n]. \end{aligned} $$

We shall show that $\text E[X_j] = 1$ for $j = 1, 2, \ldots, n$, which will prove that $\text E[V_n] = n$.

We actually show that $\text E[X_j] = 1$ in two ways, the second more rigorous than the first:

  1. Suppose that at the start of the $j$th $\text{INCREMENT}$ operation, the counter holds the value $i$, which represents $n_i$. If the counter increases due to this $\text{INCREMENT}$ operation, then the value it represents increases by $n_{i + 1} - n_i$. The counter increases with probability $1 / (n_{i + 1} - n_i)$, and so

    $$ \begin{aligned} \text E[X_j] & = (0 \cdot \Pr\{\text{counter does not increase}\}) + ((n_{i + 1} - n_i) \cdot \Pr\{\text{counter increases}\}) \\ & = \Big(0 \cdot\Big(1 - \frac{1}{n_{i + 1} - n_i}\Big)\Big) + \Big((n_{i + 1} - n_i) \cdot \frac{1}{n_{i + 1} - n_i}\Big) \\ & = 1, \end{aligned} $$

    and so $\text E[X_j] = 1$ regardless of the value held by the counter.

  2. Let $C_j$ be the random variable denoting the value held in the counter at the start of the $j$th $\text{INCREMENT}$ operation. Since we can ignore values of $C_j$ greater than $2^b - 1$, we use a formula for conditional expectation:

    $$ \begin{aligned} \text E[X_j] & = \text E[\text E[X_j\mid C_j]] \\ & = \sum_{i = 0}^{2^b - 1} \text E[X_j \mid C_j = i] \cdot \Pr\{C_j = i\}. \end{aligned} $$

    To compute $\text E[X_j \mid C_j = i]$, we note that

    • $\Pr\{X_j = 0 \mid C_j = i\} = 1 - 1 / (n_{i + 1} - n_i)$,
    • $\Pr\{X_j = n_{i + 1} - n_i \mid C_j = i\} = 1 / (n_{i + 1} - n_i)$, and
    • $\Pr\{X_j = k \mid C_j = i\} = 0$ for all other $k$.

    Thus,

    $$ \begin{aligned} \text E[X_j \mid C_j = i] & = \sum_k k \cdot \Pr\{X_j = k \mid C_j = i\} \\ & = \Big(0 \cdot \Big(1 - \frac{1}{n_{i + 1} - n_i}\Big)\Big) + \Big((n_{i + 1} - n_i) \cdot \frac{1}{n_{i + 1} - n_i}\Big) \\ & = 1. \end{aligned} $$

    Therefore, noting that

    $$\sum_{i = 0}^{2^b - 1}\Pr\{C_j = i\} = 1,$$

    we have

    $$ \begin{aligned} \text E[X_j] & = \sum_{i = 0}^{2^b - 1}1 \cdot \Pr\{C_j = i\} \\ & = 1. \end{aligned} $$

Why is the second way more rigorous than the first? Both ways condition on the value held in the counter, but only the second way incorporates the conditioning into the expression for $\text E[X_j]$.

b. Defining $V_n$ and $X_j$ as in part (a), we want to compute $\text{Var}[V_n]$, where $n_i = 100i$. The $X_j$ are pairwise independent, and so by equation $\text{(C.29)}$,

$$\text{Var[$V_n$]} = \text{Var[$X_1$]} + \text{Var[$X_2$]} + \cdots + \text{Var[$X_n$]}.$$

Since $n_i = 100i$, we see that $n_{i + 1} - n_i = 100(i + 1) - 100i = 100$. Therefore, with probability $99 / 100$, the increase in the value represented by the counter due to the $j$th $\text{INCREMENT}$ operation is $0$, and with probability $1 / 100$, the value represented increases by $100$. Thus, by equation $\text{(C.27)}$,

$$ \begin{aligned} \text{Var[$X_j$]} & = \text E[X_j^2] - \text E^2[X_j] \\ & = \Big(\Big(0^2 \cdot \frac{99}{100}\Big) + \Big(100^2 \cdot \frac{1}{100}\Big)\Big) - 1^2 \\ & = 100 - 1 \\ & = 99. \end{aligned} $$

Summing up the variances of the $X_j$ gives $\text{Var}[V_n] = 99n$.