5.4 Probabilistic analysis and further uses of indicator random variables

5.4-1

How many people must there be in a room before the probability that someone has the same birthday as you do is at least $1 / 2$? How many people must there be before the probability that at least two people have a birthday on July 4 is greater than $1 / 2$?

The probability of a person not having the same birthday as me is $(n - 1) / n$. The probability of $k$ people not having the same birthday as me is that, squared. We apply the same approach as the text - we take the complementary event and solve it for $k$,

$$ \begin{aligned} 1 - \big(\frac{n - 1}{n}\big)^k & \ge \frac{1}{2} \\ \big(\frac{n - 1}{n}\big)^k & \le \frac{1}{2} \\ k\lg\big(\frac{n - 1}{n}\big) & \ge \lg\frac{1}{2} \\ k = \frac{\log(1 / 2)}{\log(364 / 365)} & \approx 253. \end{aligned} $$

As for the other question,

$$ \begin{aligned} \Pr\{\text{2 born on Jul 4}\} & = 1 - \Pr\{\text{1 born on Jul 4}\} - \Pr\{\text{0 born on Jul 4}\} \\ & = 1 - \frac{k}{n}\big(\frac{n - 1}{n}\big)^{k - 1} - \big(\frac{n - 1}{n}\big)^k \\ & = 1 - \big(\frac{n - 1}{n}\big)^{k - 1}\big(\frac{n + k - 1}{n}\big). \end{aligned} $$

Writing a Ruby programme to find the closest integer, we get $115$.

5.4-2

Suppose that we toss balls into $b$ bins until some bin contains two balls. Each toss is independent, and each ball is equally likely to end up in any bin. What is the expected number of ball tosses?

This is just a restatement of the birthday problem. I consider this all that needs to be said on this subject.

5.4-3 $\star$

For the analysis of the birthday paradox, is it important that the birthdays be mutually independent, or is pairwise independence sufficient? Justify your answer.

Pairwise independence is enough. It's sufficient for the derivation after $\text{(5.6)}$.

5.4-4 $\star$

How many people should be invited to a party in order to make it likely that there are $three$ people with the same birthday?

The answer is $88$. I reached it by trial and error. But let's analyze it with indicator random variables.

Let $X_{ijk}$ be the indicator random variable for the event of the people with indices $i$, $j$ and $k$ have the same birthday. The probability is $1 / n^2$. Then,

$$ \begin{aligned} \text E[X] & = \sum_{i = 1}^n\sum_{j = i + 1}^n \sum_{k = j + 1}^n X_{ijk} \\ & = \sum_{i = 1}^n\sum_{j = i + 1}^n \sum_{k = j + 1}^n \frac{1}{n^2} \\ & = \binom{n}{3}\frac{1}{n^2} \\ & = \frac{k(k - 1)(k - 2)}{6n^2}. \end{aligned} $$

Solving this yields $94$. It's a bit more, but again, indicator random variables are approximate.

Finding more commentary online is tricky.

5.4-5 $\star$

What is the probability that a $k$-string over a set of size $n$ forms a $k$-permutation? How does this question relate to the birthday paradox?

$$ \begin{aligned} \Pr\{k\text{-perm in }n\} & = 1 \cdot \frac{n - 1}{n} \cdot \frac{n - 2}{n} \cdots \frac{n - k + 1}{n} \\ & = \frac{n!}{n^k(n - k)!}. \end{aligned} $$

This is the complementary event to the birthday problem, that is, the chance of $k$ people have distinct birthdays.

5.4-6 $\star$

Suppose that $n$ balls are tossed into $n$ bins, where each toss is independent and the ball is equally likely to end up in any bin. What is the expected number of empty bins? What is the expected number of bins with exactly one ball?

Let $X_i$ be the indicator variable that bin $i$ is empty after all balls are tossed and $X$ be the random variable that gives the number of empty bins. Thus we have

$$E[X] = \sum_{i = 1}^n E[X_i] = \sum_{i = 1}^n \bigg(\frac{n - 1}{n}\bigg)^n = n\bigg(\frac{n - 1}{n}\bigg)^n.$$

Let $X_i$ be the indicator variable that bin $i$ contains exactly $1$ ball after all balls are tossed and $X$ be the random variable that gives the number of bins containing exactly $1$ ball. Thus we have

$$E[X] = \sum_{i = 1}^n E[X_i] = \sum_{i = 1}^n \binom{n}{1}\bigg(\frac{n - 1}{n}\bigg)^{n - 1} \frac{1}{n} = n\bigg(\frac{n - 1}{n}\bigg)^{n - 1},$$

because we need to choose which toss will go into bin $i$, then multiply by the probability that that toss goes into that bin and the remaining $n − 1$ tosses avoid it.

5.4-7 $\star$

Sharpen the lower bound on streak length by showing that in $n$ flips of a fair coin, the probability is less than $1 / n$ that no streak longer than $\lg n - 2\lg\lg n$ consecutive heads occurs.

We split up the n flips into $n / s$ groups where we pick $s = \lg(n) - 2 \lg(\lg(n))$. We will show that at least one of these groups comes up all heads with probability at least $\frac{n - 1}{n}$. So, the probability the group starting in position $i$ comes up all heads is

$$\Pr(A_{i,\lg n - 2\lg(\lg n)}) = \frac{1}{2^{\lg n - 2\lg(\lg n)}} = \frac{\lg n^2}{n}.$$

Since the groups are based of of disjoint sets of IID coin flips, these probabilities are independent. so,

$$ \begin{aligned} \Pr(\bigwedge\neg A_{i,\lg n - 2\lg(\lg n)}) & = \prod_i\Pr(\neg A_{i,\lg n - 2\lg(\lg n)}) \\ & = \Big(1-\frac{\lg n^2}{n}\Big)^{\frac{n}{\lg n - 2\lg(\lg n)}} \\ & \le e^{-\frac{\lg n^2}{\lg n - 2\lg(\lg n)}} \\ &= \frac{1}{n} e^{\frac{-2\lg(\lg n)\lg n}{\lg n - 2\lg(\lg n)}} \\ & = n^{-1-\frac{2\lg(\lg n)}{\lg n - 2\lg(\lg n)}} \\ & < n^{-1}. \end{aligned} $$

Showing that the probability that there is no run of length at least $\lg n - 2\lg(\lg n)$ to be $< \frac{1}{n}$.