# 5.4 Probabilistic analysis and further uses of indicator random variables

## 5.4-1

How many people must there be in a room before the probability that someone has the same birthday as you do is at least $1 / 2$? How many people must there be before the probability that at least two people have a birthday on July 4 is greater than $1 / 2$?

The probability of a person not having the same birthday as me is $(n - 1) / n$. The probability of $k$ people not having the same birthday as me is that, squared. We apply the same approach as the text - we take the complementary event and solve it for $k$,

\begin{aligned} 1 - \big(\frac{n - 1}{k}\big)^k & \ge \frac{1}{2} \\ \big(\frac{n - 1}{k}\big)^k & \le \frac{1}{2} \\ k\lg\big(\frac{n - 1}{n}\big) & \ge \lg\frac{1}{2} \\ k = \frac{\log(1 / 2)}{\log(364 / 365)} & \approx 263. \end{aligned}

As for the other question,

\begin{aligned} \Pr\{\text{2 born on Jul 4}\} & = 1 - \Pr\{\text{1 born on Jul 4}\} - \Pr\{\text{0 born on Jul 4}\} \\ & = 1 - \frac{k}{n}\big(\frac{n - 1}{n}\big)^{k - 1} - \big(\frac{n - 1}{n}\big)^k \\ & = 1 - \big(\frac{n - 1}{n}\big)^{k - 1}\big(\frac{n + k - 1}{n}\big). \end{aligned}

Writing a Ruby programme to find the closest integer, we get $115$.

## 5.4-2

Suppose that we toss balls into $b$ bins until some bin contains two balls. Each toss is independent, and each ball is equally likely to end up in any bin. What is the expected number of ball tosses?

This is just a restatement of the birthday problem. I consider this all that needs to be said on this subject.

## 5.4-3 $\star$

For the analysis of the birthday paradox, is it important that the birthdays be mutually independent, or is pairwise independence sufficient? Justify your answer.

Pairwise independence is enough. It's sufficient for the derivation after $\text{(5.6)}$.

## 5.4-4 $\star$

How many people should be invited to a party in order to make it likely that there are $three$ people with the same birthday?

The answer is $88$. I reached it by trial and error. But let's analyze it with indicator random variables.

Let $X_{ijk}$ be the indicator random variable for the event of the people with indices $i$, $j$ and $k$ have the same birthday. The probability is $1 / n^2$. Then,

\begin{aligned} \text E[X] & = \sum_{i = 1}^n\sum_{j = i + 1}^n \sum_{k = j + 1}^n X_{ijk} \\ & = \sum_{i = 1}^n\sum_{j = i + 1}^n \sum_{k = j + 1}^n \frac{1}{n^2} \\ & = \binom{n}{3}\frac{1}{n^2} \\ & = \frac{k(k - 1)(k - 2)}{6n^2}. \end{aligned}

Solving this yields $94$. It's a bit more, but again, indicator random variables are approximate.

Finding more commentary online is tricky.

## 5.4-5 $\star$

What is the probability that a $k$-string over a set of size $n$ forms a $k$-permutation? How does this question relate to the birthday paradox?

\begin{aligned} \Pr\{k\text{-perm in }n\} & = 1 \cdot \frac{n - 1}{n} \cdot \frac{n - 2}{n} \cdots \frac{n - k + 1}{n} \\ & = \frac{(n - 1)!}{(n - k)!n^k}. \end{aligned}

This is the complementary event to the birthday problem, that is, the chance of $k$ people have distinct birthdays.

## 5.4-6 $\star$

Suppose that $n$ balls are tossed into $n$ bins, where each toss is independent and the ball is equally likely to end up in any bin. What is the expected number of empty bins? What is the expected number of bins with exactly one ball?

First we determine the expected number of empty bins. We define a random variable $X$ to be the number of empty bins, so that we want to compute $\text E[X]$. Next, for $i = 1, 2, \ldots, n$, we define the indicator random variable $Y_i = I\{\text{bin } i \text{ is empty}\}$. Thus,

$$X = \sum_{i = 1}^n Y_i,$$

and so

\begin{aligned} \text E[X] & = \text E \Bigg[\sum\limits_{i = 1}^n Y_i\Bigg] & \\ & = \sum\limits_{i = 1}^n\text E[Y_i] & \text{(by linearity of expectation)} \\ & = \sum\limits_{i = 1}^n\Pr\{\text{bin i is empty}\}. & \text{(by Lemma 5.1)} \end{aligned}

Let us focus on a specific bin, say bin $i$. We view a toss as a success if it misses bin $i$ and as a failure if it lands in bin $i$. We have $n$ independent Bernoulli trials, each with probability of success $1 - 1 / n$. In order for bin $i$ to be empty, we need $n$ successes in $n$ trials. Using a binomial distribution, therefore, we have that

\begin{aligned} \Pr\{\text{bin i is empty}\} & = \binom{n}{n}\Big(1 - \frac{1}{n}\Big)^n\Big(\frac{1}{n}\Big)^0 \\ & = \Big(1 - \frac{1}{n}\Big)^n. \end{aligned}

Thus,

\begin{aligned} \text E[X] & = \sum_{i = 1}^n\Big(1 - \frac{1}{n})^n \\ & = n\Big(1 - \frac{1}{n}\Big)^n. \end{aligned}

By equation $\text{(3.14)}$, as $n$ approaches $\infty$, the quantity $(1 - 1 / n)^n$ approaches $1/e$, and so $\text E[X]$ approaches $n/e$.

Now we determine the expected number of bins with exactly one ball. We redefine $X$ to be number of bins with exactly one ball, and we redefine $Y_i$ to be $I\{\text{bin$i$gets exactly one ball}\}$. As before, we find that

$$\text E[X] = \sum_{i = 1}^n \Pr\{\text{bin i gets exactly one ball}\}.$$

Again focusing on bin $i$, we need exactly $n - 1$ successes in $n$ independent Bernoulli trials, and so

\begin{aligned} \Pr\{\text{bin i gets exactly one ball}\} & = \binom{n}{n - 1} \Big(1 - \frac{1}{n}\Big)^{n - 1} \Big(\frac{1}{n}\Big)^1 \\ & = n \cdot \Big(1 - \frac{1}{n} \Big)^{n - 1} \frac{1}{n} \\ & = \Big(1 - \frac{1}{n})^{n - 1}, \end{aligned}

and so

\begin{aligned} \text E[X] & = \sum_{i = 1}^n \Big(1 - \frac{1}{n}\Big)^{n - 1} \\ & = n \Big(1 - \frac{1}{n})^{n - 1}. \end{aligned}

Because

$$n\Big(1 - \frac{1}{n}\Big)^{n - 1} = \frac{n(1 - \frac{1}{n})^n}{1 - \frac{1}{n}},$$

as $n$ approaches $\infty$, we find that $\text E[X]$ approaches

$$\frac{n / e}{1 - 1 / n} = \frac{n^2}{e(n - 1)}.$$

## 5.4-7 $\star$

Sharpen the lower bound on streak length by showing that in $n$ flips of a fair coin, the probability is less than $1 / n$ that no streak longer than $\lg n - 2\lg\lg n$ consecutive heads occurs.

We split up the n flips into $n / s$ groups where we pick $s = \lg(n) - 2 \lg(\lg(n))$. We will show that at least one of these groups comes up all heads with probability at least $\frac{n - 1}{n}$. So, the probability the group starting in position $i$ comes up all heads is

$$\Pr(A_{i,\lg n - 2\lg(\lg n)}) = \frac{1}{2^{\lg n - 2\lg(\lg n)}} = \frac{\lg n^2}{n}.$$

Since the groups are based of of disjoint sets of IID coin flips, these probabilities are independent. so,

\begin{aligned} \Pr(\bigwedge\neg A_{i,\lg n - 2\lg(\lg n)}) & = \prod_i\Pr(\neg A_{i,\lg n - 2\lg(\lg n)}) \\ & = \Big(1-\frac{\lg n^2}{n}\Big)^{\frac{n}{\lg n - 2\lg(\lg n)}} \\ & \le e^{-\frac{\lg n^2}{\lg n - 2\lg(\lg n)}} \\ &= \frac{1}{n} e^{\frac{-2\lg(\lg n)\lg n}{\lg n - 2\lg(\lg n)}} \\ & = n^{-1-\frac{2\lg(\lg n)}{\lg n - 2\lg(\lg n)}} \\ & < n^{-1}. \end{aligned}

Showing that the probability that there is no run of length at least $\lg n - 2\lg(\lg n)$ to be $< \frac{1}{n}$.