81 Probabilistic lower bounds on comparison sorting
In this problem, we prove a probabilistic $\Omega(n\lg n)$ lower bound on the running time of any deterministic or randomized comparison sort on $n$ distinct input elements. We begin by examining a deterministic comparison sort $A$ with decision tree $T_A$. We assume that every permutation of $A$'s inputs is equally likely.
a. Suppose that each leaf of $T_A$ is labeled with the probability that it is reached given a random input. Prove that exactly $n!$ leaves are labeled $1 / n!$ and that the rest are labeled $0$.
b. Let $D(T)$ denote the external path length of a decision tree $T$; that is, $D(T)$ is the sum of the depths of all the leaves of $T$. Let $T$ be a decision tree with $k > 1$ leaves, and let $LT$ and $RT$ be the left and right subtrees of $T$. Show that $D(T) = D(LT) + D(RT)+k$.
c. Let $d(k)$ be the minimum value of $D(T)$ over all decision trees $T$ with $k > 1$ leaves. Show that $d(k) = \min _{1 \le i \le k  1} \{ d(i) + d(k  i) + k \}$. ($\textit{Hint:}$ Consider a decision tree $T$ with $k$ leaves that achieves the minimum. Let $i_0$ be the number of leaves in $LT$ and $k  i_0$ the number of leaves in $RT$.)
d. Prove that for a given value of $k > 1$ and $i$ in the range $1 \le i \le k  1$, the function $i\lg i + (k  i) \lg(k  i)$ is minimized at $i = k / 2$. Conclude that $d(k) = \Omega(k\lg k)$.
e. Prove that $D(T_A) = \Omega(n!\lg(n!))$, and conclude that the averagecase time to sort $n$ elements is $\Omega(n\lg n)$.
Now, consider a randomized comparison sort $B$. We can extend the decisiontree model to handle randomization by incorporating two kinds of nodes: ordinary comparison nodes and "randomization" nodes. A randomization node models a random choice of the form $\text{RANDOM}(1, r)$ made by algorithm $B$; the node has $r$ children, each of which is equally likely to be chosen during an execution of the algorithm.
f. Show that for any randomized comparison sort $B$, there exists a deterministic comparison sort $A$ whose expected number of comparisons is no more than those made by $B$.
a. For a comparison algorithm $A$ to sort, no two input permutations can reach the same leaf of the decision tree, so there must be at least $n!$ leaves reached in $T_A$, one for each possible input permutation. Since $A$ is a deterministic algorithm, it must always reach the same leaf when given a particular permutation as input, so at most $n!$ leaves are reached (one for each permutation). Therefore exactly $n!$ leaves are reached, one for each input permutation.
These $n!$ leaves will each have probability $1 / n!$, since each of the $n!$ possible permutations is the input with the probability $1 / n!$. Any remaining leaves will have probability $0$, since they are not reached for any input.
Without loss of generality, we can assume for the rest of this problem that paths leading only to $0$probability leaves aren't in the tree, since they cannot affect the running time of the sort. That is, we can assume that $T_A$ consists of only the $n!$ leaves labeled $1 / n!$ and their ancestors.
b. If $k > 1$, then the root of $T$ is not a leaf. This implies that all of $T$'s leaves are leaves in $LT$ and $RT$. Since every leaf at depth $h$ in $LT$ or $RT$ has depth $h + 1$ in $T$, $D(T)$ must be the sum of $D(LT)$, $D(RT)$, and $k$, the total number of leaves. To prove this last assertion, let $d_T(x) =$ depth of node $x$ in tree $T$. Then,
$$ \begin{aligned} D(T) & = \sum_{x \in T} d_T(x) \\ & = \sum_{x \in LT} d_T(x) + \sum_{x \in RT} d_T(x) \\ & = \sum_{x \in LT} (d_{LT}(x) + 1) + \sum_{x \in RT} (d_{RT}(x) + 1) \\ & = \sum_{x \in LT} d_{LT}(x) + \sum_{x \in RT} d_{RT}(x) + \sum_{x \in T} 1 \\ & = D(LT) + D(RT) + k. \\ \end{aligned} $$
c. To show that $d(k) = \min_{1\le i\le k  1}{d(i) + d(k  i) + k}$ we will show separately that
$$ \begin{aligned} & d(k) \le \min_{1\le i\le k  1}{d(i) + d(k  i) + k} \\ \text{and } & d(k) \ge \min_{1\le i\le k  1}{d(i) + d(k  i) + k}. \end{aligned} $$

To show that $d(k) \le \min_{1\le i\le k  1}{d(i) + d(k  i) + k}$, we need only show that $d(k) \le d(i) + d(k  i) + k$, for $i = 1, 2, \ldots, k  1$. For any $i$ from $1$ to $k  1$ we can find trees $RT$ with $i$ leaves and $LT$ with $k  i$ leaves such that $D(RT) = d(i)$ and $D(LT) = d(k  i)$. Construct $T$ such that $RT$ and $LT$ are the right and left subtrees of $T$'s root respectively. Then
$$ \begin{aligned} d(k) & \le D(T) & \text{(by definition of $d$ as min $D(T)$ value)} \\ & = D(RT) + D(LT) + k & \text{(by part (b))} \\ & = d(i) + d(k  i) + k. & \text{(by choice of $RT$ and $LT$)} \end{aligned} $$

To show that $d(k) \ge \min_{1\le i\le k  1}{d(i) + d(k  i) + d}$, we need only show that $d(k) \ge d(i) + d(k  i) + k$, for some $i$ in $\{1, 2, \ldots, k  1\}$. Take the tree $T$ with $k$ leaves such that $D(T) = d(k)$, let $RT$ and $LT$ be $T$'s right and left subtree, respectively, and let $i$ be the number of leaves in $RT$. Then $k  i$ is the number of leaves in $LT$ and
$$ \begin{aligned} d(k) & = D(T) & \text{(by choice of $T$)} \\ & = D(RT) + D(LT) + k & \text{(by part (b))} \\ & \ge d(i) + d(k  i) + k. & \text{(by definition of $d$ as min $D(T)$ value)} \end{aligned} $$
Neither $i$ nor $k  i$ can be $0$ (and hence $1 \le i \le k  1$), since if one of these were $0$, either $RT$ or $LT$ would contain all $k$ leaves of $T$, and that $k$leaf subtree would have a $D$ equal to $D(T)  k$ (by part (b)), contradicting the choice of $T$ as the $k$leaf tree with the minimum $D$.
d. Let $f_k(i) = i\lg i + (k  i)\lg(k  i)$. To find the value of $i$ that minimizes $f_k$, find the $i$ for which the derivative of $f_k$ with respect to $i$ is $0$:
$$ \begin{aligned} f_k'(i) & = \frac{d}{di} \Big(\frac{i\ln i + (k  i)\ln(k  i)}{\ln 2}\Big) \\ & = \frac{\ln i + 1  \ln(k  i)  1}{\ln 2} \\ & = \frac{\ln i  \ln(k  i)}{\ln 2} \end{aligned} $$
is $0$ at $i = k / 2$. To verify this is indeed a minimum (not a maximum), check that the second derivative of $f_k$ is positive at $i = k / 2$:
$$ \begin{aligned} f_k''(i) & = \frac{d}{di}\Big(\frac{\ln i  \ln(k  i)}{\ln 2}\Big) \\ & = \frac{1}{\ln 2}\Big(\frac{1}{i} + \frac{1}{k  i}\Big). \end{aligned} $$
$$ \begin{aligned} f_k''(k / 2) & = \frac{1}{\ln 2}\Big(\frac{2}{k} + \frac{2}{k}\Big) \\ & = \frac{1}{\ln 2} \cdot \frac{4}{k} \\ & > 0 & \text{since $k > 1$}. \end{aligned} $$
Now we use substitution to prove $d(k) = \Omega(k\lg k)$. The base case of the induction is satisfied because $d(1) \ge 0 = c \cdot 1 \cdot \lg 1$ for any constant $c$. For the inductive step we assume that $d(i) \ge ci\lg i$ for $1 \le i \le k  1$, where $c$ is some constant to be determined.
$$ \begin{aligned} d(k) & = \min_{1\le i\le k  1} {d(i) + d(k  i) + k} \\ & \ge \min_{1\le i\le k  1} {c(i\lg i + (k  i)\lg(k  i)) + k} \\ & = \min_{1\le i\le k  1} {cf_k(i) + k} \\ & = c\Big(\frac{k}{2}\lg\frac{k}{2}\Big(k  \frac{k}{2}\Big)\lg\Big(k  \frac{k}{2}\Big)\Big) + k \\ & = ck\lg\Big(\frac{k}{2}\Big) + k \\ & = c(k\lg k  k) + k \\ & = ck\lg k + (k  ck) \\ & \ge ck\lg k & \text{if $c \le 1$}, \end{aligned} $$
and so $d(k) = \Omega(k\lg k)$.
e. Using the result of part (d) and the fact that $T_A$ (as modified in our solution to part (a)) has $n!$ leaves, we can conclude that
$$D(T_A) \ge d(n!) = \Omega(n!\lg(n!)).$$
$D(T_A)$ is the sum of the decisiontree path lengths for sorting all input permutations, and the path lengths are proportional to the run time. Since the $n!$ permutations have equal probability $1 / n!$, the expected time to sort $n$ random elements (1 input permutation) is the total time for all permutations divided by $n!$:
$$\frac{\Omega(n!\lg(n!))}{n!} = \Omega(\lg(n!)) = \Omega(n\lg n).$$
f. We will show how to modify a randomized decision tree (algorithm) to define a deterministic decision tree (algorithm) that is at least as good as the randomized one in terms of the average number of comparisons.
At each randomized node, pick the child with the smallest subtree (the subtree with the smallest average number of comparisons on a path to a leaf). Delete all the other children of the randomized node and splice out the randomized node itself.
The deterministic algorithm corresponding to this modified tree still works, because the randomized algorithm worked no matter which path was taken from each randomized node.
The average number of comparisons for the modified algorithm is no larger than the average number for the original randomized tree, since we discarded the higheraverage subtrees in each case. In particular, each time we splice out a randomized node, we leave the overall average less than or equal to what it was, because
 the same set of input permutations reaches the modified subtree as before, but those inputs are handled in less than or equal to average time than before, and
 the rest of the tree is unmodified.
The randomized algorithm thus takes at least as much time on average as the corresponding deterministic one. (We've shown that the expected running time for a deterministic comparison sort is $\Omega(n\lg n)$, hence the expected time for a randomized comparison sort is also $\Omega(n\lg n)$).