28.3 Symmetric positive-definite matrices and least-squares approximation

28.3-1

Prove that every diagonal element of a symmetric positive-definite matrix is positive.

To see this, let $e_i$ be the vector that is $0$s except for a $1$ in the $i$th position. Then, we consider the quantity $e_i^{\text T}Ae_i$ for every $i$. $Ae_i$ takes each row of $A$ and pulls out the $i$th column of it, and puts those values into a column vector. Then, we multiply that on the left by $e_i^{\text T}$, pulls out the $i$th row of this quantity, which means that the quantity $e_i^{\text T}Ae_i$ exactly the value of $A_{i, i}$.

So, we have that by positive definiteness, since $e_i$ is nonzero, that quantity must be positive. Since we do this for every $i$, we have that every entry along the diagonal must be positive.

28.3-2

Let

$$ A = \begin{pmatrix} a & b \\ b & c \end{pmatrix} $$

be a $2 \times 2$ symmetrix positive-definite matrix. Prove that its determinant $ac - b^2$ is positive by "completing the square" in a manner similar to that used in the proof of Lemma 28.5.

Let $x = -by / a$. Since $A$ is positive-definite, we have

$$ \begin{aligned} 0 & < \begin{pmatrix} x & y \end{pmatrix}^{\text T} A \begin{pmatrix} x \\ y \end{pmatrix} \\ & = \begin{pmatrix} x & y \end{pmatrix}^{\text T} \begin{pmatrix} ax + by \\ bx + cy \end{pmatrix} \\ & = ax^2 + 2bxy + cy^2 \\ & = cy^2 - \frac{b^2y^2}{a} \\ & = (c - b^2 / a)y^2. \end{aligned} $$

Thus, $c - b^2 / a > 0$, which implies $ac - b^2 > 0$, since $a > 0$.

28.3-3

Prove that the maximum element in a symmetric positive-definite matrix lies on the diagonal.

Suppose to a contradiction that there were some element $a_{ij}$ with $i \ne j$ so that $a_{ij}$ were a largest element. We will use $e_i$ to denote the vector that is all zeroes except for having a $1$ at position $i$. Then, we consider the value $(e_i − e_j)^{\text T} A(e_i − e_j)$. When we compute $A(e_i - e_j)$ this will return a vector which is column $i$ minus column $j$. Then, when we do the last multiplication, we will get the quantity which is the $i$th row minus the $j$th row. So,

$$ \begin{aligned} (e_i - e_j)^{\text T} A(e_i - e_j) & = a_{ii} - a_{ij} - a_{ji} + a_{jj} \\ & = a_{ii} + a_{jj} - 2a_{ij} \le 0 \end{aligned} $$

Where we used symmetry to get that $a_{ij} = a_{ji}$. This result contradicts the fact that $A$ was positive definite. So, our assumption that there was a element tied for largest off the diagonal must of been false.

28.3-4

Prove that the determinant of each leading submatrix of a symmetrix positive-definite matrix is positive.

The claim clearly holds for matrices of size $1$ because the single entry in the matrix is positive the only leading submatrix is the matrix itself. Now suppose the claim holds for matrices of size $n$, and let $A$ be an $(n + 1) \times (n + 1)$ symmetric positive-definite matrix. We can write $A$ as

$$ A = \left[ \begin{array}{ccc|c} & & & \\ & A' & & w \\ & & & \\ \hline & v & & c \end{array} \right] . $$

Then $A'$ is clearly symmetric, and for any $x$ we have $x^{\text T} A'x = \begin{pmatrix} x & 0 \end{pmatrix} A \begin{pmatrix} x \\ 0 \end{pmatrix} > 0$, so $A'$ is positive-definite. By our induction hypothesis, every leading submatrix of $A'$ has positive determinant, so we are left only to show that $A$ has positive determinant. By Theorem D.4, the determinant of $A$ is equal to the determinant of the matrix

$$ B = \left[ \begin{array}{c|ccc} c & & v & \\ \hline & & & \\ w & & A' & \\ & & & \end{array} \right] . $$

Theorem D.4 also tells us that the determinant is unchanged if we add a multiple of one column of a matrix to another. Since $0 < e_{n + 1}^{\text T} Ae_{n + 1} = c$, we can use multiples of the first column to zero out every entry in the first row other than $c$. Specifically, the determinant of $B$ is the same as the determinant of the matrix obtained in this way, which looks like

$$ C = \left[ \begin{array}{c|ccc} c & & 0 & \\ \hline & & & \\ w & & A'' & \\ & & & \end{array} \right] . $$

By definition, $\det(A) = c\det(A'')$. By our induction hypothesis, $\det(A'') > 0$. Since $c > 0$ as well, we conclude that $\det(A) > 0$, which completes the proof.

28.3-5

Let $A_k$ denote the $k$th leading submatrix of a symmetric positive-definite matrix $A$. Prove that $\text{det}(A_k) / \text{det}(A_{k - 1})$ is the $k$th pivot during $\text{LU}$ decomposition, where, by convention, $\text{det}(A_0) = 1$.

When we do an LU decomposition of a positive definite symmetric matrix, we never need to permute the rows. This means that the pivot value being used from the first operation is the entry in the upper left corner. This gets us that for the case $k = 1$, it holds because we were told to define $\det(A_0) = 1$, getting us, $a_{11} = \det(A_1) / \det(A_0)$. When Diagonalizing a matrix, the product of the pivot values used gives the determinant of the matrix. So, we have that the determinant of $A_k$ is a product of the $k$th pivot value with all the previous values. By induction, the product of all the previous values is $\det(A_{k − 1})$. So, we have that if $x$ is the $k$th pivot value, $\det(A_k) = x\det(A_{k − 1})$, giving us the desired result that the $k$th pivot value is $\det(A_k) / \det(A_{k − 1})$.

28.3-6

Find the function of the form

$$F(x) = c_1 + c_2x\lg x + c_3 e^x$$

that is the best least-squares fit to the data points

$$(1, 1), (2, 1), (3, 3), (4, 8).$$

First we form the $A$ matrix

$$ A = \begin{pmatrix} 1 & 0 & e \\ 1 & 2 & e^2 \\ 1 & 3\lg 3 & e^3 \\ 1 & 8 & e^4 \end{pmatrix} . $$

We compute the pseudoinverse, then multiply it by $y$, to obtain the coefficient vector

$$ c = \begin{pmatrix} 0.411741 \\ -0.20487 \\ 0.16546 \end{pmatrix} . $$

28.3-7

Show that the pseudoinverse $A^+$ satisfies the following four equations:

$$ \begin{aligned} AA^+A & = A, \\ A^+AA^+ & = A^+, \\ (AA^+)^{\text T} & = AA^+, \\ (A^+A)^{\text T} & = A^+A. \end{aligned} $$

$$ \begin{aligned} AA^+A & = A((A^{\text T}A)^{-1}A^{\text T})A \\ & = A(A^{\text T}A)^{-1}(A^{\text T}A) \\ & = A, \end{aligned} $$

$$ \begin{aligned} A^+AA^+ & = ((A^{\text T}A)^{-1}A^{\text T})A((A^{\text T}A)^{-1}A^{\text T}) \\ & = (A^{\text T}A)^{-1}(A^{\text T}A)(A^{\text T}A)^{-1}A^{\text T} \\ & = (A^{\text T}A)^{-1}A^{\text T} \\ & = A^+, \end{aligned} $$

$$ \begin{aligned} (AA^+)^{\text T} & = (A(A^{\text T}A)^{-1}A^{\text T})^{\text T} \\ & = A((A^{\text T}A)^{-1})^{\text T}A^{\text T} \\ & = A((A^{\text T}A)^{\text T})^{-1}A^{\text T} \\ & = A(A^{\text T}A)^{-1}A^{\text T} \\ & = AA^+, \end{aligned} $$

$$ \begin{aligned} (A^+A)^{\text T} & = ((A^{\text T}A)^{-1}A^{\text T}A)^{\text T} \\ & = ((A^{\text T}A)^{-1}(A^{\text T}A))^{\text T} \\ & = I^{\text T} \\ & = I \\ & = (A^{\text T}A)^{-1}(A^{\text T}A) \\ & = A^+A. \end{aligned} $$