The maiden post: the singular value decomposition

Singular value decomposition

Singular value decomposition (SVD) is perhaps the most interesting and fundamental theorem of linear algebra. Generally speaking, it says that any matrix can be factored into the product of three matrices with specific properties, one of which is a diagonal matrix. (In this post we will assume our matrices are real-valued.)

Specifically, the SVD of a real matrix $A \in \mathbb{R}^{m \times n}$ can be written as the product of three matrices $U$, $\Sigma$, $V^T$: $$ A = U \Sigma V^T $$ where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices (their columns form orthonormal bases) and $\Sigma \in \mathbb{R}^{m\times n}$ is a diagonal matrix whose non-negative diagonal entries $\sigma_i$ are the singular values of $A$.
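As a quick numerical sanity check, here is a small NumPy sketch (using `np.linalg.svd`; the random matrix is just an illustration) that computes the SVD of a matrix and verifies the properties above:

```python
import numpy as np

# A random 4x3 real matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# full_matrices=True gives U (4x4) and V^T (3x3); s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 4x3 diagonal matrix Sigma from the singular values
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)

# U and V are orthogonal, and A = U Sigma V^T up to floating-point error
assert np.allclose(U @ U.T, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))
assert np.allclose(U @ Sigma @ Vt, A)
```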

Diagonalization

To understand what this means geometrically, it is worth first considering the diagonalization of a matrix. Diagonalization is a factorization that rewrites a (square, diagonalizable) matrix as the product of three matrices, one of which is diagonal. Interestingly, a diagonalization gives you the eigenvalues and eigenvectors of the matrix.

Now, suppose $A$ is real and symmetric. The spectral theorem says that $A$ is orthogonally diagonalizable (though we will not prove it here) as $A = UDU^{-1}$, where the columns of $U$ are orthonormal eigenvectors of $A$ and $D$ contains the eigenvalues along its main diagonal (with zeros everywhere else). Reading the factors from right to left, applying $A$ to a vector amounts to: $U^{-1}$ converts the vector into the eigenvector basis, $D$ scales each coordinate by the corresponding eigenvalue, and $U$ converts the result back to the standard basis. This is very curious: a real symmetric matrix can be expressed entirely by its eigenvectors and the scaling factors on these eigenvectors (the eigenvalues).
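The spectral theorem is easy to check numerically. A minimal sketch (the symmetric matrix is constructed for illustration; `np.linalg.eigh` is NumPy's routine for symmetric matrices):

```python
import numpy as np

# Build a symmetric matrix: B + B^T is always symmetric
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = B + B.T

# eigh is specialized for symmetric matrices: it returns real eigenvalues
# and an orthogonal matrix whose columns are the eigenvectors
eigvals, U = np.linalg.eigh(A)
D = np.diag(eigvals)

# U is orthogonal, so U^{-1} = U^T, and A = U D U^{-1} = U D U^T
assert np.allclose(U @ U.T, np.eye(3))
assert np.allclose(U @ D @ U.T, A)
```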

Back to the SVD

Let’s go back to the SVD and see if any parallels arise. Consider some general $m \times n$ matrix $A$ and its SVD $A = U \Sigma V^T$. Reading the factors from right to left, the linear transformation represented by $A$ amounts to:

  • expressing the input vector in the basis formed by the columns of $V$ (this is what $V^T$ does)
  • scaling each coordinate using the diagonal matrix $\Sigma$
  • mapping the scaled coordinates onto the basis formed by the columns of $U$

In some sense, the SVD is really a more general diagonalization. It tells us that any matrix can be represented by a scaling relationship from one basis to another.
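The three steps above can be traced one factor at a time. A small sketch (using the thin SVD from `np.linalg.svd`; the matrix and vector are arbitrary illustrations) showing that applying $V^T$, then $\Sigma$, then $U$ reproduces $Ax$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# Thin SVD: U is 4x3, s has length 3, Vt is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=False)

coords = Vt @ x      # express x in the basis of V's columns
scaled = s * coords  # scale each coordinate by its singular value
y = U @ scaled       # map the result onto the basis of U's columns

# The three steps together are exactly the action of A
assert np.allclose(y, A @ x)
```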

A simple problem

With all of this in mind, let’s move on to a simple introductory problem related to the SVD. (Note: this problem is taken from an MIT course on financial mathematics).

Consider the SVD of some matrix $A = U \Sigma V^T$. Prove whether the following statement is true or false: the number of non-zero diagonal entries of $\Sigma$ is equal to the rank of $A$.

Consider the kernel of $A$, i.e. the set of vectors $v$ that satisfy $Av = 0$. If we left-multiply by $A^T$, we see that $A^T A v = A^T \cdot 0 = 0$, so $\ker(A) \subseteq \ker(A^T A)$. Conversely, if $A^T A v = 0$, then $v^T A^T A v = \lVert Av \rVert^2 = 0$, which forces $Av = 0$. Therefore, the kernel of $A^T A$ is identical to the kernel of $A$.

Next, the number of non-zero singular values of $A$ is the number of non-zero eigenvalues of $A^T A$. We can see this by expanding $A^T A$: $$ \begin{align*} A^T A &= (U \Sigma V^T)^T U \Sigma V^T \newline &= V \Sigma^T U^T U \Sigma V^T \newline &= V \Sigma^T \Sigma V^T \newline &= V \Sigma^T \Sigma V^{-1} \quad\text{as $V$ is orthogonal} \end{align*} $$

Hence, this is a diagonalization of $A^TA$ with eigenvalues equal to $\sigma_i^2$ and eigenvectors as the columns of $V$.

Counting multiplicities, the number of non-zero eigenvalues of $A^TA$ is equal to $\text{rank}(A^TA)$: since $A^TA$ is diagonalizable, its rank equals the rank of its diagonal matrix of eigenvalues, which is just the number of non-zero entries on that diagonal.

Above, we showed that $\ker(A) = \ker(A^TA)$, so their dimensions agree: $\text{null}(A) = \text{null}(A^TA)$, where $\text{null}$ denotes the nullity. By the rank–nullity theorem: $$ \begin{align*} \text{null}(A) + \text{rank}(A) &= n \newline \text{rank}(A) &= n - \text{null}(A) \newline &= n - \text{null}(A^TA) \newline &= n - (n - \text{rank}(A^TA)) \newline &= \text{rank}(A^TA) \newline &= \text{(the number of non-zero diagonal entries of $\Sigma$)} \end{align*} $$
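The result is easy to confirm numerically. A sketch (the rank-2 matrix is built for illustration from two outer products; the `1e-10` tolerance for "non-zero" is an assumption to absorb floating-point error):

```python
import numpy as np

# A rank-2 matrix: the sum of two independent rank-1 outer products
u1, u2 = np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])
v1, v2 = np.array([1.0, 2, 3]), np.array([4.0, 5, 6])
A = np.outer(u1, v1) + np.outer(u2, v2)

# compute_uv=False returns only the singular values
s = np.linalg.svd(A, compute_uv=False)
nonzero = int(np.sum(s > 1e-10))  # count singular values above tolerance

# The number of non-zero singular values equals the rank
assert np.linalg.matrix_rank(A) == 2
assert nonzero == 2
```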

Postscript

It is surprisingly difficult to find a web framework that natively supports $\LaTeX$. On this site I am using Hugo, a lightweight Markdown-based static site generator, along with MathJax, a JavaScript renderer for $\LaTeX$.