
Section 14.2 Smith Normal Form

Subsection Stating and Finding the Smith Normal Form of a Matrix

The Smith normal form provides a way to understand the structure of matrices over a PID and is analogous to the row-echelon form and reduced row-echelon form in the case of matrices over a field.

Proof.

Before we begin, note that we will apply a sequence of steps that correspond to multiplication by an invertible matrix on the left or right, and by Lemma 14.20, none of these steps will change the gcd.
To prove existence of such a matrix \(M\text{,}\) we claim we can multiply \(A\) on the right and the left by invertible matrices to transform it into a matrix of the form
\begin{equation*} \left[\begin{array}{ll} g & 0 \\ 0 & B \end{array}\right] \end{equation*}
for some \((m-1) \times(n-1)\) matrix \(B\text{,}\) where \(g=\operatorname{gcd}(A)\text{.}\) If our claim holds, then we are done: notice that \(g\) divides every entry of \(B\text{,}\) since \(\operatorname{gcd}(A)=\operatorname{gcd}(g, B)\) by Lemma 14.20, and so by applying the claim to \(B\text{,}\) and then repeatedly to the smaller matrices that result, we arrive at a matrix of the desired form \(M\text{.}\) Notice moreover that if \(C\) is an invertible matrix, then so is
\begin{equation*} \left[\begin{array}{ll} 1 & 0 \\ 0 & C \end{array}\right] \end{equation*}
To construct a matrix in the form above, let \(a\) be the upper left \((1,1)\) entry of \(A\text{.}\) First, we are going to show that we can turn \(A\) into a matrix of the form
\begin{equation*} \left[\begin{array}{ll} * & 0 \\ 0 & B \end{array}\right] \end{equation*}
with the same gcd as \(A\text{.}\) If \(a\) happens to divide all the entries in the first row and column, then we can simply apply elementary row and column operations to get to the desired form. Otherwise, suppose there exists \(b\) in the first column such that \(a \nmid b\text{.}\) Then we may apply an elementary row operation switching rows so that \(b=a_{2,1}\) is in the first column, second row. By Lemma 14.21, we can now find an invertible matrix \(C \in \mathrm{M}_{2}(R)\) such that
\begin{equation*} C\left[\begin{array}{l} a \\ b \end{array}\right] =\left[\begin{array}{c} \operatorname{gcd}(a, b) \\ 0 \end{array}\right] . \end{equation*}
Consider the \(m \times m\) matrix
\begin{equation*} P:=\left[\begin{array}{cc} C & 0 \\ 0 & I_{m-2} \end{array}\right] . \end{equation*}
Note that \(P A\) has \((P A)_{1,1}=\operatorname{gcd}(a, b)\) and \((P A)_{2,1}=0\text{,}\) so this step replaces \(a\) by \(\operatorname{gcd}(a, b)\) and \(b\) by \(0\text{.}\) By Lemma 14.20, \(\operatorname{gcd}(P A)=\operatorname{gcd}(A)\text{.}\) We can keep repeating this until the top left corner entry divides every entry in the first column, and this process must stop after at most \(m-1\) steps, since there are only \(m\) entries in the first column.
Similarly, if there exists \(b\) in the first row that \(a\) does not divide, we can repeat this process by instead multiplying \(A\) on the right by an invertible matrix. Clearing the first row may reintroduce nonzero entries in the first column, but each pass replaces the corner entry by a proper divisor, so by alternating between the two processes we eventually zero out all the remaining entries in the first row and column. Finally, we arrive at a matrix of the form
\begin{equation*} \left[\begin{array}{ll} a & 0 \\ 0 & B \end{array}\right] \end{equation*}
If \(a=\operatorname{gcd}(A)\text{,}\) we are done. If not, then we can find some entry \(b=a_{i, j}\) such that \(a \nmid b\text{.}\) We can then add the \(j\)th column to the first column, which puts \(b\) into the first column without affecting \(a\text{,}\) since the rest of the top row is zero. But this brings us back to the previous situation, and we have already shown that we can replace the top left corner by \(\operatorname{gcd}(a, b)\text{.}\)
At each step, we replace \(a\) by some \(c\) which is both a divisor of \(a\) and a multiple of \(\operatorname{gcd}(A)\text{.}\) Our ring \(R\) is a PID, hence a UFD, so \(a / \operatorname{gcd}(A)\) has only finitely many prime factors, and this process must stop. This shows that we can eventually replace \(A\) by
\begin{equation*} \left[\begin{array}{cc} \operatorname{gcd}(A) & 0 \\ 0 & B \end{array}\right]. \end{equation*}
Now it remains to show the uniqueness portion of the theorem. For any \(i\) and any matrix \(B\text{,}\) let \(\operatorname{gcd}_{i}(B)\) denote the gcd of all the \(i \times i\) minors of \(B\text{.}\) By Lemma 14.23, \(\operatorname{gcd}_{i}\) is unchanged by row and column operations, so \(\gcd_i(A) = \gcd_i(M)\text{.}\)
For a matrix of the form \(M\text{,}\) the only nonzero minors are those where the choices of rows and columns are the same, and hence the only nonzero \(i \times i\) minors of \(M\) are \(g_{s_{1}} \cdots g_{s_{i}}\) for some \(s_{1}<\cdots<s_{i}\text{.}\) Since \(g_{1} \mid g_{2} \mid \cdots\text{,}\) each such product is a multiple of \(g_{1} \cdots g_{i}\text{,}\) and it follows that
\begin{equation*} \operatorname{gcd}_{i}(A)=\operatorname{gcd}_{i}(M)=g_{1} \cdots g_{i} . \end{equation*}
In particular, the largest value of \(i\) such that some \(i \times i\) minor of \(A\) is nonzero is \(\ell\text{.}\) Also, we have
\begin{equation*} g_{i}=\frac{\operatorname{gcd}_{i}(A)}{\operatorname{gcd}_{i-1}(A)} . \end{equation*}
This proves uniqueness, for it shows that \(\ell, g_{1}, \ldots, g_{\ell}\) are all defined from \(A\) directly, without any choices.

Remark 14.15.

Note that some of the diagonal entries could be \(0\text{.}\) Recall that \(s \mid 0\) for all \(s \in R\) (including \(s = 0\)), and that \(0\) divides only \(0\text{.}\) So, the “tail” end of the sequence could be all \(0\)’s, and indeed if \(s_{i,i} = 0\) for some \(i\) then all subsequent diagonal entries must be \(0\text{.}\)
To transform a matrix \(A\) into its Smith Normal Form, we will use a sequence of steps that all correspond to multiplication by invertible matrices. Many of those steps will be elementary row and column operations, which correspond to multiplication by an elementary matrix on the left or right, respectively. Elementary matrices are invertible, and a product of invertible matrices is invertible, so any finite sequence of elementary row and column operations can be described by multiplication by an invertible matrix. However, in general not every invertible matrix is a product of elementary matrices. In fact, there are examples of PIDs \(R\) and matrices \(A\) for which the Smith Normal Form cannot be obtained by a sequence of elementary row and column operations alone. Such examples are not easy to give, in part because when our PID \(R\) is nice enough, the Smith Normal Form can in fact be obtained by elementary row and column operations alone. This is the case for Euclidean domains: over such rings, the Euclidean Algorithm for finding the gcd of two elements works, and it is the key step we will need to find a Smith Normal Form. When \(R\) is a general PID, however, we need to work a little harder.

Example 14.16. Special Case of SNF.

Suppose \(R\) is a Euclidean domain and \(A = (x_1, \dots, x_n)\) is a \(1 \times n\) matrix. Column operations of type I in this case amount to adding a multiple of one entry in this list to another. The SNF Theorem in this case amounts to the Euclidean algorithm: by adding a suitable multiple of one of the first two entries to the other, in the usual back-and-forth way, we arrive at \((g_1, 0, x_3, \dots, x_n)\) where \(g_1 = \gcd(x_1, x_2)\text{.}\) Repeat this for columns \(1\) and \(3\) to arrive at \((g_2, 0, 0, x_4, \dots, x_n)\) where \(g_2 = \gcd(g_1, x_3) = \gcd(x_1, x_2, x_3)\text{.}\) Continuing in this fashion we arrive at \((g, 0, \dots, 0)\) where \(g = \gcd(x_1, \dots, x_n)\text{.}\)
The proof of the SNF Theorem in general amounts to an extended version of the idea used in this special case.
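This reduction is easy to carry out by machine. Below is a minimal sketch in Python, taking \(R = \Z\text{;}\) the function name row_to_snf is ours, not the text's, and the entry swap used is itself realizable by elementary column operations.

```python
# A minimal sketch of Example 14.16 over R = Z: reduce a 1 x n row to
# (g, 0, ..., 0), g = gcd of the entries, using column operations only.

def row_to_snf(row):
    """Reduce [x1, ..., xn] to [gcd, 0, ..., 0] by column operations."""
    row = list(row)
    for j in range(1, len(row)):
        # Euclidean back-and-forth between positions 0 and j:
        # subtract a multiple of one entry from the other, then swap,
        # until position j is zeroed out.
        while row[j] != 0:
            q = row[0] // row[j]
            row[0] -= q * row[j]             # column op: col 1 -= q * col j
            row[0], row[j] = row[j], row[0]  # swap (also a column operation)
        # now row[0] = gcd(x1, ..., x_{j+1}) up to sign
    row[0] = abs(row[0])                     # normalize by a unit of Z
    return row

print(row_to_snf([12, 18, 10]))  # [2, 0, 0]
```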

Example 14.17. Finding SNF.

Consider the matrix with entries in \(\Z\)
\begin{equation*} A=\begin{bmatrix} 1 & 6 & 5 & 2 \\ 2 & 1 & -1 & 0 \\ 3 & 0 & 3 & 0 \end{bmatrix}. \end{equation*}
Do row and column operations to put \(A\) into its Smith Normal Form:
\begin{equation*} S=\begin{bmatrix}1 & 0 & 0 & 0 \\0 & 1 & 0 & 0 \\0 & 0 & 6 & 0 \end{bmatrix}. \end{equation*}
Conclude that the module presented by \(A\) is isomorphic to \(\Z/6\text{.}\)
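For readers who want to check this computation, here is a minimal sketch in Python of the reduction over \(\Z\text{:}\) it performs elementary row and column operations driven by the Euclidean algorithm, plus a final fix-up of the diagonal justified by the \(2 \times 2\) case \(\operatorname{diag}(a,b) \sim \operatorname{diag}(\gcd(a,b), \operatorname{lcm}(a,b))\text{.}\) It is an illustration, not an optimized or fully general implementation.

```python
# A sketch: Smith normal form of an integer matrix via elementary row
# and column operations, which suffices since Z is a Euclidean domain.
from math import gcd

def smith_normal_form(A):
    """Return the Smith normal form of the integer matrix A (list of rows)."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])

    def swap_cols(c1, c2):
        for row in A:
            row[c1], row[c2] = row[c2], row[c1]

    for t in range(min(m, n)):
        # move some nonzero entry of the lower-right block to position (t, t)
        pos = next(((i, j) for i in range(t, m) for j in range(t, n)
                    if A[i][j] != 0), None)
        if pos is None:
            break                              # the remaining block is zero
        A[t], A[pos[0]] = A[pos[0]], A[t]
        swap_cols(t, pos[1])
        done = False
        while not done:
            done = True
            for i in range(t + 1, m):          # clear column t below the pivot
                while A[i][t] != 0:
                    q = A[i][t] // A[t][t]
                    A[i] = [x - q * y for x, y in zip(A[i], A[t])]
                    if A[i][t] != 0:           # pivot didn't divide: continue
                        A[t], A[i] = A[i], A[t]  # the Euclidean algorithm
            for j in range(t + 1, n):          # clear row t past the pivot
                while A[t][j] != 0:
                    q = A[t][j] // A[t][t]
                    for row in A:
                        row[j] -= q * row[t]
                    if A[t][j] != 0:
                        swap_cols(t, j)
                        done = False           # column t may be disturbed
        if A[t][t] < 0:                        # units in Z are +/- 1
            A[t] = [-x for x in A[t]]
    # enforce d_1 | d_2 | ... using diag(a, b) ~ diag(gcd(a, b), lcm(a, b))
    changed = True
    while changed:
        changed = False
        for t in range(min(m, n) - 1):
            a, b = A[t][t], A[t + 1][t + 1]
            if a != 0 and b % a != 0:
                g = gcd(a, b)
                A[t][t], A[t + 1][t + 1] = g, a * b // g
                changed = True
    return A

A = [[1, 6, 5, 2], [2, 1, -1, 0], [3, 0, 3, 0]]
for row in smith_normal_form(A):
    print(row)   # [1, 0, 0, 0] / [0, 1, 0, 0] / [0, 0, 6, 0]
```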

Subsection Proof and Uniqueness of Smith Normal Form

Definition 14.18.

Let \(R\) be a PID and let \(a_{1}, \ldots, a_{n} \in R\text{.}\) The greatest common divisor or gcd of \(a_{1}, \ldots, a_{n}\text{,}\) denoted \(\operatorname{gcd}\left(a_{1}, \ldots, a_{n}\right)\text{,}\) is a generator for the principal ideal \(\left(a_{1}, \ldots, a_{n}\right)\text{.}\) Given a matrix \(A \in \mathrm{M}_{m, n}(R), \operatorname{gcd}(A)\) is the gcd of the entries of \(A\text{.}\) We adopt the convention that \(\operatorname{gcd}(0,0)=0\) and thus if \(A\) is the matrix of all zeroes, then \(\operatorname{gcd}(A)=0\text{.}\)

Remark 14.19.

Notice that the greatest common divisor is only defined up to multiplication by a unit.

Proof.

First, suppose that \(n=1\text{,}\) meaning that \(A\) is a column, say
\begin{equation*} A=\left[\begin{array}{c} a_{1} \\ \vdots \\ a_{m} \end{array}\right] \text { and let } P A=\left[\begin{array}{c} b_{1} \\ \vdots \\ b_{m} \end{array}\right] \end{equation*}
We need to show that \(\left(a_{1}, \ldots, a_{m}\right)=\left(b_{1}, \ldots, b_{m}\right)\text{.}\) On the one hand, each \(b_{i}\) is a linear combination of the \(a_{j}\text{,}\) so \(\left(b_{1}, \ldots, b_{m}\right) \subseteq\left(a_{1}, \ldots, a_{m}\right)\text{.}\) On the other hand,
\begin{equation*} \left[\begin{array}{c} a_{1} \\ \vdots \\ a_{m} \end{array}\right] =P^{-1}(P A)=P^{-1} \left[\begin{array}{c} b_{1} \\ \vdots \\ b_{m} \end{array}\right] \end{equation*}
so each \(a_{j}\) is a combination of the \(b_{i}\text{,}\) and \(a_{j} \in\left(b_{1}, \ldots, b_{m}\right)\text{.}\) We conclude that we have an equality of ideals \(\left(a_{1}, \ldots, a_{m}\right)=\left(b_{1}, \ldots, b_{m}\right)\text{,}\) and thus multiplying a column vector by an invertible matrix does not change the greatest common divisor of the entries.
Now given \(A \in \mathrm{M}_{m, n}(R)\text{,}\) if we denote the \(i\)th column of \(A\) by \(A_{i}\text{,}\) we have
\begin{equation*} P A=\left[\begin{array}{lll} P A_{1} & \cdots & P A_{n} \end{array}\right] . \end{equation*}
Since the gcd of each column remains the same, the gcd of all the entries does not change.
To show that \(\operatorname{gcd}(A Q)=\operatorname{gcd}(A)\text{,}\) note that transposing a matrix does not change its gcd, and the transpose of an invertible matrix is invertible, so we can apply what we have already shown:
\begin{equation*} \operatorname{gcd}(A Q)=\operatorname{gcd}\left((A Q)^{T}\right)=\operatorname{gcd}\left(Q^{T} A^{T}\right)=\operatorname{gcd}\left(A^{T}\right)=\operatorname{gcd}(A) \end{equation*}
Finally, applying elementary row and column operations corresponds to multiplying by an elementary matrix on the left or right, and elementary matrices are invertible.
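As a quick sanity check of this lemma over \(R = \Z\text{,}\) the following snippet multiplies a matrix by a unimodular matrix (determinant \(\pm 1\text{,}\) hence invertible over \(\Z\)) and confirms that the gcd of the entries is unchanged. The specific matrices are ours, chosen for illustration.

```python
# Sanity check of Lemma 14.20 over R = Z: multiplying by an invertible
# (unimodular) matrix does not change the gcd of the entries.
from math import gcd
from functools import reduce

def gcd_of_entries(M):
    return reduce(gcd, (x for row in M for x in row), 0)

def matmul(P, A):
    return [[sum(P[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(P))]

A = [[4, 6], [10, 2]]
P = [[2, 1], [1, 1]]   # det = 1, so P is invertible over Z

print(gcd_of_entries(A), gcd_of_entries(matmul(P, A)))  # both print 2
```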

Proof.

By definition of greatest common divisor, \((x, y)=(\operatorname{gcd}(x, y))\text{,}\) so there exist \(a, b \in R\) such that \(a x+b y=\operatorname{gcd}(x, y)\text{.}\) Write \(g:=\operatorname{gcd}(x, y)\) and \(h:=\operatorname{gcd}(a, b)\text{.}\) Then \(a x+b y\) is a multiple of \(g h\text{,}\) but since \(a x+b y=g\) and \(R\) is a domain, we conclude that \(h\) must be a unit, and \((a, b)=(h)=(1)\text{.}\) In particular, we can find \(c, d \in R\) such that \(a d-b c=1\text{.}\) Finally, \(c x+d y \in(x, y)=(g)\text{,}\) say \(c x+d y=e g\) for some \(e \in R\text{,}\) so
\begin{equation*} \left[\begin{array}{ll} a & b \\ c & d \end{array}\right] \left[\begin{array}{l} x \\ y \end{array}\right] =\left[\begin{array}{c} g \\ e g \end{array}\right] \end{equation*}
Now we can apply the row operation that adds \(-e\) times the first row to the second row: setting
\begin{equation*} P:= \left[\begin{array}{cc} 1 & 0 \\ -e & 1 \end{array}\right] \left[\begin{array}{ll} a & b \\ c & d \end{array}\right] =\left[\begin{array}{cc} a & b \\ c-e a & d-e b \end{array}\right], \end{equation*}
we get
\begin{equation*} P\left[\begin{array}{l} x \\ y \end{array}\right] =\left[\begin{array}{l} g \\ 0 \end{array}\right] \end{equation*}
Finally, one can easily check that
\begin{equation*} P^{-1}=\left[\begin{array}{cc} d & -b \\ -c & a \end{array}\right] \left[\begin{array}{ll} 1 & 0 \\ e & 1 \end{array}\right] \end{equation*}
By transposing the matrices in Lemma 14.21, we can show that there exists an invertible \(2 \times 2\) matrix \(Q\) such that
\begin{equation*} \left[\begin{array}{ll} x & y \end{array}\right] Q=[\operatorname{gcd}(x, y) \quad 0] \end{equation*}
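Over \(R = \Z\text{,}\) the matrix from Lemma 14.21 can be computed explicitly with the extended Euclidean algorithm. The sketch below (function names ours) builds the matrix with first row \((a, b)\) and second row \((-y/g, x/g)\text{,}\) which has determinant \(1\) and sends \((x, y)\) to \((g, 0)\text{.}\)

```python
# A sketch of Lemma 14.21 over R = Z: build an invertible 2 x 2 matrix
# C with C (x, y)^T = (gcd(x, y), 0)^T, via the extended Euclidean
# algorithm. For simplicity, assume (x, y) != (0, 0); over Z the gcd is
# only determined up to sign (a unit).

def extended_gcd(x, y):
    """Return (g, a, b) with g = gcd(x, y) = a*x + b*y."""
    if y == 0:
        return (x, 1, 0)
    g, a, b = extended_gcd(y, x % y)
    return (g, b, a - (x // y) * b)

def bezout_matrix(x, y):
    g, a, b = extended_gcd(x, y)
    # determinant: a*(x/g) + b*(y/g) = (a*x + b*y)/g = 1, so C is invertible
    return [[a, b], [-y // g, x // g]]

x, y = 12, 42
C = bezout_matrix(x, y)
print(C, [C[0][0]*x + C[0][1]*y, C[1][0]*x + C[1][1]*y])  # ... [6, 0]
```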

Definition 14.22. Minor.

A \(k \times k\) minor of \(A\) is the determinant of a \(k \times k\) submatrix of \(A\text{.}\) More formally, if \(A\) is an \(m \times n\) matrix, a \(k\times k\) minor of \(A\) is any element of \(R\) obtained as follows: choose \(1 \leq i_1 < i_2 < \cdots < i_k \leq m\) and \(1 \leq j_1 < \cdots < j_k \leq n\text{,}\) and let \(B = (b_{p,q})\) be the \(k \times k\) matrix with \(b_{p,q} = a_{i_p, j_q}\text{.}\) Then \(\det(B)\) is a \(k \times k\) minor of \(A\text{.}\)

Proof.

Recall that for a PID \(R\text{,}\) the gcd of any set of elements is defined to be a generator of the ideal they generate. So, Lemma 14.20 implies that \(\gcd_k(A) = \gcd_k(S)\) for all \(k\text{.}\) Since \(S\) is diagonal, the only nonzero minors of \(S\) are those given by indices \(1 \leq i_1 < \cdots < i_k \leq m\) and \(1 \leq j_1 < \cdots < j_k \leq n\) for which \(i_t = j_t\) for all \(1 \leq t \leq k\text{,}\) and moreover such a minor is equal to \(d_{i_1} \cdots d_{i_k}\text{.}\) Since \(d_l \mid d_{l+1}\) for all \(l\text{,}\) it follows that \(d_{1} \cdots d_k\) divides \(d_{i_1} \cdots d_{i_k}\) for all such indices. Thus \(\gcd_k(S) = d_1 \cdots d_k\) for each \(k\text{,}\) and hence \(d_k = \gcd_k(S)/\gcd_{k-1}(S)\) as claimed.

Remark 14.25.

So, another way of finding the SNF of a matrix \(A\) with entries in a Euclidean domain is to calculate \(\gcd_k(A)\) for all \(k\text{.}\) This is not practical except in very special cases.
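To illustrate this remark, the following sketch (ours) recovers the diagonal of the SNF of the matrix \(A\) from Example 14.17 by computing \(\gcd_k(A)\) directly from the \(k \times k\) minors; as the remark suggests, this is hopelessly inefficient for large matrices.

```python
# A sketch of Remark 14.25 over Z: recover the diagonal of the SNF of
# the matrix A from Example 14.17 as d_k = gcd_k(A) / gcd_{k-1}(A),
# where gcd_k(A) is the gcd of all k x k minors.
from itertools import combinations
from math import gcd
from functools import reduce

def det(M):
    """Determinant by Laplace expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def gcd_of_minors(A, k):
    m, n = len(A), len(A[0])
    minors = (det([[A[i][j] for j in cols] for i in rows])
              for rows in combinations(range(m), k)
              for cols in combinations(range(n), k))
    return reduce(gcd, minors, 0)

A = [[1, 6, 5, 2], [2, 1, -1, 0], [3, 0, 3, 0]]
g_prev = 1
for k in range(1, 4):
    g_k = gcd_of_minors(A, k)
    print(f"gcd_{k}(A) = {g_k},  d_{k} = {g_k // g_prev}")
    g_prev = g_k
# prints d_1 = 1, d_2 = 1, d_3 = 6, matching the SNF diag(1, 1, 6)
```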