Skip to main content

Section 14.1 Rational Canonical Form

“If everything on Earth were rational, nothing would happen.”
―Fyodor Dostoevsky

Subsection 14.1.1 Making Transformations into Modules

Given a vector space over a field, we can finagle it into an \(F[x]\) module instead. As \(F\) is a field, \(F[x]\) is a PID, and thus we can use the Fundamental Theorem CITEX to decompose it into a quotient by a direct sum of polynomials. These polynomials are the invariant factors of the module.

Definition 14.1. Endomorphisms and Linear Operators.

Let \(M\) be an \(R\)-module. An \(R\)-module homomorphism \(\varphi\) from \(M\) to itself is known as an endomorphism. In the case where \(R\) is a field and \(M\) a vector space, \(\varphi\) is known as a linear operator. The set of all endomorphisms of \(M\) is denoted \(\End_R(M)\text{.}\)

Remark 14.2.

Equivalently, \(\End_R(M)=\Hom_R(M,M)\text{.}\)

Proof.

If \(V\) is an \(F[x]\) module then \(V\) is an \(F\)-vector space by restriction of scalars along the inclusion \(F \hookrightarrow F[x]\text{.}\) Let \(T: V \rightarrow V\) be defined by \(T(v)=x v\text{.}\) It can be shown that \(T \in \operatorname{End}_{F}(V)\) since \(T\left(v_{1}+v_{2}\right)=x\left(v_{1}+v_{2}\right)=x v_{1}+x v_{2}=T\left(v_{1}\right)+T\left(v_{2}\right)\) and \(T(c v)=x(c v)=c(x v)\) for any \(c \in F, v, v_{1}, v_{2} \in V\) by the axioms of the \(F[x]\)-module.
Conversely, let \(V\) be an \(F\)-vector space and \(T \in \operatorname{End}_{F}(V)\text{.}\) Consider the evaluation homomorphism \(\varphi: F[x] \rightarrow \operatorname{End}_{F}(V), \quad \varphi(f(x))=f(T)\text{.}\) (For example, if \(f(x)=x^{2}+5\) then \(\varphi(f(x))=T \circ T+5 \cdot \mathrm{id}_{V}\text{.}\)) Since \(V\) is an \(\operatorname{End}_{F}(V)\)-module by Remark 3.17, then \(V\) is also an \(F[x]\)-module by restriction of scalars along \(\phi\text{.}\) Concretely, this action is given by
\begin{equation*} f(x) v=(f(T))(v) . \end{equation*}
(For example, if \(f(x)=x^{2}+5\text{,}\) then \(f(x) v=T(T(v))+5 v\text{.}\))

Proof.

We leave the proof that for \(f, g \in \operatorname{End}_{R}(V)\) we have \([f \circ g]_{B}^{B}=[f]_{B}^{B}[g]_{B}^{B}\) as an exercise.

Remark 14.5.

Suppose \(F\) is a field and \(M\) is a \(F[x]\)-module. By restriction of scalars along the canonical ring map \(F \into F[x]\) we may regard \(M\) as a \(F\)-vector space — let us write this vector space as \(M|_F\) to be precise. Let \(\l_x: M|_F \to M|_F\) be the map given by \(\l_x(m) = xm\text{.}\) Then \(\l_x\) is an \(F\)-linear operator on \(M|_F\text{.}\) So, to a \(F[x]\)-module \(M\) we may associate the pair \((M|_F, \l_x)\) where \(M|_F\) is an \(F\)-vector space and \(\l_x\) is an \(F\)-linear operator on \(M|_F\text{.}\) This process is reversible:

Definition 14.6. \(F[x]\)-Module \(V_g\).

Let \(F\) be a field, let \(V\) be a finite dimensional vector space over \(F\text{,}\) and let \(g: V \to V\) be an \(F\)-linear operator. The \(F[x]\)-module \(V_g\) is defined to be the abelian group \((V, +)\) equipped with the rule for \(F[x]\) scaling given by
\begin{equation*} (c_nx^n + \dots+ c_0)v = c_ng^{\circ n}(v) + \dots + c_1 g(v) + c_0v \end{equation*}
for any polynomial \(c_nx^n + \dots + c_0 \in F[x]\) and vector \(v \in V\text{.}\)

Example 14.8. Special Case of \(V_g\).

We have the following special case (it isn’t really special — the general case reduces to this one upon choosing a basis):
Given a matrix \(A \in \Mat_{n \times n}(F)\text{,}\) then \(F^n_A\) is the \(F[x]\)-module whose underlying abelian group is \(F^n\) (column vectors) with the usual rule for addition and with the rule for \(F[x]\) scaling given by
\begin{equation*} (\sum_i c_i x^i) \cdot v := \sum c_i A^i v \end{equation*}
for any column vector \(v\text{.}\) For short, we write this rule as
\begin{equation*} f(x) \cdot v = f(A) v \end{equation*}
for any polynomial \(f(x) \in k[x]\text{,}\) where \(f(A)\) is the matrix obtained by evaluating \(f(x)\) at \(x = A\) in the evident sense.

Example 14.9. \(\Mat_{2\times 2}(\Q)\).

Let \(A=\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \Mat_{2 \times 2}(\Q)\) and let \(M\) be the \(\Q[x]\)-module \(\Q^2_A\text{.}\) So \(M = \Q^2\) as a \(\Q\)-vector space, and \(x\) acts on \(M\) by sending \(\vectwo{a}{b} \in M\) to \(\vectwo{a+b}{b} \in M\text{.}\) I claim there is an isomorphism
\begin{equation*} M \cong \Q[x]/(x-1)^2 \end{equation*}
of \(\Q[x]\)-modules.
Solution.
Let \(v = \vectwo{0}{1} \in M\text{.}\) Note that \(x \cdot v = \vectwo{1}{1}\) and that \(v\) and \(x \cdot v\) span \(\Q^2\) as a \(\Q\)-vector space. It follows that \(v\) generates \(M\) as a \(\Q[x]\)-module; in detail, for any \(\vectwo{a}{b} \in \Q^2\) we have \((b- a + ax)v = \vectwo{0}{a} + \vectwo{a}{a} = \vectwo{a}{b}\text{.}\)
Define a \(\Q[x]\)-module homomorphism
\begin{equation*} \pi: \Q[x] \onto M \end{equation*}
by sending \(1\) to \(v \in M\) and hence \(p(x)\) to \(p(x) \cdot v\text{.}\) It is onto since \(v\) generates \(M\) as a \(\Q[x]\)-module. The kernel will be a (necessarily principle) ideal of \(\Q[x]\text{;}\) we just need to find it. Note that \(x^2 v = \vectwo21\text{,}\) \(xv\) and \(v\) are linearly dependent and in fact we have
\begin{equation*} x^2 v = 2x v- v \end{equation*}
and hence \((x^2 -2x + 1)v =0\text{.}\) This gives that \(x^2 -2x + 1\) is in the kernel of \(\pi\) and hence, by the \(0\)-th Isomorphism Theorem we have an induced homomorphism of \(\Q[x]\)-modules
\begin{equation*} \ov{\pi}: \Q[x]/(x^2 - 2x + 1) \onto M \end{equation*}
induced by \(\pi\text{.}\) The map \(\ov{\pi}\) is onto since \(\pi\) is onto. Since the source and target both have dimension two as \(\Q\)-vector spaces, \(\ov{\pi}\) is \(\Q\)-linear, and \(\ov{\pi}\) is onto, it must in fact be an isomorphism of \(\Q[x]\)-modules (by the Rank-Nullity Theorem CITEX).

Remark 14.11.

In fact, these rules determine an “isomorphism of categories’’, whatever that means.

Subsection 14.1.2 Rational Heads Prevail

Definition 14.12. Block Diagonal Matrix.

Given square matrices \(A_1 \in \Mat_{n_1 \times n_1}(F), \dots, A_m \in \Mat_{n_m \times n_m}(F)\text{,}\) we define \(A_1 \oplus \cdots \oplus A_m\) to be the block diagonal matrix
\begin{equation*} \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & A_m \\ \end{bmatrix} \end{equation*}
which belongs to \(\Mat_{n \times n}(F)\) for \(n = \sum_i n_i\text{.}\)

Proof.

We know by the Fundamental Theorem of modules over \(F[x]\) (i.e., Corollary ) that there is a \(F[x]\)-module isomorphism
\begin{equation*} V_{g} \cong F[x]/(f_1)\oplus \cdots \oplus F[x]/(f_s) \end{equation*}
for some unique list of monic, non-constant polynomials \(f_1, \cdots, f_s\) with \(f_i \mid f_{i+1}\) for all \(i\text{.}\) Recall that the operator \(g\) on \(V\) is given as \(\l_x\) (multiplication by \(x\)) on \(V_g\text{.}\) Since this is a \(F[x]\)-module isomorphism, \(g\) corresponds to multiplication by \(x\) on each summand \(F[x]/(f_i)\text{.}\) As we have seen before, for each \(i\text{,}\) the matrix representing \(\l_x\) on \(F[x]/f_i\) relative to the basis \(B_i := \{1, x, x^2, \dots, x^{\deg(f_i)-1}\}\) of \(F[x]/f_i\) is the companion matrix \(C(f_i)\) of \(f_i\text{.}\) Let \(B\) be the \(F\)-basis of \(F[x]/(f_1)\oplus \cdots \oplus F[x]/(f_s)\) given by tuples
\begin{equation*} B = \{(b_1, 0, \dots, 0) \mid b_1 \in B_1\} \cup \{(0, b_2, 0, \dots, 0) \mid b_2 \in B_{2\} \cup}\cdots \cup \{(0, 0, \dots, 0, b_s) \mid b_s \in B_s\} \end{equation*}
(in that order). Then the matrix of \(\l_x\) on \(F[x]/(f_1)\oplus \cdots \oplus F[x]/(f_k)\) for \(B\) is \(C(f_1) \oplus \cdots \oplus C(f_s)\text{.}\)
This gives existence. Uniqueness is a consequence of the uniqueness of the list \(f_1, \dots, f_s\text{,}\) but I will omit the details.

Remark 14.14.

The matrix is unique, but the basis \(B\) that realizes it is, in general, not unique. As an extreme example illustrating this: Take \(g\) to be the identity operator on a finite dimensional vector space \(V\text{.}\) Then \([g]_B^B = I_n\) holds for any basis \(B\text{.}\) (Note that \(I_n\) is indeed in rational canonical form: it is equal to \(C(x-1) \oplus \cdots \oplus C(x-1)\text{.}\))

Definition 14.15. Invariant Factor.

Recall that in Theorem [provisional cross-reference: cite], CITEX the number \(r\) is the rank of \(G\text{,}\) the numbers \(n_1,\ldots,n_t\) are the invariant factors of \(G\text{,}\) and the decomposition of \(G\) in part (1) is the invariant factor decomposition of \(G\text{.}\)

Example 14.16. Back to \(\Mat_{2\times 2}(\Q)\).

Let us return to the example of \(A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \Mat_{2 \times 2}(\Q)\) to illustrate the Theorem and its proof. By the previous example we have an isomorphism of \(\Q[x]\)-module
\begin{equation*} \Q^2_A \cong \Q[x]/(x^2 - 2x + 1). \end{equation*}
Recall that \(\l_x\) (multiplication by \(x\)) on \(\Q^2_A\) is given by multiplication by the matrix \(A\text{.}\) This is an isomorphism of \(\Q[x]\)-modules, and so \(A\) corresponds to the operator \(\l_x\) on \(\Q[x]/(x^2 - 2x + 1)\text{.}\) As we have seen before, relative to the basis \(\{1, x\}\text{,}\) the matrix for \(\l_x\) is
\begin{equation*} C(x^2 - 2x + 1) = \begin{bmatrix} 0 & -1 \\ 1 & 2 \\ \end{bmatrix}. \end{equation*}
This is the Rational Canonical Form of \(A\text{.}\) \(A\) has just one invariant factor, namely \(x^2 - 2x + 1\text{.}\)
By the way, tracking through the calculations that got us here, we see that the basis of \(\Q^2\) that gives the RCF of \(A\) if \(\vectwo01, \vectwo11\) of \(\Q^2\text{.}\)

Example 14.18. Similarity Classes of \(4\times 4\) Matrices.

Let \(F = \Z/p\) be the field with \(p\) elements for some prime \(p\text{.}\) Up to similarity, how many \(4 \times 4\) matrices are there with entries in \(F\text{?}\)
Solution.
Each such matrix is similar to a unique one of the form
\begin{equation*} C(f_1) \oplus \cdots \oplus C(f_k) \end{equation*}
with \(f_1, \dots, f_k\) monic polynomials of positive degree such that \(f_1 \mid \cdots \mid f_k\text{.}\) Moreover, since \(C(f)\) is a \(d \times d\) matrix where \(d = \deg(f)\text{,}\) we must have \(\deg(f_1) + \cdots + \deg(f_k) = 4\text{.}\) So the goal becomes to count all such tuples \((f_1, \dots, f_k)\) of polynomials. We proceed by cases on \(k\text{.}\) Note that \(k \geq 5\) is not possible.
  • Case \(k = 1\).
    Then \(\deg(f_1) = 4\) and the number of such polynomials is \(p^4\) (since \(f_1 = x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0\) and \(F\) has \(p\) elements).
  • Case \(k = 2\).
    Note that \(\deg(f_1) \geq 3\) is not possible. If \(\deg(f_1) = 2\) then \(f_1 = f_2\text{,}\) and there are \(p^2\) possibilities. If \(\deg(f_1) = 1\text{,}\) then \(\deg(f_2) = 3\) and \(f_2 = f_1 \cdot g\) with \(g\) monic and \(\deg(g) = 2\text{.}\) There are \(p\) possibilities for \(f_1\) and \(p^2\) for \(g\text{,}\) for a total of \(p^3\) in this subcase. The total for this case is thus \(p^2 + p^3\text{.}\)
  • Case \(k= 3\).
    The only possibilities are \(\deg(f_1) = 1\text{,}\) \(\deg(f_2) = 1\) and \(\deg(f_3) = 2\) so that \(f_2 = f_1\) and \(f_3 = f_1 \cdot g\) with \(\deg(g) =1\text{.}\) We get \(p^3\) possibilities.
  • Case \(k = 4\).
    We must have \(f_1 = f_2 = f_3 = f_4\) with each of degree \(1\text{,}\) for a total of \(p\) possibilities. The total is
    \begin{equation*} p^4 + p^2+ p^3 + p^3 + p. \end{equation*}

Remark 14.19.

The proof of Theorem [provisional cross-reference: cite] CITEX makes clear the following fact: For a field \(F\text{,}\) finite dimensional vector space \(V\text{,}\) and \(F\)-linear operator \(g: V \to V\text{,}\)the invariant factors of the operator \(g\) are identical to the invariant factors of the \(F[x]\)-module \(V_g\text{.}\)
The following result is thus very useful for finding the Rational Canonical Form of an operator (we will state it just for operators given explicitly by matrices):

Proof.

For this proof it is useful to identity \(F[x]^n\) with \(F^n[x]\) where the latter refers to all expressions of the form \(\sum_i v_i x^i = v_0 + v_1 x + \cdots + v_m x^m\) with \(v_i \in F^n\text{.}\) For instance, (when \(n = 2\)) we identify \(\vectwo{x^2 -3x + 1}{x^7 +x + 5}\) with \(\vectwo{1}{5} + \vectwo{-3}{1} x + \vectwo{1}{0} x^2 + \vectwo{0}{1} x^7\text{.}\) Using this identification we define
\begin{equation*} \phi: F[x]^n = F^n[x] \to F^n_A \end{equation*}
by \(\phi(\sum_i v_i x^i) = \sum_i A^i v_i\text{.}\) Then \(\phi\) is a \(F[x]\)-module homomorphism — I leave it to you to verify this. \(\phi\) is onto since, e.g., for any \(v \in F^n\) we have \(\phi(vx^0) = v\text{.}\)
We have
\begin{equation*} \phi((xI_n - A) \cdot \sum_i v_i x^i) = \phi(\sum_i v_i x^{i+1} - \sum_i (Av_i) x^i) = \sum_i A^{i+1}(v_i) - \sum_i A^i(Av_i) = 0 \end{equation*}
and hence \(\ker(\phi) \supseteq \im(xI_n - A)\text{.}\) By the \(0\)-th isomorphism theorem, there is an induced \(F[x]\)-module homomorphism
\begin{equation*} \ov{\phi}: \frac{F^n[x]}{\im(xI_n - A)} \onto F^n_A \end{equation*}
induced by \(\phi\text{,}\) and it is onto since \(\phi\) is onto. It remains to show this map is one-to-one.
Since \(\overline{\phi}\) is \(F[x]\)-linear it is certainly \(F\)-linear. Since \(\dim_F(F^n_A) = n\text{,}\) to prove \(\overline{\phi}\) is one-to-one, it suffices to prove \(\dim_F \frac{F^n[x]}{\im(xI_n - A)} \leq n\) (by Rank-Nullity). I claim the images of the standard basis \(e_1, \dots, e_n\) in \(\frac{F^n[x]}{\im(xI_n - A)}\) span it as an \(F\)-vector space. To see this, note that \(x^je_i\text{,}\) for \(1 \leq i \leq n\text{,}\) \(j \geq 0\) span \(F^n[x]\) as an \(F\)-vector space, and hence they span the quotient. It thus suffices to show \(\overline{x^j e_i}\) lies in the span of \(\overline{e_1}, \dots, \overline{e_n}\) in \(\frac{F^n[x]}{\im(xI_n - A)}\) for all \(i\) and \(j\text{.}\) We have \(x^{j-1} \cdot (A - x) e_i \in \im(xI_n - A)\) and thus
\begin{equation*} \overline{x^j e_i} = \overline{x^{j-1} Ae_i} \end{equation*}
and by repeating this argument we have
\begin{equation*} \overline{x^j e_i} =\overline{x^{j-1} Ae_i} = \overline{x^{j-2} A^2e_i} = \cdots =\overline{A^je_i} \in \Span_F(e_1, \dots, e_n). \end{equation*}

Proof.

Let \(S\) be the Smith Normal Form of \(xI_n - A\) and let \(d_1, \dots, d_n\) be its diagonal entries. As proven before, the matrix \(xI_n - A\) and \(S\) present isomorphic \(F[x]\)-modules, and thus the Theorem gives an isomorphism
\begin{equation*} F^n_A \cong F[x]/(d_1) \oplus \cdots \oplus F[x]/(d_n). \end{equation*}
Since \(\dim_F F^n_A < \infty\text{,}\) none of the \(d_i\)’s can be zero. So, each \(d_i\) is monic and \(d_1 \mid \cdots \mid d_n\text{.}\) Now some of the \(d_i\) might be non-zero constants, in which case \(d_i\) is a unit and \(F[x]/(d_i) = 0\text{.}\) Upon tossing those out, we are left with
\begin{equation*} F^n_A \cong F[x]/(f_1) \oplus \cdots \oplus F[x]/(f_k) \end{equation*}
with each \(f_i\) monic of positive degree and \(f_1 \mid \cdots \mid f_k\text{.}\) These are, by definition, the invariant factors of \(A\text{.}\)

Example 14.22. Once More to Back to \(\Mat_{2\times 2}(\Q)\).

Let’s find the invariant factors of the matrix \(A= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \M_{2 \times 2}(\Q)\) we looked at before, but this time using the Theorem 14.20 and Corollary 14.21.
Solution.
We have
\begin{equation*} xI_2-A= \begin{bmatrix} x-1 & -1 \\ 0 & x-1 \end{bmatrix}. \end{equation*}
To find the invariant factors of \(A\) we just need to find the Smith Normal Form of \(xI_2-A\text{.}\) I’ll do this two ways:
  • Method I: Do row and column operations using the generalized Euclidean algorithm:.
    \begin{equation*} \begin{bmatrix} x-1 & -1 \\ 0 & x-1 \end{bmatrix} \mapsto \begin{bmatrix} x-1 & -1 \\ (x-1)^2 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 0 & -1 \\ (x-1)^2 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 \\ 0 & (x-1)^2 \end{bmatrix}. \end{equation*}
    Tossing out the unit, we see that the only invariant factor is \((x-1)^2\text{,}\) as before.
  • Method II: Call \(d_1, d_2\) the entries on the diagonal of the SNF of \(xI_2-A\text{:}\).
    \begin{equation*} \begin{bmatrix} x-1 & -1 \\ 0 & x-1 \end{bmatrix} \mapsto \begin{bmatrix} x-1 & -1 \\ (x-1)^2 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 0 & -1 \\ (x-1)^2 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 \\ 0 & (x-1)^2 \end{bmatrix}. \end{equation*}
    Tossing out the unit, we see that the only invariant factor is \((x-1)^2\text{,}\) as before.
Recall from Theorem that \(d_1\) is the gcd of the entries of \(xI_2-A\) and \(d_1d_2=\det(xI_2-A)\text{.}\) Thus \(d_1=1\) and \(d_2=\det(xI_2-A)=(x-1)^2\text{.}\) Therefore the only invariant factor of \(A\) is \((x-1)^2\text{.}\)

Example 14.23. Finding IFs and RCF.

Let
\begin{equation*} A = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \end{bmatrix}. \end{equation*}
Let us find the invariant factors and Rational Canonical Form of \(A\) by finding the Smith Normal Form of \(xI_3 - A\text{.}\)
Solution.
We have
\begin{equation*} xI_3 - A = \begin{bmatrix} x-1 & 1 & -1 \\ 0 & x & -1 \\ 0 & -1 & x \\ \end{bmatrix}. \end{equation*}
A sequence of messy row and column operations yields
\begin{equation*} \begin{bmatrix} 1 & 0 & 0 \\ 0 & x-1 & 0 \\ 0 & 0 & x^2-1 \\ \end{bmatrix}. \end{equation*}
Note that this is indeed in Smith Normal Form. It follows that the invariant factors of \(A\) are \(x-1, x^2-1\) and the RCF of \(A\) is
\begin{equation*} C(x-1) \oplus C(x^2-1) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \end{bmatrix}. \end{equation*}
For an alternative approach, we could use that the diagonal entries \(d_1, d_2, d_3\) of the Smith Normal Form of \(xI_3 - A\) satisfy \(d_1 = \gcd(xI_3 - A)\text{,}\) \(d_1d_2\) is the gcd of the \(2 \times 2\) minors of \(xI_3 - A\text{,}\) and \(d_1d_2d_3 = \det(xI_3 - A)\text{.}\) It’s clear that \(d_1 = 1\) and an easy calculation gives that \(\det(xI_3 - A) = (x-1)(x^2 -1)\text{.}\) There are nine \(2 \times 2\) minors of \(xI_3 - A\text{,}\) and a tedious check reveals that each of them is one of \(x^2 -1\text{,}\) \(x(x-1)\text{,}\) \(x-1\) or \(0\) (up to signs). So \(d_1d_2 = x-1\text{.}\) We get that
\begin{equation*} d_1 = 1, d_2 = x-1, d_3 = x^2 -1 \end{equation*}
as before.