
Section 15.1 Rational Canonical Form

“If everything on Earth were rational, nothing would happen.”
―Fyodor Dostoevsky

Subsection Making Transformations into Modules

Given a vector space over a field \(F\) together with a linear operator on it, we can finagle it into an \(F[x]\)-module instead. As \(F\) is a field, \(F[x]\) is a PID, and thus we can use the Fundamental Theorem CITEX to decompose the module as a direct sum of quotients of \(F[x]\) by ideals generated by polynomials. These polynomials are the invariant factors of the module.
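To preview the shape of such a decomposition (the module here is purely an illustration), a finitely generated \(\Q[x]\)-module might decompose as
\begin{equation*} M \cong \Q[x]/(x-1) \oplus \Q[x]/\big((x-1)(x^2+1)\big), \end{equation*}
in which case its invariant factors are \(x-1\) and \((x-1)(x^2+1)\text{;}\) note that the first divides the second.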

Definition 15.1. Endomorphisms and Linear Operators.

Let \(M\) be an \(R\)-module. An \(R\)-module homomorphism \(\varphi\) from \(M\) to itself is known as an endomorphism. In the case where \(R\) is a field and \(M\) a vector space, \(\varphi\) is known as a linear operator. The set of all endomorphisms of \(M\) is denoted \(\End_R(M)\text{.}\)

Remark 15.2.

Equivalently, \(\End_R(M)=\Hom_R(M,M)\text{.}\)

Proof.

If \(V\) is an \(F[x]\) module then \(V\) is an \(F\)-vector space by restriction of scalars along the inclusion \(F \hookrightarrow F[x]\text{.}\) Let \(T: V \rightarrow V\) be defined by \(T(v)=x v\text{.}\) It can be shown that \(T \in \operatorname{End}_{F}(V)\) since \(T\left(v_{1}+v_{2}\right)=x\left(v_{1}+v_{2}\right)=x v_{1}+x v_{2}=T\left(v_{1}\right)+T\left(v_{2}\right)\) and \(T(c v)=x(c v)=c(x v)\) for any \(c \in F, v, v_{1}, v_{2} \in V\) by the axioms of the \(F[x]\)-module.
Conversely, let \(V\) be an \(F\)-vector space and \(T \in \operatorname{End}_{F}(V)\text{.}\) Consider the evaluation homomorphism \(\varphi: F[x] \rightarrow \operatorname{End}_{F}(V), \quad \varphi(f(x))=f(T)\text{.}\) (For example, if \(f(x)=x^{2}+5\) then \(\varphi(f(x))=T \circ T+5 \cdot \mathrm{id}_{V}\text{.}\)) Since \(V\) is an \(\operatorname{End}_{F}(V)\)-module by Remark 3.17, \(V\) is also an \(F[x]\)-module by restriction of scalars along \(\varphi\text{.}\) Concretely, this action is given by
\begin{equation*} f(x) v=(f(T))(v) . \end{equation*}
(For example, if \(f(x)=x^{2}+5\text{,}\) then \(f(x) v=T(T(v))+5 v\text{.}\))
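For a concrete instance of this action (an illustration only, not needed for the proof), take \(F = \Q\text{,}\) \(V = \Q^2\text{,}\) and let \(T\) be the operator that swaps the two coordinates. Then \(T \circ T = \mathrm{id}_V\text{,}\) so for \(f(x) = x^2 + 5\) we get
\begin{equation*} f(x) \cdot v = T(T(v)) + 5v = v + 5v = 6v \end{equation*}
for every \(v \in V\text{.}\)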

Proof.

We leave the proof that for \(f, g \in \operatorname{End}_{R}(V)\) we have \([f \circ g]_{B}^{B}=[f]_{B}^{B}[g]_{B}^{B}\) as an exercise.

Remark 15.5.

Suppose \(F\) is a field and \(M\) is an \(F[x]\)-module. By restriction of scalars along the canonical ring map \(F \into F[x]\) we may regard \(M\) as an \(F\)-vector space; to be precise, let us write this vector space as \(M|_F\text{.}\) Let \(\l_x: M|_F \to M|_F\) be the map given by \(\l_x(m) = xm\text{.}\) Then \(\l_x\) is an \(F\)-linear operator on \(M|_F\text{.}\) So, to an \(F[x]\)-module \(M\) we may associate the pair \((M|_F, \l_x)\text{,}\) where \(M|_F\) is an \(F\)-vector space and \(\l_x\) is an \(F\)-linear operator on \(M|_F\text{.}\) This process is reversible:

Definition 15.6. \(F[x]\)-Module \(V_g\).

Let \(F\) be a field, let \(V\) be a finite dimensional vector space over \(F\text{,}\) and let \(g: V \to V\) be an \(F\)-linear operator. The \(F[x]\)-module \(V_g\) is defined to be the abelian group \((V, +)\) equipped with the rule for \(F[x]\) scaling given by
\begin{equation*} (c_nx^n + \dots+ c_0)v = c_ng^{\circ n}(v) + \dots + c_1 g(v) + c_0v \end{equation*}
for any polynomial \(c_nx^n + \dots + c_0 \in F[x]\) and vector \(v \in V\text{.}\)

Example 15.8. Special Case of \(V_g\).

We have the following special case (it isn’t really special — the general case reduces to this one upon choosing a basis):
Given a matrix \(A \in \Mat_{n \times n}(F)\text{,}\) the module \(F^n_A\) is the \(F[x]\)-module whose underlying abelian group is \(F^n\) (column vectors) with the usual rule for addition and with the rule for \(F[x]\) scaling given by
\begin{equation*} (\sum_i c_i x^i) \cdot v := \sum_i c_i A^i v \end{equation*}
for any column vector \(v\text{.}\) For short, we write this rule as
\begin{equation*} f(x) \cdot v = f(A) v \end{equation*}
for any polynomial \(f(x) \in F[x]\text{,}\) where \(f(A)\) is the matrix obtained by evaluating \(f(x)\) at \(x = A\) in the evident sense.

Example 15.9. \(\Mat_{2\times 2}(\Q)\).

Let \(A=\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \Mat_{2 \times 2}(\Q)\) and let \(M\) be the \(\Q[x]\)-module \(\Q^2_A\text{.}\) So \(M = \Q^2\) as a \(\Q\)-vector space, and \(x\) acts on \(M\) by sending \(\vectwo{a}{b} \in M\) to \(\vectwo{a+b}{b} \in M\text{.}\) I claim there is an isomorphism
\begin{equation*} M \cong \Q[x]/(x-1)^2 \end{equation*}
of \(\Q[x]\)-modules.
Solution.
Let \(v = \vectwo{0}{1} \in M\text{.}\) Note that \(x \cdot v = \vectwo{1}{1}\) and that \(v\) and \(x \cdot v\) span \(\Q^2\) as a \(\Q\)-vector space. It follows that \(v\) generates \(M\) as a \(\Q[x]\)-module; in detail, for any \(\vectwo{a}{b} \in \Q^2\) we have \((b- a + ax)v = \vectwo{0}{b-a} + \vectwo{a}{a} = \vectwo{a}{b}\text{.}\)
Define a \(\Q[x]\)-module homomorphism
\begin{equation*} \pi: \Q[x] \onto M \end{equation*}
by sending \(1\) to \(v \in M\) and hence \(p(x)\) to \(p(x) \cdot v\text{.}\) It is onto since \(v\) generates \(M\) as a \(\Q[x]\)-module. The kernel will be a (necessarily principal) ideal of \(\Q[x]\text{;}\) we just need to find it. Note that \(x^2 v = \vectwo21\text{,}\) so \(x^2 v\text{,}\) \(xv\) and \(v\) are linearly dependent, and in fact we have
\begin{equation*} x^2 v = 2x v- v \end{equation*}
and hence \((x^2 -2x + 1)v =0\text{.}\) This gives that \(x^2 -2x + 1\) is in the kernel of \(\pi\) and hence, by the \(0\)-th Isomorphism Theorem we have an induced homomorphism of \(\Q[x]\)-modules
\begin{equation*} \ov{\pi}: \Q[x]/(x^2 - 2x + 1) \onto M \end{equation*}
induced by \(\pi\text{.}\) The map \(\ov{\pi}\) is onto since \(\pi\) is onto. Since the source and target both have dimension two as \(\Q\)-vector spaces, \(\ov{\pi}\) is \(\Q\)-linear, and \(\ov{\pi}\) is onto, it must in fact be an isomorphism of \(\Q[x]\)-modules (by the Rank-Nullity Theorem CITEX).
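As a quick consistency check (not part of the argument above): the isomorphism says that \((x-1)^2\) annihilates every element of \(M\text{,}\) and indeed
\begin{equation*} (A - I_2)^2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \end{equation*}
so \((x-1)^2 \cdot w = (A-I_2)^2 w = 0\) for every \(w \in M\text{.}\)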

Remark 15.11.

In fact, these rules determine an “isomorphism of categories”, whatever that means.

Subsection Rational Heads Prevail

Definition 15.12. Block Diagonal Matrix.

Given square matrices \(A_1 \in \Mat_{n_1 \times n_1}(F), \dots, A_m \in \Mat_{n_m \times n_m}(F)\text{,}\) we define \(A_1 \oplus \cdots \oplus A_m\) to be the block diagonal matrix
\begin{equation*} \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & A_m \\ \end{bmatrix} \end{equation*}
which belongs to \(\Mat_{n \times n}(F)\) for \(n = \sum_i n_i\text{.}\)
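For instance (a small illustration of the notation), taking \(A_1 = \begin{bmatrix} 2 \end{bmatrix}\) and \(A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\) gives
\begin{equation*} A_1 \oplus A_2 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \in \Mat_{3 \times 3}(F). \end{equation*}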

Proof.

We know by the Fundamental Theorem of modules over \(F[x]\) (i.e., Corollary ) that there is an \(F[x]\)-module isomorphism
\begin{equation*} V_{g} \cong F[x]/(f_1)\oplus \cdots \oplus F[x]/(f_s) \end{equation*}
for some unique list of monic, non-constant polynomials \(f_1, \cdots, f_s\) with \(f_i \mid f_{i+1}\) for all \(i\text{.}\) Recall that the operator \(g\) on \(V\) is given as \(\l_x\) (multiplication by \(x\)) on \(V_g\text{.}\) Since this is an \(F[x]\)-module isomorphism, \(g\) corresponds to multiplication by \(x\) on each summand \(F[x]/(f_i)\text{.}\) As we have seen before, for each \(i\text{,}\) the matrix representing \(\l_x\) on \(F[x]/(f_i)\) relative to the basis \(B_i := \{1, x, x^2, \dots, x^{\deg(f_i)-1}\}\) of \(F[x]/(f_i)\) is the companion matrix \(C(f_i)\) of \(f_i\text{.}\) Let \(B\) be the \(F\)-basis of \(F[x]/(f_1)\oplus \cdots \oplus F[x]/(f_s)\) given by tuples
\begin{equation*} B = \{(b_1, 0, \dots, 0) \mid b_1 \in B_1\} \cup \{(0, b_2, 0, \dots, 0) \mid b_2 \in B_2\} \cup \cdots \cup \{(0, 0, \dots, 0, b_s) \mid b_s \in B_s\} \end{equation*}
(in that order). Then the matrix of \(\l_x\) on \(F[x]/(f_1)\oplus \cdots \oplus F[x]/(f_s)\) relative to \(B\) is \(C(f_1) \oplus \cdots \oplus C(f_s)\text{.}\)
This gives existence. Uniqueness is a consequence of the uniqueness of the list \(f_1, \dots, f_s\text{,}\) but I will omit the details.
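As a reminder of the convention for companion matrices used here (it is consistent with the examples below in this section), for a monic polynomial \(f = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0\) we have
\begin{equation*} C(f) = \begin{bmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{bmatrix}, \end{equation*}
the matrix of \(\l_x\) on \(F[x]/(f)\) relative to the basis \(1, x, \dots, x^{n-1}\text{:}\) multiplication by \(x\) sends \(x^i\) to \(x^{i+1}\) for \(i < n-1\) and sends \(x^{n-1}\) to \(-a_0 - a_1 x - \cdots - a_{n-1}x^{n-1}\text{.}\)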

Remark 15.14.

The matrix is unique, but the basis \(B\) that realizes it is, in general, not unique. As an extreme example illustrating this: Take \(g\) to be the identity operator on a finite dimensional vector space \(V\text{.}\) Then \([g]_B^B = I_n\) holds for any basis \(B\text{.}\) (Note that \(I_n\) is indeed in rational canonical form: it is equal to \(C(x-1) \oplus \cdots \oplus C(x-1)\text{.}\))

Definition 15.15. Invariant Factor.

Recall that in Theorem [provisional cross-reference: cite], CITEX the number \(r\) is the rank of \(G\text{,}\) the numbers \(n_1,\ldots,n_t\) are the invariant factors of \(G\text{,}\) and the decomposition of \(G\) in part (1) is the invariant factor decomposition of \(G\text{.}\)

Example 15.16. Back to \(\Mat_{2\times 2}(\Q)\).

Let us return to the example of \(A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \Mat_{2 \times 2}(\Q)\) to illustrate the Theorem and its proof. By the previous example we have an isomorphism of \(\Q[x]\)-modules
\begin{equation*} \Q^2_A \cong \Q[x]/(x^2 - 2x + 1). \end{equation*}
Recall that \(\l_x\) (multiplication by \(x\)) on \(\Q^2_A\) is given by multiplication by the matrix \(A\text{.}\) This is an isomorphism of \(\Q[x]\)-modules, and so \(A\) corresponds to the operator \(\l_x\) on \(\Q[x]/(x^2 - 2x + 1)\text{.}\) As we have seen before, relative to the basis \(\{1, x\}\text{,}\) the matrix for \(\l_x\) is
\begin{equation*} C(x^2 - 2x + 1) = \begin{bmatrix} 0 & -1 \\ 1 & 2 \\ \end{bmatrix}. \end{equation*}
This is the Rational Canonical Form of \(A\text{.}\) \(A\) has just one invariant factor, namely \(x^2 - 2x + 1\text{.}\)
By the way, tracking through the calculations that got us here, we see that the basis of \(\Q^2\) that gives the RCF of \(A\) is \(\vectwo01, \vectwo11\text{.}\)
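We can verify this directly (a quick check, not part of the calculation above): letting \(P = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\) be the matrix whose columns are these basis vectors, we compute
\begin{equation*} P^{-1} A P = \begin{bmatrix} -1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix} = C(x^2 - 2x + 1). \end{equation*}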

Example 15.18. Similarity Classes of \(4\times 4\) Matrices.

Let \(F = \Z/p\) be the field with \(p\) elements for some prime \(p\text{.}\) Up to similarity, how many \(4 \times 4\) matrices are there with entries in \(F\text{?}\)
Solution.
Each such matrix is similar to a unique one of the form
\begin{equation*} C(f_1) \oplus \cdots \oplus C(f_k) \end{equation*}
with \(f_1, \dots, f_k\) monic polynomials of positive degree such that \(f_1 \mid \cdots \mid f_k\text{.}\) Moreover, since \(C(f)\) is a \(d \times d\) matrix where \(d = \deg(f)\text{,}\) we must have \(\deg(f_1) + \cdots + \deg(f_k) = 4\text{.}\) So the goal becomes to count all such tuples \((f_1, \dots, f_k)\) of polynomials. We proceed by cases on \(k\text{.}\) Note that \(k \geq 5\) is not possible.
  • Case \(k = 1\).
    Then \(\deg(f_1) = 4\) and the number of such polynomials is \(p^4\) (since \(f_1 = x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0\) and \(F\) has \(p\) elements).
  • Case \(k = 2\).
    Note that \(\deg(f_1) \geq 3\) is not possible. If \(\deg(f_1) = 2\) then \(f_1 = f_2\text{,}\) and there are \(p^2\) possibilities. If \(\deg(f_1) = 1\text{,}\) then \(\deg(f_2) = 3\) and \(f_2 = f_1 \cdot g\) with \(g\) monic and \(\deg(g) = 2\text{.}\) There are \(p\) possibilities for \(f_1\) and \(p^2\) for \(g\text{,}\) for a total of \(p^3\) in this subcase. The total for this case is thus \(p^2 + p^3\text{.}\)
  • Case \(k= 3\).
    The only possibilities are \(\deg(f_1) = 1\text{,}\) \(\deg(f_2) = 1\) and \(\deg(f_3) = 2\text{,}\) so that \(f_2 = f_1\) and \(f_3 = f_1 \cdot g\) with \(g\) monic and \(\deg(g) =1\text{.}\) There are \(p\) possibilities for \(f_1\) and \(p\) for \(g\text{,}\) so we get \(p^2\) possibilities.
  • Case \(k = 4\).
    We must have \(f_1 = f_2 = f_3 = f_4\) with each of degree \(1\text{,}\) for a total of \(p\) possibilities.
Adding up the four cases, the grand total is
\begin{equation*} p^4 + (p^2 + p^3) + p^2 + p = p^4 + p^3 + 2p^2 + p. \end{equation*}
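As a quick sanity check on this count (simple arithmetic using the total above): for \(p = 2\) it gives \(2^4 + 2^3 + 2\cdot 2^2 + 2 = 34\) similarity classes of \(4 \times 4\) matrices over \(\Z/2\text{.}\)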

Remark 15.19.

The proof of Theorem [provisional cross-reference: cite] CITEX makes clear the following fact: For a field \(F\text{,}\) finite dimensional vector space \(V\text{,}\) and \(F\)-linear operator \(g: V \to V\text{,}\) the invariant factors of the operator \(g\) are identical to the invariant factors of the \(F[x]\)-module \(V_g\text{.}\)
The following result is thus very useful for finding the Rational Canonical Form of an operator (we will state it just for operators given explicitly by matrices):

Proof.

For this proof it is useful to identify \(F[x]^n\) with \(F^n[x]\text{,}\) where the latter refers to all expressions of the form \(\sum_i v_i x^i = v_0 + v_1 x + \cdots + v_m x^m\) with \(v_i \in F^n\text{.}\) For instance (when \(n = 2\)), we identify \(\vectwo{x^2 -3x + 1}{x^7 +x + 5}\) with \(\vectwo{1}{5} + \vectwo{-3}{1} x + \vectwo{1}{0} x^2 + \vectwo{0}{1} x^7\text{.}\) Using this identification we define
\begin{equation*} \phi: F[x]^n = F^n[x] \to F^n_A \end{equation*}
by \(\phi(\sum_i v_i x^i) = \sum_i A^i v_i\text{.}\) Then \(\phi\) is an \(F[x]\)-module homomorphism; I leave it to you to verify this. The map \(\phi\) is onto since, e.g., for any \(v \in F^n\) we have \(\phi(vx^0) = v\text{.}\)
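The essential point in that verification (just a sketch of the key step) is that \(\phi\) commutes with multiplication by \(x\text{:}\)
\begin{equation*} \phi\left(x \cdot \sum_i v_i x^i\right) = \phi\left(\sum_i v_i x^{i+1}\right) = \sum_i A^{i+1} v_i = A \sum_i A^i v_i = x \cdot \phi\left(\sum_i v_i x^i\right), \end{equation*}
and \(F\)-linearity is immediate from the definition; together these give \(F[x]\)-linearity.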
We have
\begin{equation*} \phi((xI_n - A) \cdot \sum_i v_i x^i) = \phi(\sum_i v_i x^{i+1} - \sum_i (Av_i) x^i) = \sum_i A^{i+1}(v_i) - \sum_i A^i(Av_i) = 0 \end{equation*}
and hence \(\ker(\phi) \supseteq \im(xI_n - A)\text{.}\) By the \(0\)-th isomorphism theorem, there is an induced \(F[x]\)-module homomorphism
\begin{equation*} \ov{\phi}: \frac{F^n[x]}{\im(xI_n - A)} \onto F^n_A \end{equation*}
induced by \(\phi\text{,}\) and it is onto since \(\phi\) is onto. It remains to show this map is one-to-one.
Since \(\overline{\phi}\) is \(F[x]\)-linear it is certainly \(F\)-linear. Since \(\dim_F(F^n_A) = n\text{,}\) to prove \(\overline{\phi}\) is one-to-one, it suffices to prove \(\dim_F \frac{F^n[x]}{\im(xI_n - A)} \leq n\) (by Rank-Nullity). I claim the images of the standard basis \(e_1, \dots, e_n\) in \(\frac{F^n[x]}{\im(xI_n - A)}\) span it as an \(F\)-vector space. To see this, note that \(x^je_i\text{,}\) for \(1 \leq i \leq n\text{,}\) \(j \geq 0\) span \(F^n[x]\) as an \(F\)-vector space, and hence they span the quotient. It thus suffices to show \(\overline{x^j e_i}\) lies in the span of \(\overline{e_1}, \dots, \overline{e_n}\) in \(\frac{F^n[x]}{\im(xI_n - A)}\) for all \(i\) and \(j\text{.}\) We have \(x^{j-1} \cdot (A - x) e_i \in \im(xI_n - A)\) and thus
\begin{equation*} \overline{x^j e_i} = \overline{x^{j-1} Ae_i} \end{equation*}
and by repeating this argument we have
\begin{equation*} \overline{x^j e_i} =\overline{x^{j-1} Ae_i} = \overline{x^{j-2} A^2e_i} = \cdots =\overline{A^je_i} \in \Span_F(\overline{e_1}, \dots, \overline{e_n}). \end{equation*}

Proof.

Let \(S\) be the Smith Normal Form of \(xI_n - A\) and let \(d_1, \dots, d_n\) be its diagonal entries. As proven before, the matrix \(xI_n - A\) and \(S\) present isomorphic \(F[x]\)-modules, and thus the Theorem gives an isomorphism
\begin{equation*} F^n_A \cong F[x]/(d_1) \oplus \cdots \oplus F[x]/(d_n). \end{equation*}
Since \(\dim_F F^n_A < \infty\text{,}\) none of the \(d_i\)’s can be zero, so (after scaling by units) we may take each \(d_i\) to be monic, and \(d_1 \mid \cdots \mid d_n\text{.}\) Now some of the \(d_i\) might be non-zero constants, in which case \(d_i\) is a unit and \(F[x]/(d_i) = 0\text{.}\) Upon tossing those out, we are left with
\begin{equation*} F^n_A \cong F[x]/(f_1) \oplus \cdots \oplus F[x]/(f_k) \end{equation*}
with each \(f_i\) monic of positive degree and \(f_1 \mid \cdots \mid f_k\text{.}\) These are, by definition, the invariant factors of \(A\text{.}\)

Example 15.22. Once More to Back to \(\Mat_{2\times 2}(\Q)\).

Let’s find the invariant factors of the matrix \(A= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\in \Mat_{2 \times 2}(\Q)\) we looked at before, but this time using Theorem 15.20 and Corollary 15.21.
Solution.
We have
\begin{equation*} xI_2-A= \begin{bmatrix} x-1 & -1 \\ 0 & x-1 \end{bmatrix}. \end{equation*}
To find the invariant factors of \(A\) we just need to find the Smith Normal Form of \(xI_2-A\text{.}\) I’ll do this two ways:
  • Method I: Do row and column operations using the generalized Euclidean algorithm:.
    \begin{equation*} \begin{bmatrix} x-1 & -1 \\ 0 & x-1 \end{bmatrix} \mapsto \begin{bmatrix} x-1 & -1 \\ (x-1)^2 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 0 & -1 \\ (x-1)^2 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 \\ 0 & (x-1)^2 \end{bmatrix}. \end{equation*}
    Tossing out the unit, we see that the only invariant factor is \((x-1)^2\text{,}\) as before.
  • Method II: Call \(d_1, d_2\) the entries on the diagonal of the SNF of \(xI_2-A\text{:}\).
    Recall from Theorem that \(d_1\) is the gcd of the entries of \(xI_2-A\) and \(d_1d_2=\det(xI_2-A)\text{.}\) Thus \(d_1=1\) and \(d_2=\det(xI_2-A)=(x-1)^2\text{.}\) Therefore the only invariant factor of \(A\) is \((x-1)^2\text{,}\) as before.

Example 15.23. Finding IFs and RCF.

Let
\begin{equation*} A = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \end{bmatrix}. \end{equation*}
Let us find the invariant factors and Rational Canonical Form of \(A\) by finding the Smith Normal Form of \(xI_3 - A\text{.}\)
Solution.
We have
\begin{equation*} xI_3 - A = \begin{bmatrix} x-1 & 1 & -1 \\ 0 & x & -1 \\ 0 & -1 & x \\ \end{bmatrix}. \end{equation*}
A sequence of messy row and column operations yields
\begin{equation*} \begin{bmatrix} 1 & 0 & 0 \\ 0 & x-1 & 0 \\ 0 & 0 & x^2-1 \\ \end{bmatrix}. \end{equation*}
Note that this is indeed in Smith Normal Form. It follows that the invariant factors of \(A\) are \(x-1, x^2-1\) and the RCF of \(A\) is
\begin{equation*} C(x-1) \oplus C(x^2-1) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \end{bmatrix}. \end{equation*}
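For completeness, here is one possible sequence of row and column operations producing the Smith Normal Form above (a sketch; many other sequences work equally well): swap the first two columns to move the unit entry into the upper-left corner, clear out the first row and column, and then repeat on the remaining \(2 \times 2\) block:
\begin{equation*} \begin{bmatrix} x-1 & 1 & -1 \\ 0 & x & -1 \\ 0 & -1 & x \end{bmatrix} \mapsto \begin{bmatrix} 1 & x-1 & -1 \\ x & 0 & -1 \\ -1 & 0 & x \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 & 0 \\ 0 & -x(x-1) & x-1 \\ 0 & x-1 & x-1 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 & 0 \\ 0 & x-1 & 0 \\ 0 & 0 & x^2-1 \end{bmatrix}. \end{equation*}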
For an alternative approach, we could use that the diagonal entries \(d_1, d_2, d_3\) of the Smith Normal Form of \(xI_3 - A\) satisfy: \(d_1\) is the gcd of the entries of \(xI_3 - A\text{,}\) \(d_1d_2\) is the gcd of the \(2 \times 2\) minors of \(xI_3 - A\text{,}\) and \(d_1d_2d_3 = \det(xI_3 - A)\text{.}\) It’s clear that \(d_1 = 1\text{,}\) and an easy calculation gives that \(\det(xI_3 - A) = (x-1)(x^2 -1)\text{.}\) There are nine \(2 \times 2\) minors of \(xI_3 - A\text{,}\) and a tedious check reveals that each of them is one of \(x^2 -1\text{,}\) \(x(x-1)\text{,}\) \(x-1\) or \(0\) (up to signs). So \(d_1d_2 = x-1\text{.}\) We get that
\begin{equation*} d_1 = 1, d_2 = x-1, d_3 = x^2 -1 \end{equation*}
as before.