The Cayley-Hamilton theorem

Posted: August 21, 2022 in Basic Algebra, Matrices

The Cayley-Hamilton theorem is probably the best-known theorem in linear algebra. It says that every square matrix over a commutative ring with identity R satisfies its own characteristic polynomial. If you have no idea what “commutative ring” means, just assume that R is the set of real or complex numbers.

Let R be a commutative ring with identity, and let A \in M_n(R), the ring of n \times n matrices with entries in R. Let I \in M_n(R) be the identity matrix. Recall that the characteristic polynomial of A is defined to be \det(xI-A) \in R[x]. Some authors instead define the characteristic polynomial of A to be \det(A-xI), which is fine because \det(A-xI)=(-1)^n\det(xI-A), so the two definitions differ only by a sign and nothing is lost.

Theorem (Cayley-Hamilton). Let R be a commutative ring with identity, A \in M_n(R), and

p(x):=\det(xI - A) \in R[x].

Then p(A)=0.

A Bogus Proof. Substituting x=A gives p(A)=\det(AI-A)=\det(A-A)=\det(0)=0.

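OK, why is that “proof” bogus? Well, a quick way to see it is that p(A) is a matrix in M_n(R) whereas \det(AI-A) is a scalar in R, so the two can’t even be compared unless n=1, which is the trivial case. The x in \det(xI-A) is a scalar indeterminate, and you can’t just replace it with the matrix A inside the determinant.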

So the bogus proof is a good reason to clarify what the Cayley-Hamilton theorem really says. It says that if

p(x):=\det(xI-A)=x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0, \ r_i \in R,

then p(A)=A^n+r_{n-1}A^{n-1}+ \cdots + r_1A+r_0I=0, where 0 on the right is the n \times n zero matrix.
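To make the statement concrete, here is a minimal Python sketch that checks it for one 2 \times 2 integer matrix. It relies on the standard fact that for n=2 we have \det(xI-A)=x^2-\text{tr}(A)x+\det(A); the function names and the sample matrix are my own choices for illustration.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_2x2_at(A):
    """Evaluate p(A) = A^2 + r_1 A + r_0 I for a 2x2 matrix A,
    where p(x) = det(xI - A) = x^2 - tr(A) x + det(A)."""
    (a, b), (c, d) = A
    r1 = -(a + d)        # r_1 = -trace(A)
    r0 = a * d - b * c   # r_0 = det(A)
    A2 = matmul(A, A)
    identity = [[1, 0], [0, 1]]
    # Note that the constant term r_0 is multiplied by I, not left as a scalar.
    return [[A2[i][j] + r1 * A[i][j] + r0 * identity[i][j] for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 4]]           # sample matrix, chosen arbitrarily
result = char_poly_2x2_at(A)
print(result)                  # [[0, 0], [0, 0]]
```

Here p(x)=x^2-5x-2, and the output is the 2 \times 2 zero matrix, not the scalar 0, which is exactly the point the bogus proof misses.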

Note. The proof I give here is from Nathan Jacobson’s book Lectures in Abstract Algebra.

Proof of the Theorem. For any matrix B \in M_n(R), let \text{adj}(B) be the adjugate of B. Recall the basic property of \text{adj}(B), which holds over any commutative ring:

\text{adj}(B)B=B \text{adj}(B)=(\det B)I.
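If you'd like to see this identity in action, the following is a small Python sketch that builds the adjugate from cofactors and checks both products for one integer matrix. The helper names (minor, det, adjugate, matmul) and the sample matrix are assumptions made for illustration. The code uses only addition and multiplication of entries, no division, which is consistent with the identity holding over a commutative ring.

```python
def minor(B, i, j):
    """Return B with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(B) if r != i]


def det(B):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(B) == 0:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * B[0][j] * det(minor(B, 0, j)) for j in range(len(B)))


def adjugate(B):
    """Transpose of the cofactor matrix:
    adj(B)[i][j] = (-1)^(i+j) * det(B with row j and column i deleted)."""
    n = len(B)
    return [[(-1) ** (i + j) * det(minor(B, j, i)) for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


B = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]   # sample matrix, chosen arbitrarily
adjB = adjugate(B)
d = det(B)
scalar_matrix = [[d if i == j else 0 for j in range(3)] for i in range(3)]

print(d)                                 # 7
print(matmul(adjB, B) == scalar_matrix)  # True
print(matmul(B, adjB) == scalar_matrix)  # True
```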

So choosing B=xI-A, and writing p(x)=x^n+r_{n-1}x^{n-1} + \cdots + r_1x+r_0, \ r_i \in R, we have

\begin{aligned}\text{adj}(xI-A)(xI-A)=\det(xI-A)I=p(x)I=(x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0)I. \ \ \ \ \ \ \ (1)\end{aligned}

Now, let’s see what \text{adj}(xI-A) looks like. Up to a \pm sign, an entry of \text{adj}(xI-A) is the determinant of an (n-1) \times (n-1) matrix obtained from xI-A by deleting one row and one column. So each entry of \text{adj}(xI-A) is a polynomial of degree at most n-1 in x, i.e. each entry of \text{adj}(xI-A) has the form a_{n-1}x^{n-1}+a_{n-2}x^{n-2} + \cdots + a_1x+a_0, for some a_i \in R. So we can write

\text{adj}(xI-A)=A_{n-1}x^{n-1}+ A_{n-2}x^{n-2} + \cdots + A_1x+A_0, \ \ \ \ \ \ \ \ (2)

for some A_i \in M_n(R). Substituting (2) in (1) gives

\begin{aligned}(A_{n-1}x^{n-1}+ A_{n-2}x^{n-2} + \cdots + A_1x+A_0)(xI-A)=(x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0)I,\end{aligned}

which simplifies to

\displaystyle A_{n-1}x^n+\sum_{k=1}^{n-1}(A_{k-1}-A_kA)x^k-A_0A=(x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0)I.

Equating the coefficients of x^k, \ 0 \le k \le n, on both sides of the above gives

A_{n-1}=I, \ \ \ \ A_{k-1}-A_kA=r_kI, \ 1 \le k \le n-1, \ \ \ \ -A_0A=r_0I,

and so

\displaystyle p(A)=A^n+\sum_{k=1}^{n-1}r_kA^k + r_0I=A_{n-1}A^n+\sum_{k=1}^{n-1}(A_{k-1}-A_kA)A^k-A_0A

\displaystyle =A_{n-1}A^n+\sum_{k=1}^{n-1}(A_{k-1}A^k-A_kA^{k+1})-A_0A, \ \ \ \ \ \  \text{telescoping sum}

\displaystyle=A_{n-1}A^n+A_0A-A_{n-1}A^n-A_0A=0. \ \Box
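The relations A_{n-1}=I, \ A_{k-1}=A_kA+r_kI and -A_0A=r_0I from the proof can be turned into a small computation. The sketch below is not part of Jacobson's proof: to get the r_k without expanding a determinant it borrows the trace formula r_k=-\text{tr}(A_kA)/(n-k) from the Faddeev-LeVerrier algorithm. That formula divides by 1, \ldots , n, so this version assumes rational entries rather than a general commutative ring. All function names and the sample matrix are my own, for illustration only, and n \ge 1 is assumed.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjugate_coefficients(A):
    """Return (r, coeffs) with det(xI - A) = sum r[k] x^k (so r[n] = 1) and
    adj(xI - A) = sum coeffs[k] x^k, using the relations from the proof:
    A_{n-1} = I and A_{k-1} = A_k A + r_k I.
    Assumption: r_k = -tr(A_k A) / (n - k) (Faddeev-LeVerrier), which needs
    division, hence exact rational arithmetic via Fraction."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    r = [Fraction(0)] * n + [Fraction(1)]
    coeffs = [None] * n
    coeffs[n - 1] = identity
    for k in range(n - 1, -1, -1):
        prod = matmul(coeffs[k], A)                       # A_k A
        r[k] = -sum(prod[i][i] for i in range(n)) / (n - k)
        if k >= 1:
            coeffs[k - 1] = [[prod[i][j] + r[k] * identity[i][j]
                              for j in range(n)] for i in range(n)]
    return r, coeffs


def evaluate_at_matrix(r, A):
    """Evaluate p(A) = sum r[k] A^k by Horner's rule, with r[0] multiplying I."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(r):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


A = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]   # sample matrix, chosen arbitrarily
r, coeffs = adjugate_coefficients(A)
print([int(c) for c in r])               # [-7, 11, -6, 1]
print(evaluate_at_matrix(r, A) == [[0, 0, 0]] * 3)   # True
```

For this matrix p(x)=x^3-6x^2+11x-7, and the last relation -A_0A=r_0I can be checked directly: matmul(coeffs[0], A) comes out as 7I.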

Exercise. Use the Cayley-Hamilton theorem to prove the following generalization. Let R be a commutative ring with identity. Let p(x) be the characteristic polynomial of some matrix A \in M_n(R). Show that for every B \in M_n(R) that commutes with A, there exists C \in M_n(R) such that C commutes with both A and B, and p(B)=(B-A)C.
