The Cayley-Hamilton theorem

Posted: August 21, 2022 in Basic Algebra, Matrices
Tags: adjugate of a matrix, Cayley-Hamilton theorem

There’s probably no important theorem in linear algebra better known than the Cayley-Hamilton theorem, which says that every square matrix over a commutative ring with identity $R$ satisfies its characteristic polynomial. If the term “commutative ring” is unfamiliar, just take $R$ to be the field of real or complex numbers.

Let $R$ be a commutative ring with identity, and let $A \in M_n(R),$ the ring of $n \times n$ matrices with entries in $R.$ Let $I \in M_n(R)$ be the identity matrix. Recall that the characteristic polynomial of $A$ is defined to be $\det(xI-A) \in R[x].$ Many authors instead define the characteristic polynomial of $A$ to be $\det(A-xI),$ which is fine because $\det(A-xI)=(-1)^n\det(xI-A),$ so the two definitions differ only by a sign and nothing is lost.

Theorem (Cayley-Hamilton). Let $R$ be a commutative ring with identity, $A \in M_n(R),$ and

$p(x):=\det(xI - A) \in R[x].$

Then $p(A)=0.$

A Bogus Proof. Substituting $x=A$ gives $p(A)=\det(AI-A)=\det(A-A)=\det(0)=0.$

OK, why is that “proof” bogus? A quick way to see it: $p(A) \in M_n(R)$ is a matrix whereas $\det(AI-A) \in R$ is a scalar, so you can’t just put $x=A$ unless $n=1,$ which is the trivial case.

So the bogus proof is a good reason to clarify what the Cayley-Hamilton theorem really says. It says that if

$p(x):=\det(xI-A)=x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0, \ r_i \in R,$

then $p(A)=A^n+r_{n-1}A^{n-1}+ \cdots + r_1A+r_0I=0,$ where $0$ on the right is the $n \times n$ zero matrix.
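To make the statement concrete, here is a minimal sketch in plain Python that checks it for one $2 \times 2$ integer matrix. It relies on the standard fact that, for $n=2,$ $\det(xI-A)=x^2-\text{tr}(A)x+\det(A).$ The sample matrix and the helper name `mat_mul` are choices made for this illustration only.

```python
# Check p(A) = A^2 - tr(A) A + det(A) I = 0 for a sample 2x2 integer matrix.

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2],
     [3, 4]]  # arbitrary sample matrix (an assumption of this sketch)

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# For n = 2, det(xI - A) = x^2 - tr(A) x + det(A), so r_1 = -trace, r_0 = det.
A_squared = mat_mul(A, A)
identity = [[1, 0], [0, 1]]
p_of_A = [[A_squared[i][j] - trace * A[i][j] + det * identity[i][j]
           for j in range(2)]
          for i in range(2)]

print(p_of_A)  # [[0, 0], [0, 0]]
```

Note that the constant term $r_0$ is multiplied by the identity matrix before being added, exactly as in the formula for $p(A)$ above.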

Note. The proof I give here is from Nathan Jacobson’s book Lectures in Abstract Algebra.

Proof of the Theorem. For any matrix $B \in M_n(R),$ let $\text{adj}(B)$ be the adjugate of $B.$ Recall the fundamental property of the adjugate:

$\text{adj}(B)B=B \text{adj}(B)=(\det B)I.$
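As a quick sanity check of this identity, the following sketch computes the adjugate as the transpose of the cofactor matrix and verifies both products on two integer matrices, one of them singular (no inverse is needed, which is why the identity makes sense over any commutative ring). The function names `minor`, `det`, `adj` and `mat_mul` and the sample matrices are hypothetical choices for this illustration; the Laplace expansion used here is only sensible for small $n.$

```python
def minor(M, i, j):
    """Matrix M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(M) == 0:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adj(M):
    """Adjugate: entry (i, j) is the (j, i) cofactor of M."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)]
            for i in range(n)]

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scalar_matrix(c, n):
    """The n x n matrix c * I."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

B = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]          # arbitrary invertible sample
S = [[1, 2],
     [2, 4]]             # singular sample: det(S) = 0

for M in (B, S):
    expected = scalar_matrix(det(M), len(M))
    assert mat_mul(adj(M), M) == expected   # adj(M) M = det(M) I
    assert mat_mul(M, adj(M)) == expected   # M adj(M) = det(M) I
```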

So choosing $B=xI-A,$ and writing $p(x)=x^n+r_{n-1}x^{n-1} + \cdots + r_1x+r_0, \ r_i \in R,$ we have

$\begin{aligned}\text{adj}(xI-A)(xI-A)=\det(xI-A)I=p(x)I=(x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0)I. \ \ \ \ \ \ \ (1)\end{aligned}$

Now, let’s see what $\text{adj}(xI-A)$ looks like. Up to a $\pm$ sign, an entry of $\text{adj}(xI-A)$ is the determinant of an $(n-1) \times (n-1)$ matrix obtained from $xI-A$ by deleting one row and one column. So each entry of $\text{adj}(xI-A)$ is a polynomial of degree at most $n-1$ in $x,$ i.e. each entry of $\text{adj}(xI-A)$ is of the form $a_{n-1}x^{n-1}+a_{n-2}x^{n-2} + \cdots + a_1x+a_0,$ for some $a_i \in R.$ So we can write

$\text{adj}(xI-A)=A_{n-1}x^{n-1}+ A_{n-2}x^{n-2} + \cdots + A_1x+A_0, \ \ \ \ \ \ \ \ (2)$

for some $A_i \in M_n(R).$ Substituting $(2)$ in $(1)$ gives

$\begin{aligned}(A_{n-1}x^{n-1}+ A_{n-2}x^{n-2} + \cdots + A_1x+A_0)(xI-A)=(x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0)I,\end{aligned}$

which simplifies to

$\displaystyle A_{n-1}x^n+\sum_{k=1}^{n-1}(A_{k-1}-A_kA)x^k-A_0A=(x^n+r_{n-1}x^{n-1}+ \cdots + r_1x+r_0)I.$

Both sides of the above are polynomials in $x$ with coefficients in $M_n(R),$ so we may equate the coefficients of $x^k, \ 0 \le k \le n,$ on both sides, which gives

$A_{n-1}=I, \ \ \ \ A_{k-1}-A_kA=r_kI, \ 1 \le k \le n-1, \ \ \ \ -A_0A=r_0I,$

and so, writing $r_kA^k=(r_kI)A^k$ and using these identities,

$\displaystyle p(A)=A^n+\sum_{k=1}^{n-1}r_kA^k + r_0I=A_{n-1}A^n+\sum_{k=1}^{n-1}(A_{k-1}-A_kA)A^k-A_0A$

$\displaystyle =A_{n-1}A^n+\sum_{k=1}^{n-1}(A_{k-1}A^k-A_kA^{k+1})-A_0A, \ \ \ \ \ \ \text{telescoping sum}$

$\displaystyle=A_{n-1}A^n+A_0A-A_{n-1}A^n-A_0A=0. \ \Box$
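The identities obtained by equating coefficients can also be checked numerically. The sketch below is not part of Jacobson’s proof; it is only an illustration for one integer matrix. It computes $\det(xI-A)$ with naive polynomial arithmetic, builds the $A_k$ from $A_{n-1}=I$ and $A_{k-1}=A_kA+r_kI,$ and then confirms the remaining identity $-A_0A=r_0I$ as well as $p(A)=0.$ All function names and the sample matrix are hypothetical, and polynomials are stored as coefficient lists with the constant term first.

```python
def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists, constant term first."""
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(size)]

def poly_mul(p, q):
    """Product of two nonempty coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_det(M):
    """Determinant of a square matrix of polynomials (Laplace expansion, row 0)."""
    if not M:
        return [1]
    total = [0]
    for j, entry in enumerate(M[0]):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        sign = -1 if j % 2 else 1
        term = [sign * c for c in poly_mul(entry, poly_det(sub))]
        total = poly_add(total, term)
    return total

def char_poly(A):
    """Coefficients [r_0, ..., r_{n-1}, 1] of det(xI - A)."""
    n = len(A)
    return poly_det([[[-A[i][j], 1] if i == j else [-A[i][j]]
                      for j in range(n)] for i in range(n)])

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scalar(c, n):
    """The n x n matrix c * I."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

A = [[2, -1, 0],
     [1, 0, 3],
     [4, 1, 1]]           # arbitrary sample matrix
n = len(A)
r = char_poly(A)          # r[k] is the coefficient of x^k, and r[n] == 1

# A_{n-1} = I and A_{k-1} = A_k A + r_k I for k = n-1, ..., 1.
coeff = {n - 1: scalar(1, n)}
for k in range(n - 1, 0, -1):
    coeff[k - 1] = mat_add(mat_mul(coeff[k], A), scalar(r[k], n))

# Remaining identity, from the constant term: -A_0 A = r_0 I.
minus_A0_A = [[-v for v in row] for row in mat_mul(coeff[0], A)]
assert minus_A0_A == scalar(r[0], n)

# Direct evaluation of p(A) = sum over k of r_k A^k.
p_of_A, power = scalar(0, n), scalar(1, n)
for k in range(n + 1):
    p_of_A = mat_add(p_of_A, [[r[k] * v for v in row] for row in power])
    power = mat_mul(power, A)
assert p_of_A == scalar(0, n)
```

Seen this way, the recurrence for the $A_k$ is Horner’s scheme for evaluating $p$ at $A$: the last step $A_0A+r_0I$ is exactly $p(A),$ which is why the constant-term identity forces $p(A)=0.$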

Exercise. Use the Cayley-Hamilton theorem to generalize the theorem as follows. Let $R$ be a commutative ring with identity. Let $p(x)$ be the characteristic polynomial of some matrix $A \in M_n(R).$ Show that for every $B \in M_n(R)$ that commutes with $A,$ there exists $C \in M_n(R)$ such that $C$ commutes with both $A$ and $B,$ and $p(B)=(B-A)C.$