
Over CC, matrices of finite multiplicative order are diagonalizable

Let A be an n \times n matrix over \mathbb{C}. Show that if A^k=I for some k, then A is diagonalizable.

Let F be a field of characteristic p. Show that A = \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix} has finite order but cannot be diagonalized over F unless \alpha = 0.


Since A^k = I, the minimal polynomial m of A over \mathbb{C} divides x^k-1. Now x^k-1 is relatively prime to its derivative kx^{k-1}, so its roots are distinct; in particular, the roots of m are distinct. Since \mathbb{C} contains all the roots of unity, by Corollary 25 on page 494 of D&F, A is diagonalizable over \mathbb{C}.

Note that \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \beta \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & \alpha+\beta \\ 0 & 1 \end{bmatrix}. By an easy inductive argument, then, \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix}^t = \begin{bmatrix} 1 & t\alpha \\ 0 & 1 \end{bmatrix}; in particular, A^p = I, since p\alpha = 0 in characteristic p.

Suppose \alpha \neq 0. Now \frac{1}{\alpha}A = \begin{bmatrix} 1/\alpha & 1 \\ 0 & 1/\alpha \end{bmatrix} is in Jordan canonical form, and is not diagonalizable. (See Corollary 24 on page 493 of D&F.) So A cannot be diagonalizable, for if it were, then so would be \frac{1}{\alpha}A. (If P^{-1}AP = D is diagonal, then so is P^{-1}\frac{1}{\alpha}AP = \frac{1}{\alpha}D.)
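
As a quick numerical sanity check of the first part, here is a small numpy sketch (the example matrix is my own arbitrary choice): a permutation matrix has finite multiplicative order, and its eigenvectors do diagonalize it over \mathbb{C}.

```python
import numpy as np

# A 3-cycle permutation matrix: P**3 = I, so P has finite multiplicative order.
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
assert np.array_equal(np.linalg.matrix_power(P, 3), np.eye(3, dtype=int))

# Its eigenvalues are the cube roots of unity, and its eigenvectors diagonalize it.
w, V = np.linalg.eig(P)
assert np.allclose(V @ np.diag(w) @ np.linalg.inv(V), P)
print(np.round(w, 6))
```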

Exhibit a quadratic field as a field of matrices

Let K = \mathbb{Q}(\sqrt{D}), where D is a squarefree integer. Let \alpha = a+b\sqrt{D} be in K, and consider the basis B = \{1,\sqrt{D}\} of K over \mathbb{Q}. Compute the matrix of the \mathbb{Q}-linear transformation ‘multiplication by \alpha’ (described previously) with respect to B. Give an explicit embedding of \mathbb{Q}(\sqrt{D}) in the ring \mathsf{Mat}_2(\mathbb{Q}).


We have \varphi_\alpha(1) = a+b\sqrt{D} and \varphi_\alpha(\sqrt{D}) = bD + a\sqrt{D}. Making these the columns of a matrix M_\alpha, we have M_\alpha = \begin{bmatrix} a & bD \\ b & a \end{bmatrix}, and this is the matrix of \varphi_\alpha with respect to B. As we showed in the exercise linked above, \alpha \mapsto M_\alpha is an embedding of K in \mathsf{Mat}_2(\mathbb{Q}).
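
Here is a small sympy sketch (my own addition; M is just a local helper name) verifying that \alpha \mapsto M_\alpha respects multiplication, via the identity (a+b\sqrt{D})(c+d\sqrt{D}) = (ac+bdD) + (ad+bc)\sqrt{D}.

```python
import sympy as sp

a, b, c, d, D = sp.symbols('a b c d D')

def M(x, y):
    # Matrix of 'multiplication by x + y*sqrt(D)' with respect to {1, sqrt(D)}.
    return sp.Matrix([[x, y*D],
                      [y, x]])

# M_alpha * M_beta should equal the matrix of multiplication by the product.
lhs = M(a, b) * M(c, d)
rhs = M(a*c + b*d*D, a*d + b*c)
assert (lhs - rhs).expand() == sp.zeros(2, 2)
```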

Compare to this previous exercise about \mathbb{Z}[\sqrt{D}].

Every degree n extension of a field F is embedded in the set of nxn matrices over F

Let F be a field, and let K be an extension of F of finite degree.

  1. Fix \alpha \in K. Prove that the mapping ‘multiplication by \alpha’ is an F-linear transformation on K. (In fact an automorphism for \alpha \neq 0.)
  2. Deduce that K is isomorphically embedded in \mathsf{Mat}_n(F), where n is the degree of K over F.

Let \varphi_\alpha(x) = \alpha x. Certainly then we have \varphi_\alpha(x+ry) = \alpha(x+ry) = \alpha x + r \alpha y = \varphi_\alpha(x) + r \varphi_\alpha(y) for all x,y \in K and r \in F; so \varphi_\alpha is an F-linear transformation. If \alpha \neq 0, then evidently \varphi_{\alpha^{-1}} \circ \varphi_\alpha = \varphi_\alpha \circ \varphi_{\alpha^{-1}} = 1.

Fix a basis for K over F; this yields a ring homomorphism \Psi : K \rightarrow \mathsf{Mat}_n(F) which takes \alpha and returns the matrix of \varphi_\alpha with respect to the chosen basis. Suppose \alpha \in \mathsf{ker}\ \Psi; then \varphi_\alpha(x) = \alpha x = 0 for all x \in K, and taking x = 1, \alpha = 0. So \Psi is injective as desired.
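
To make the construction concrete, here is a sympy sketch (my own addition; the helper names are hypothetical) for the cubic extension K = \mathbb{Q}(\theta) with \theta^3 = 2, using the basis \{1, \theta, \theta^2\}.

```python
import sympy as sp

def mult_matrix(a0, a1, a2):
    # Matrix of 'multiplication by a0 + a1*theta + a2*theta^2' with respect to
    # {1, theta, theta^2}; each column is the image of a basis element,
    # reduced using theta^3 = 2.
    return sp.Matrix([[a0, 2*a2, 2*a1],
                      [a1, a0,   2*a2],
                      [a2, a1,   a0]])

def reduce_product(u, v):
    # Coefficients of the product of two elements, reduced using theta^3 = 2.
    a0, a1, a2 = u
    b0, b1, b2 = v
    return (a0*b0 + 2*(a1*b2 + a2*b1),
            a0*b1 + a1*b0 + 2*a2*b2,
            a0*b2 + a1*b1 + a2*b0)

# Psi is multiplicative on a sample pair, as the ring homomorphism property predicts.
x, y = (1, 2, 3), (4, 5, 6)
assert mult_matrix(*x) * mult_matrix(*y) == mult_matrix(*reduce_product(x, y))
```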

Solve a given nth order linear differential equation

Find bases for the vector spaces of solutions to the following differential equations.

  1. y^{(3)} - 3y^{(1)} + 2y = 0
  2. y^{(4)} + 4y^{(3)} + 6y^{(2)} + 4y^{(1)} + y = 0

Let y_1 = y^{(0)}, y_2 = y^{(1)}, and y_3 = y^{(2)}, and consider the following linear system of first order differential equations.

\begin{array}{ccccccc} \frac{d}{dt} y_1 & = & & & y_2 & & \\ \frac{d}{dt} y_2 & = & & & & & y_3 \\ \frac{d}{dt} y_3 & = & -2y_1 & + & 3y_2 & & \end{array}

We can express this as a matrix equation:

\dfrac{d}{dt} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2 & 3 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}

Now let A denote this coefficient matrix. Let P = \frac{1}{9} \begin{bmatrix} 1 & 1 & -2 \\ -2 & 1 & -1 \\ 4 & 1 & 0 \end{bmatrix}; evidently, P^{-1}AP = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} is in Jordan canonical form. (Computations performed by WolframAlpha.) Now P\left(\mathsf{exp}([-2]t) \oplus \mathsf{exp}\left(\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}t\right)\right) = P\begin{bmatrix} e^{-2t} & 0 & 0 \\ 0 & e^t & te^t \\ 0 & 0 & e^t \end{bmatrix} is a fundamental matrix of our linear system. Reading off the first row of this matrix (and multiplying by 9), we see that e^{-2t}, e^t, and te^t - 2e^t are solutions of our original 3rd order differential equation. Moreover, it is clear that these are linearly independent, and so they form a basis of the solution space.

For the second equation, we follow a similar strategy and see that the corresponding coefficient matrix is A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & -4 & -6 & -4 \end{bmatrix}. Let Q = \begin{bmatrix} -1 & -3 & -6 & -10 \\ 1 & 2 & 3 & 4 \\ -1 & -1 & -1 & -1 \\ 1 & 0 & 0 & 0 \end{bmatrix}. Evidently, Q^{-1}AQ = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{bmatrix} = B is in Jordan canonical form. (Computation performed by WolframAlpha.) Taking the first row of Q\mathsf{exp}(Bt) (and multiplying by -1), we see that e^{-t}, e^{-t}(t + 3), e^{-t}(\frac{1}{2}t^2 + 3t + 6), and e^{-t}(\frac{1}{6}t^3 + \frac{3}{2}t^2 + 6t + 10) are linearly independent solutions of the original 4th order differential equation, and so form a basis of the solution space.
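
As a sanity check (my own addition; this is not how the computation above was carried out), sympy can compute \mathsf{exp}(At) for the first equation directly and also solve the scalar equation with dsolve.

```python
import sympy as sp

t = sp.symbols('t')

# Companion matrix of y''' - 3y' + 2y = 0, as above.
A = sp.Matrix([[ 0, 1, 0],
               [ 0, 0, 1],
               [-2, 3, 0]])

# exp(At) is a fundamental matrix of dY/dt = AY; the first entry of each
# column is then a solution of the scalar equation.
expAt = sp.simplify((A * t).exp())
print(expAt.row(0))  # combinations of exp(-2*t), exp(t), and t*exp(t)

# Cross-check the solution space with dsolve.
y = sp.Function('y')
print(sp.dsolve(y(t).diff(t, 3) - 3*y(t).diff(t) + 2*y(t), y(t)))
```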

Nonsingular, constant multiples of a fundamental matrix are fundamental

Let \frac{d}{dt} Y = AY be a linear system of first order differential equations over \mathbb{R} as in this previous exercise. Suppose M is a fundamental matrix of this system (i.e. a matrix whose columns are linearly independent solutions) and let Q be a nonsingular matrix over \mathbb{R}. Prove that MQ is also a fundamental matrix for this differential equation.


Write Q = [Q_1\ \ldots\ Q_n] as a column matrix. Now \frac{d}{dt} MQ = \frac{d}{dt} [MQ_1\ \ldots\ MQ_n] = [(\frac{d}{dt}M)Q_1\ \ldots\ (\frac{d}{dt} M)Q_n] by the definition of our matrix derivative and part 2 of this previous exercise. This is then equal to (\frac{d}{dt} M)Q = AMQ. In particular, the columns of MQ are solutions of \frac{d}{dt} Y = AY. Moreover, since Q is nonsingular, MQ is nonsingular. So MQ is also a fundamental matrix.
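
A short symbolic illustration (my own addition; the particular A and Q are arbitrary choices): with M = \mathsf{exp}(At), the columns of MQ do satisfy the same system.

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[ 0, 1],
               [-2, 3]])   # a constant coefficient matrix
M = (A * t).exp()          # a fundamental matrix, since dM/dt = AM
Q = sp.Matrix([[1, 2],
               [3, 5]])    # constant and nonsingular (det = -1)

# d/dt (MQ) = A (MQ), so the columns of MQ are again solutions.
assert sp.simplify(sp.diff(M * Q, t) - A * (M * Q)) == sp.zeros(2, 2)
```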

Properties of the derivative of a matrix power series

Let G(x) be a formal power series on \mathbb{C} having an infinite radius of convergence and fix an n \times n matrix A over \mathbb{C}. The mapping t \mapsto G(At) carries a complex number t to a complex matrix G(At); we can think of this as the ‘direct sum’ of n^2 different functions on \mathbb{C}, one for each entry of G(At).

We now define the derivative of G(At) with respect to t to be a mapping \mathbb{C} \rightarrow \mathsf{Mat}_n(\mathbb{C}) as follows: \left[\frac{d}{dt} G(At)\right]_{i,j} = \frac{d}{dt} \left[ G(At)_{i,j}\right]. In other words, thinking of G(At) as a matrix of functions, \frac{d}{dt} G(At) is the matrix whose entries are the derivatives of the corresponding entries of G(At).

We will use the limit definition of the derivative (that is, \frac{d}{dt} f(t) = \mathsf{lim}_{h \rightarrow 0} \dfrac{f(t+h) - f(t)}{h}, where it doesn't matter how h approaches 0 in \mathbb{C}) and will assume that all derivatives exist everywhere.

Prove the following properties of derivatives:

  1. If G(x) = \sum_{k \in \mathbb{N}} \alpha_kx^k, then \frac{d}{dt} G(At) = A \sum_{k \in \mathbb{N}} (k+1)\alpha_{k+1}(At)^k.
  2. If V is an n \times 1 matrix with constant entries (i.e. not dependent on t) then \frac{d}{dt} (G(At)V) = \left( \frac{d}{dt} G(At) \right) V.

[My usual disclaimer about analysis applies here: as soon as I see words like ‘limit’ and ‘continuous’ I become even more confused than usual. Read the following with a healthy dose of skepticism, and please point out any errors.]

Note the following.

\begin{array}{rcl} \left[ \dfrac{d}{dt} G(At) \right]_{i,j} & = & \dfrac{d}{dt} \left[ G(At) \right]_{i,j} \\ & = & \mathsf{lim}_{h \rightarrow 0} \dfrac{G(A(t+h))_{i,j} - G(At)_{i,j}}{h} \\ & = & \mathsf{lim}_{h \rightarrow 0} \left[ \dfrac{G(A(t+h)) - G(At)}{h} \right]_{i,j} \\ & = & \mathsf{lim}_{h \rightarrow 0} \left[ \dfrac{\sum_k \alpha_k(A(t+h))^k - \sum_k \alpha_k(At)^k}{h} \right]_{i,j} \\ & = & \mathsf{lim}_{h \rightarrow 0} \left[ \dfrac{\sum_k \alpha_k A^k \left( (t+h)^k - t^k \right)}{h} \right]_{i,j} \\ & = & \mathsf{lim}_{h \rightarrow 0} \left[ \dfrac{\sum_{k > 0} \alpha_k A^k \left( \sum_{m=0}^k {k \choose m} t^m h^{k-m} - t^k \right)}{h} \right]_{i,j} \\ & = & \mathsf{lim}_{h \rightarrow 0} \left[ \dfrac{\sum_{k > 0} \alpha_k A^k \sum_{m=0}^{k-1} {k \choose m} t^m h^{k-m}}{h} \right]_{i,j} \\ & = & \mathsf{lim}_{h \rightarrow 0} \left[ \displaystyle\sum_{k > 0} \alpha_k A^k \sum_{m=0}^{k-1} {k \choose m} t^m h^{k-1-m} \right]_{i,j} \quad \text{(now we can substitute } h = 0 \text{)} \\ & = & \left[ \displaystyle\sum_{k > 0} \alpha_k A^k {k \choose k-1} t^{k-1} \right]_{i,j} \quad \text{(all terms but } m = k-1 \text{ vanish)} \\ & = & \left[ \displaystyle\sum_{k > 0} k \alpha_k A^k t^{k-1} \right]_{i,j} \\ & = & \left[ \displaystyle\sum_k (k+1) \alpha_{k+1} A^{k+1} t^k \right]_{i,j} \\ & = & \left[ A \displaystyle\sum_k (k+1) \alpha_{k+1} (At)^k \right]_{i,j} \end{array}

Since the entry (i,j) was arbitrary, \dfrac{d}{dt} G(At) = A \sum_k (k+1)\alpha_{k+1}(At)^k, as desired.

Now say V = [v_i] and G(At) = [c_{i,j}(t)]; we then have the following.

\begin{array}{rcl} \dfrac{d}{dt} \left( G(At)V \right) & = & \dfrac{d}{dt} \left( [c_{i,j}(t)][v_i] \right) \\ & = & \dfrac{d}{dt} \left[ \sum_k c_{i,k}(t)v_k \right] \\ & = & \left[ \dfrac{d}{dt} \sum_k c_{i,k}(t)v_k \right] \\ & = & \left[ \sum_k \left( \dfrac{d}{dt} c_{i,k}(t) \right) v_k \right] \\ & = & \left[ \dfrac{d}{dt} c_{i,j}(t) \right] [v_i] \\ & = & \left( \dfrac{d}{dt} G(At) \right) V \end{array}

As desired.
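
For G = \mathsf{exp} (so that G' = G), part 1 reads \frac{d}{dt}\mathsf{exp}(At) = A\,\mathsf{exp}(At); here is a sympy sketch of both parts on a small example (my own addition; A and V are arbitrary).

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 1],
               [0, 2]])
V = sp.Matrix([3, 4])   # constant column vector

expAt = (A * t).exp()

# Part 1 with G = exp: d/dt exp(At) = A exp(At).
assert sp.simplify(sp.diff(expAt, t) - A * expAt) == sp.zeros(2, 2)

# Part 2: the derivative passes through right-multiplication by the constant V.
assert sp.simplify(sp.diff(expAt * V, t) - sp.diff(expAt, t) * V) == sp.zeros(2, 1)
```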

Facts about power series of matrices

Let G(x) = \sum_{k \in \mathbb{N}} \alpha_k x^k be a power series over \mathbb{C} with radius of convergence R. Let A be an n \times n matrix over \mathbb{C}, and let P be a nonsingular matrix. Prove the following.

  1. If G(A) converges, then G(P^{-1}AP) converges, and G(P^{-1}AP) = P^{-1}G(A)P.
  2. If A = B \oplus C and G(A) converges, then G(B) and G(C) converge and G(B \oplus C) = G(B) \oplus G(C).
  3. If D is a diagonal matrix with diagonal entries d_i, then G(D) converges, G(d_i) converges for each d_i, and G(D) is diagonal with diagonal entries G(d_i).

Suppose G(A) converges. Then (by definition) the sequence of matrices G_N(A) converges entrywise. Let G_N(A) = [a_{i,j}^N], P = [p_{i,j}], and P^{-1} = [q_{i,j}]. Now G_N(P^{-1}AP) = P^{-1}G_N(A)P, since (P^{-1}AP)^k = P^{-1}A^kP for each k; that is, the (i,j) entry of G_N(P^{-1}AP) is \sum_\ell \sum_k q_{i,k} a_{k,\ell}^N p_{\ell,j}. Since each sequence a_{k,\ell}^N converges, this sum converges as well. In particular, G(P^{-1}AP) converges (again by definition). Now since G_N(P^{-1}AP) = P^{-1}G_N(A)P for each N, the corresponding entrywise sequences are equal term by term, and so have the same limit. Thus G(P^{-1}AP) = P^{-1}G(A)P.

Now suppose A = B \oplus C. We have G_N(B \oplus C) = \sum_{k=0}^N \alpha_k (B \oplus C)^k = \sum_{k=0}^N \alpha_k (B^k \oplus C^k) = (\sum_{k = 0}^N \alpha_k B^k) \oplus (\sum_{k=0}^N \alpha_k C^k) = G_N(B) \oplus G_N(C). Since G_N(A) converges in each entry, each of G_N(B) and G_N(C) converges in each entry. So G(B) and G(C) converge. Again, because for each (i,j) the corresponding sequences G_N(A)_{i,j} and (G_N(B) \oplus G_N(C))_{i,j} are the same, they converge to the same limit, and thus G(B \oplus C) = G(B) \oplus G(C).

Finally, suppose D is diagonal. Then in fact we have D = \bigoplus_{t=1}^n [d_t], and so by the previous part, G(D) = \bigoplus_{t=1}^n G(d_t). In particular, G(d_t) converges, and G(D) is diagonal with diagonal entries G(d_t) as desired.
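
Parts 1 and 2 can be checked symbolically for G = \mathsf{exp}; a sympy sketch (my own addition, with arbitrary small matrices):

```python
import sympy as sp

# Part 1 with G = exp: exp(P^{-1} A P) = P^{-1} exp(A) P.
A = sp.Matrix([[ 0, 1],
               [-2, 3]])
P = sp.Matrix([[1, 1],
               [1, 2]])   # nonsingular (det = 1)
assert sp.simplify((P.inv() * A * P).exp() - P.inv() * A.exp() * P) == sp.zeros(2, 2)

# Part 2 with G = exp: the exponential of a block diagonal (direct) sum is the
# block diagonal sum of the exponentials.
B = sp.Matrix([[2]])
C = sp.Matrix([[0, 1],
               [0, 0]])
assert sp.simplify(sp.diag(B, C).exp() - sp.diag(B.exp(), C.exp())) == sp.zeros(3, 3)
```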

On the convergence of formal power series of matrices

Let G(x) = \sum_{k \in \mathbb{N}} \alpha_kx^k be a formal power series over \mathbb{C} with radius of convergence R. Let ||A|| = \sum_{i,j} |a_{i,j}| be the matrix norm introduced in this previous exercise.

Given a matrix A over \mathbb{C}, we can construct a sequence of matrices by taking the Nth partial sum of G(A). That is, G_N(A) = \sum_{k = 0}^N \alpha_kA^k. This gives us n^2 sequences \{c_{i,j}^N\} where c_{i,j}^N is the (i,j) entry of G_N(A). Suppose c_{i,j}^N converges to c_{i,j} for each (i,j), and let C = [c_{i,j}]. In this situation, we say that G_N(A) converges to C, and that G(A) = C. (In other words, G_N(A) converges precisely when each entrywise sequence G_N(A)_{i,j} converges.)

  1. Prove that if ||A|| < R, then G_N(A) converges.
  2. Deduce that for all matrices A, the following power series converge: \mathsf{sin}(A) = \sum_{k \in \mathbb{N}} \dfrac{(-1)^k}{(2k+1)!}A^{2k+1}, \mathsf{cos}(A) = \sum_{k \in \mathbb{N}} \dfrac{(-1)^k}{(2k)!}A^{2k}, and \mathsf{exp}(A) = \sum_{k \in \mathbb{N}} \dfrac{1}{k!} A^k.

[Disclaimer: My facility with even simple analytical concepts is laughably bad, but I’m going to give this a shot. Please let me know what’s wrong with this solution.]

We begin with a lemma.

Lemma: For all (i,j), |(A^k)_{i,j}| \leq ||A||^k, where the subscripts denote taking the (i,j) entry. Proof: By the definition of matrix multiplication, we have (A^k)_{i,j} = \sum_t \prod_{m=0}^{k-1} a_{t(m), t(m+1)}, where the sum is over all functions t : \{0,1,\ldots,k\} \rightarrow \{1,\ldots,n\} with t(0) = i and t(k) = j. (Equivalently, t ranges over the n^{k-1} choices of the intermediate indices t(1),\ldots,t(k-1).) Now |(A^k)_{i,j}| \leq \sum_t \prod_{m=0}^{k-1} |a_{t(m), t(m+1)}| by the triangle inequality. Note that ||A||^k is the sum of all possible k-fold products of (absolute values of) entries of A, so that |(A^k)_{i,j}| is bounded above by a sum of some of these distinct k-fold products. In particular, |(A^k)_{i,j}| \leq ||A||^k since the missing terms are all nonnegative. Thus |(\alpha_k A^k)_{i,j}| = |\alpha_k| |(A^k)_{i,j}| \leq |\alpha_k| \cdot ||A||^k.

Let us define the formal power series |G|(x) by |G|(x) = \sum_k |\alpha_k| x^k. What we have shown (I think) is that \sum_{k=0}^N |\alpha_k (A^k)_{i,j}| \leq |G|_N(||A||), using the triangle inequality.

Now recall, by the Cauchy-Hadamard theorem, that the radius of convergence of G(x) satisfies 1/R = \mathsf{lim\ sup}_{k \rightarrow \infty} |\alpha_k|^{1/k}. In particular, |G| has the same radius of convergence as G, and since ||A|| < R, the sequence |G|_N(||A||) converges. Now the sequence \sum_{k=0}^N |\alpha_k (A^k)_{i,j}| is bounded and monotone increasing, and so must also converge. That is, \sum_{k=0}^N (\alpha_k A^k)_{i,j} = G_N(A)_{i,j} is absolutely convergent, and so is convergent. Thus G(A) converges.

Now we will address the convergence of the series \mathsf{sin}. Recall that the radius of convergence R of \mathsf{sin} satisfies R^{-1} = \mathsf{lim\ sup}_{k \rightarrow \infty} |\alpha_k|^{1/k} = \mathsf{lim}_{k \rightarrow \infty} \mathsf{sup}_{n \geq k} |\alpha_n|^{1/n}. Now \alpha_k = 0 if k is even and (-1)^t/k! if k = 2t+1 is odd. So |\alpha_k|^{1/k} = 0 if k is even and |1/k!|^{1/k} = 1/(k!)^{1/k} > 0 if k is odd.

Brief aside: Suppose k \geq 1. Certainly 1 \leq k! < (k+1)^k, so that (k!)^{1/k} < k+1. Now 1 \leq (k!)^{1 + 1/k} < (k+1)!, so that (k!)^{1/k} < ((k+1)!)^{1/(k+1)}. In particular, if n > k, then (k!)^{1/k} < (n!)^{1/n}.

By our brief aside, we have that \mathsf{sup}_{n \geq k} |\alpha_n|^{1/n} = 1/(k!)^{1/k} if k is odd, and 1/((k+1)!)^{1/(k+1)} if k is even. So (skipping redundant terms) the radius of convergence of \mathsf{sin}(x) satisfies R^{-1} = \mathsf{lim}_{k \rightarrow \infty} 1/(k!)^{1/k}. We know from the brief aside that this sequence is monotone decreasing. Now let \varepsilon > 0, and consider the sequence \varepsilon, 2\varepsilon, 3\varepsilon, et cetera. Only finitely many terms of this sequence are less than or equal to 1, so there must exist some k such that the product \prod_{i=1}^k \varepsilon i = \varepsilon^k k! is greater than 1, so that \varepsilon > 1/(k!)^{1/k}. Thus the sequence 1/(k!)^{1/k} converges to 0, and so the radius of convergence of \mathsf{sin}(x) is \infty.

By a similar argument, the radii of convergence of \mathsf{cos}(x) and \mathsf{exp}(x) are also \infty. Since ||A|| is finite for every matrix A, part 1 now shows that \mathsf{sin}(A), \mathsf{cos}(A), and \mathsf{exp}(A) converge for all A.
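
A numerical sketch of the key estimate and of entrywise convergence for \mathsf{exp} (my own addition; the random matrix and the cutoff N = 60 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
norm_A = np.abs(A).sum()   # ||A|| = sum of the moduli of the entries

# Lemma: every entry of A^k is bounded in modulus by ||A||^k.
for k in range(1, 8):
    assert np.abs(np.linalg.matrix_power(A, k)).max() <= norm_A**k + 1e-9

# The partial sums G_N(A) of exp converge entrywise (here R = infinity).
S = np.zeros((3, 3), dtype=complex)
term = np.eye(3, dtype=complex)   # A^0 / 0!
for k in range(60):
    S += term
    term = term @ A / (k + 1)     # A^(k+1) / (k+1)!
print(np.round(S, 6))             # approximately exp(A)
```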

Properties of a matrix norm

Given an n \times n matrix A over \mathbb{C}, define ||A|| = \sum_{i,j} |a_{i,j}|, where bars denote the complex modulus. Prove the following for all A,B \in \mathsf{Mat}_n(\mathbb{C}) and all \alpha \in \mathbb{C}.

  1. ||A+B|| \leq ||A|| + ||B||
  2. ||AB|| \leq ||A|| \cdot ||B||
  3. ||\alpha A|| = |\alpha| \cdot ||A||

Say A = [a_{i,j}] and B = [b_{i,j}].

We have ||A+B|| = \sum_{i,j} |a_{i,j} + b_{i,j}| \leq \sum_{i,j} (|a_{i,j}| + |b_{i,j}|) by the triangle inequality. Rearranging, we have ||A+B|| \leq \sum_{i,j} |a_{i,j}| + \sum_{i,j} |b_{i,j}| = ||A|| + ||B|| as desired.

Now ||AB|| = \sum_{i,j} |\sum_k a_{i,k}b_{k,j}| \leq \sum_{i,j} \sum_k |a_{i,k}||b_{k,j}| by the triangle inequality. Now ||AB|| \leq \sum_{i,j} \sum_{k,t} |a_{i,k}| |b_{t,j}| since all the new terms are nonnegative, and rearranging, we have ||AB|| \leq \sum_{i,k} \sum_{j,t} |a_{i,k}||b_{t,j}| = (\sum_{i,k} |a_{i,k}|)(\sum_{j,t} |b_{t,j}|) = ||A|| \cdot ||B||.

Finally, we have ||\alpha A|| = \sum_{i,j} |\alpha a_{i,j}| = |\alpha| \sum_{i,j} |a_{i,j}| = |\alpha| \cdot ||A||.
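
These three properties are easy to test numerically; a numpy sketch (my own addition, on random complex matrices):

```python
import numpy as np

def matnorm(M):
    # ||M|| = sum of the moduli of the entries of M
    return np.abs(M).sum()

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
alpha = 2 - 3j

assert matnorm(A + B) <= matnorm(A) + matnorm(B) + 1e-12
assert matnorm(A @ B) <= matnorm(A) * matnorm(B) + 1e-12
assert np.isclose(matnorm(alpha * A), abs(alpha) * matnorm(A))
```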

Matrices with square roots over fields of characteristic 2

Let F be a field of characteristic 2. Compute the Jordan canonical form of the square J^2 of a Jordan block J of size n and eigenvalue \lambda over F. Characterize those matrices A over F which are squares; that is, characterize A such that A = B^2 for some matrix B.


Let J = [b_{i,j}] be the Jordan block with eigenvalue \lambda and size n. That is, b_{i,j} = \lambda if j = i, 1 if j = i+1, and 0 otherwise. Now J^2 = [\sum_k b_{i,k}b_{k,j}]; if k \neq i and k \neq i+1, then b_{i,k} = 0. Evidently then we have (J^2)_{i,j} = \lambda^2 if j = i, 1 if j = i+2, and 0 otherwise, noting that 2\lambda = 0 since 2 = 0 in F. So J^2 - \lambda^2 I = \begin{bmatrix} 0 & I \\ 0_2 & 0 \end{bmatrix}, where 0_2 is the 2 \times 2 zero matrix and I is the (n-2) \times (n-2) identity matrix. Now let v = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}, where V_1 has dimension 2 \times 1. Now (J^2 - \lambda^2 I)v = \begin{bmatrix} V_2 \\ 0 \end{bmatrix}. That is, J^2 - \lambda^2 I ‘shifts’ the entries of v: e_{i+2} \mapsto e_i and e_1, e_2 \mapsto 0. In particular, the kernel of J^2 - \lambda^2 I has dimension 2, so that by this previous exercise, the Jordan canonical form of J^2 has two blocks (both with eigenvalue \lambda^2).

Now J^2 - \lambda^2 I = (J + \lambda I)(J - \lambda I) = (J - \lambda I)^2, since F has characteristic 2. Note that J - \lambda I is nilpotent of index n, since (evidently) we have (J - \lambda I)e_{i+1} = e_i and (J - \lambda I)e_1 = 0. If n is even, then (J^2 - \lambda^2 I)^{n/2} = 0 while (J^2 - \lambda^2 I)^{n/2-1} \neq 0, and if n is odd, then (J^2-\lambda^2 I)^{(n+1)/2} = 0 while (J^2 - \lambda^2 I)^{(n+1)/2-1} \neq 0. So the minimal polynomial of J^2 is (x-\lambda^2)^{n/2} if n is even and (x-\lambda^2)^{(n+1)/2} if n is odd.

So the Jordan canonical form of J^2 has two Jordan blocks with eigenvalue \lambda^2. Since the size of the largest block is the exponent of (x - \lambda^2) in the minimal polynomial and the block sizes sum to n, if n is even these blocks have sizes n/2, n/2, and if n is odd they have sizes (n+1)/2, (n-1)/2.
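
A quick sympy sketch of this computation (my own addition; jordan_block is a local helper, and I take n = 5, \lambda = 1 in GF(2)): squaring a Jordan block and reducing mod 2 exhibits the shift-by-two structure, and the kernel of J^2 - \lambda^2 I is 2-dimensional.

```python
import sympy as sp

def jordan_block(lam, n):
    # n x n Jordan block: lam on the diagonal, 1 on the superdiagonal.
    return sp.Matrix(n, n, lambda i, j: lam if j == i else (1 if j == i + 1 else 0))

n, lam = 5, 1
J = jordan_block(lam, n)
J2 = (J * J).applyfunc(lambda x: x % 2)   # reduce entries mod 2 to work in GF(2)
print(J2)   # lam^2 on the diagonal, 1's two places above it, 0's elsewhere

# J^2 - lam^2 I shifts basis vectors by two, so its kernel has dimension 2 and
# the Jordan form of J^2 has two blocks (sizes 3 and 2 here, since n = 5 is odd).
N2 = (J2 - lam**2 * sp.eye(n)).applyfunc(lambda x: x % 2)
print(n - N2.rank())   # 2; rank over Q and over GF(2) agree for this 0/1 shift matrix
```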

Now let A be an arbitrary n \times n matrix over F (with eigenvalues in F). We claim that A is a square if and only if the following hold.

  1. The eigenvalues of A are squares in F
  2. For each eigenvalue \lambda of A, the Jordan blocks with eigenvalue \lambda can be paired up so that the sizes of the blocks in each pair differ by 0 or 1.

To see the ‘if’ part, suppose P^{-1}AP = \bigoplus_i (H_i \oplus K_i) is in Jordan canonical form, where H_i and K_i are Jordan blocks having the same eigenvalue \lambda_i and whose sizes differ by 0 or 1. By the first half of this exercise, H_i \oplus K_i is the Jordan canonical form of J_i^2, where J_i is a Jordan block with eigenvalue \sqrt{\lambda_i}. Now A is similar to the direct sum of the J_i^2, which is J^2 where J = \bigoplus_i J_i; say Q^{-1}AQ = J^2. Then A = (QJQ^{-1})^2 = B^2 is a square.

Conversely, suppose A = B^2 is square, and say P^{-1}BP = J is in Jordan canonical form. So P^{-1}AP = J^2. Letting J_i denote the Jordan blocks of J, we have P^{-1}AP = \bigoplus J_i^2. The Jordan canonical form of J_i^2 has two blocks with eigenvalue \lambda_i^2 and whose sizes differ by 0 or 1, by the first half of this exercise. So the Jordan blocks of A all have eigenvalues which are square in F and can be paired so that the sizes in each pair differ by 0 or 1.