
Compute the exponential of a matrix

Let D = \begin{bmatrix} 1 & 2 & -4 & 4 \\ 2 & -1 & 4 & -8 \\ 1 & 0 & 1 & -2 \\ 0 & 1 & -2 & 3 \end{bmatrix}. Compute \mathsf{exp}(D).


Let P = \begin{bmatrix} 0 & 1 & 2 & 0 \\ 2 & 0 & -2 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}. Evidently, P^{-1}DP = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} is in Jordan canonical form.

Now \mathsf{exp}(P^{-1}DP) = \mathsf{exp} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \oplus \mathsf{exp} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, using this previous exercise. By this previous exercise, we have \mathsf{exp} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e & e \\ 0 & e \end{bmatrix}. So \mathsf{exp}(P^{-1}DP) = e P^{-1}DP; since \mathsf{exp}(P^{-1}DP) = P^{-1}\mathsf{exp}(D)P, we have P^{-1}\mathsf{exp}(D)P = e P^{-1}DP, and thus \mathsf{exp}(D) = eD.

(WolframAlpha agrees.)
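As a further check, here is a quick verification in Python with sympy (the choice of tool is mine); Matrix.exp() computes the exponential via the Jordan form.

    import sympy as sp

    # The matrix D from the exercise
    D = sp.Matrix([
        [1,  2, -4,  4],
        [2, -1,  4, -8],
        [1,  0,  1, -2],
        [0,  1, -2,  3],
    ])

    P, J = D.jordan_form()
    print(J)  # two 2x2 Jordan blocks with eigenvalue 1

    # Confirm exp(D) = e*D
    assert (D.exp() - sp.E * D).applyfunc(sp.simplify) == sp.zeros(4, 4)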

Compute the exponential of a Jordan block

Let N be an n \times n matrix over \mathbb{C} with 1 on the first superdiagonal and 0 elsewhere. Compute \mathsf{exp}(N\alpha). Next, suppose J is an n \times n Jordan block with eigenvalue \lambda, and compute \mathsf{exp}(J\alpha).


First, given a parameter t \in \mathbb{N}, we define a matrix N_t as follows: N_t = [a_{i,j}], where a_{i,j} = 1 if j = i+t and 0 otherwise. (That is, N_t is the n \times n matrix with 1 on the tth superdiagonal and 0 elsewhere; in particular, N = N_1, N_0 = I, and N_t = 0 when t \geq n.)

Lemma: N_1^k = N_k for all k \in \mathbb{N}. Proof: We proceed by induction. Certainly the result holds for k = 0 and k = 1. Now suppose it holds for some k. We have N_1^{k+1} = N_1N_1^k = N_1N_k = [a_{i,j}][b_{i,j}] = [\sum_\ell a_{i,\ell}b_{\ell,j}]. Consider entry (i,j): \sum_\ell a_{i,\ell}b_{\ell,j}. If \ell \neq i+1, then a_{i,\ell} = 0, so this sum reduces to a_{i,i+1}b_{i+1,j}. Now if j \neq i+1+k, this term is 0; otherwise, it is 1. So in fact N_1^{k+1} = N_{k+1}. \square

Now \mathsf{exp}(N\alpha) = \sum_{k \in \mathbb{N}} \dfrac{1}{k!}(\alpha N)^k = \sum_{k \in \mathbb{N}} \dfrac{1}{k!} \alpha^k N_k. As we saw in this previous exercise, the powers N^k for k \geq n are all zero; so in fact we have \mathsf{exp}(N\alpha) = \sum_{k=0}^{n-1} \dfrac{1}{k!} \alpha^k N_k. Note that the nonzero entries of the N_k occur in mutually disjoint positions. So \mathsf{exp}(N\alpha) is the n \times n matrix whose (i,j) entry is \dfrac{\alpha^{j-i}}{(j-i)!} if j \geq i and 0 otherwise.

Now let J be the n \times n Jordan block with eigenvalue \lambda. Note that J\alpha = \lambda\alpha I + N\alpha, where N is as above; using this previous exercise, we have \mathsf{exp}(J\alpha) = e^{\lambda\alpha} \mathsf{exp}(N\alpha).
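As a sanity check, here is a small sympy computation (the size n = 4 is an arbitrary sample) comparing the finite series for \mathsf{exp}(N\alpha) against the closed form just derived; the series terminates because N is nilpotent.

    import sympy as sp

    n = 4
    alpha = sp.symbols('alpha')

    # N: 1 on the first superdiagonal, 0 elsewhere
    N = sp.Matrix(n, n, lambda i, j: 1 if j == i + 1 else 0)

    # Since N^n = 0, the exponential series terminates at k = n-1
    E = sp.zeros(n, n)
    for k in range(n):
        E += (alpha * N) ** k / sp.factorial(k)

    # Closed form: (i,j) entry is alpha^(j-i)/(j-i)! for j >= i, else 0
    F = sp.Matrix(n, n, lambda i, j: alpha**(j - i) / sp.factorial(j - i) if j >= i else sp.S(0))

    assert (E - F).applyfunc(sp.simplify) == sp.zeros(n, n)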

A fact about matrix exponentials

Let \lambda \in \mathbb{C} and let M be an n \times n matrix over \mathbb{C}. Prove that \mathsf{exp}(\lambda I + M) = e^{\lambda}\mathsf{exp}(M).


Note that \lambda I and M commute. Using two previous exercises (here and here), we have \mathsf{exp}(\lambda I + M) = \mathsf{exp}(\lambda I)\mathsf{exp}(M) = \mathsf{exp}(\lambda)I \mathsf{exp}(M) = e^\lambda \mathsf{exp}(M).
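A quick concrete check of this identity in sympy; the scalar \lambda = 3/2 and the nilpotent M below are arbitrary samples.

    import sympy as sp

    lam = sp.Rational(3, 2)          # arbitrary sample scalar
    M = sp.Matrix([[0, 1], [0, 0]])  # arbitrary sample matrix

    lhs = (lam * sp.eye(2) + M).exp()
    rhs = sp.exp(lam) * M.exp()
    assert (lhs - rhs).applyfunc(sp.simplify) == sp.zeros(2, 2)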

For commuting matrices, the exponential of a sum is the product of exponentials

Let A and B be commuting matrices. Prove that \mathsf{exp}(A+B) = \mathsf{exp}(A)\mathsf{exp}(B), where \mathsf{exp}(x) = \sum_{k \in \mathbb{N}} \dfrac{1}{k!}x^k is a formal power series over a field with characteristic zero.


We will work in the ring F[[x]][[y]] of formal power series in y whose coefficients are formal power series in x. (Note that x and y commute.) Note that if G(x) is a formal power series and p(x) a polynomial, then G(p(x)) is the formal power series obtained by ‘collecting like terms’. Recall also the binomial theorem, which applies to (x+y)^t since x and y commute.

\mathsf{exp}(x+y)  =  \sum_{t \in \mathbb{N}} \dfrac{1}{t!}(x+y)^t
 =  \sum_{t \in \mathbb{N}} \dfrac{1}{t!} \sum_{k=0}^t {t \choose k} x^ky^{t-k}
 =  \sum_{t \in \mathbb{N}} \sum_{k = 0}^t \dfrac{1}{k!}x^k \cdot \dfrac{1}{(t-k)!}y^{t-k}
 =  \sum_{t \in \mathbb{N}} \sum_{h+k=t} \dfrac{1}{h!}x^h \cdot \dfrac{1}{k!}y^{k}
 =  \sum_{h \in \mathbb{N}} \sum_{k \in \mathbb{N}} \dfrac{1}{h!}x^h \cdot \dfrac{1}{k!}y^k
 =  \left( \sum_{h \in \mathbb{N}} \dfrac{1}{h!}x^h \right) \left( \sum_{k \in \mathbb{N}} \dfrac{1}{k!}y^{k} \right)
 =  \mathsf{exp}(x) \mathsf{exp}(y).

In particular, if A and B commute, then \mathsf{exp}(A+B) = \mathsf{exp}(A)\mathsf{exp}(B).
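Here is a sketch of a check on one commuting pair in sympy; both matrices below are polynomials in the same nilpotent matrix (an arbitrary sample construction), so they commute.

    import sympy as sp

    N = sp.Matrix([[0, 1], [0, 0]])
    A = sp.eye(2) + 2 * N   # A = I + 2N
    B = 3 * sp.eye(2) + N   # B = 3I + N; A and B commute
    assert A * B == B * A

    lhs = (A + B).exp()
    rhs = A.exp() * B.exp()
    assert (lhs - rhs).applyfunc(sp.simplify) == sp.zeros(2, 2)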

Facts about power series of matrices

Let G(x) = \sum_{k \in \mathbb{N}} \alpha_k x^k be a power series over \mathbb{C} with radius of convergence R. Let A be an n \times n matrix over \mathbb{C}, and let P be a nonsingular matrix. Prove the following.

  1. If G(A) converges, then G(P^{-1}AP) converges, and G(P^{-1}AP) = P^{-1}G(A)P.
  2. If A = B \oplus C and G(A) converges, then G(B) and G(C) converge and G(B \oplus C) = G(B) \oplus G(C).
  3. If D is a diagonal matrix with diagonal entries d_i, then G(D) converges, G(d_i) converges for each d_i, and G(D) is diagonal with diagonal entries G(d_i).

Suppose G(A) converges. Then (by definition) the sequence of matrices G_N(A) converges entrywise. Let G_N(A) = [a_{i,j}^N], P = [p_{i,j}], and P^{-1} = [q_{i,j}]. Since (P^{-1}AP)^k = P^{-1}A^kP for each k, we have G_N(P^{-1}AP) = P^{-1}G_N(A)P = [\sum_\ell \sum_k q_{i,k} a_{k,\ell}^N p_{\ell,j}]. That is, the (i,j) entry of G_N(P^{-1}AP) is \sum_\ell \sum_k q_{i,k} a_{k,\ell}^N p_{\ell,j}. Since each sequence a_{k,\ell}^N converges, this sum converges as well. In particular, G(P^{-1}AP) converges (again by definition). Now since G_N(P^{-1}AP) = P^{-1}G_N(A)P for each N, the two entrywise sequences agree term by term for each (i,j), and so have the same limit. Thus G(P^{-1}AP) = P^{-1}G(A)P.

Now suppose A = B \oplus C. We have G_N(B \oplus C) = \sum_{k=0}^N \alpha_k (B \oplus C)^k = \sum_{k=0}^N \alpha_k (B^k \oplus C^k) = (\sum_{k = 0}^N \alpha_k B^k) \oplus (\sum_{k=0}^N \alpha_k C^k) = G_N(B) \oplus G_N(C). Since G_N(A) converges in each entry, each of G_N(B) and G_N(C) converges in each entry. So G(B) and G(C) converge. Again, because for each (i,j) the corresponding sequences G_N(A)_{i,j} and (G_N(B) \oplus G_N(C))_{i,j} are the same, they converge to the same limit, and thus G(B \oplus C) = G(B) \oplus G(C).

Finally, suppose D is diagonal. Then in fact we have D = \bigoplus_{t=1}^n [d_t], and so by the previous part, G(D) = \bigoplus_{t=1}^n G(d_t). In particular, G(d_t) converges, and G(D) is diagonal with diagonal entries G(d_t) as desired.
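All three parts can be illustrated in sympy with G = \mathsf{exp}; the matrices below are arbitrary samples.

    import sympy as sp

    A = sp.Matrix([[1, 1], [0, 2]])
    P = sp.Matrix([[1, 1], [0, 1]])

    # Part 1: G(P^-1 A P) = P^-1 G(A) P
    lhs = (P.inv() * A * P).exp()
    rhs = P.inv() * A.exp() * P
    assert (lhs - rhs).applyfunc(sp.simplify) == sp.zeros(2, 2)

    # Part 2: G(B + C) = G(B) + G(C) for direct sums
    B, C = sp.Matrix([[2]]), sp.Matrix([[0, 1], [0, 0]])
    assert sp.diag(B, C).exp() == sp.diag(B.exp(), C.exp())

    # Part 3: G of a diagonal matrix is diagonal with entries G(d_i)
    D = sp.diag(1, 2, 3)
    assert D.exp() == sp.diag(sp.E, sp.exp(2), sp.exp(3))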

On the convergence of formal power series of matrices

Let G(x) = \sum_{k \in \mathbb{N}} \alpha_kx^k be a formal power series over \mathbb{C} with radius of convergence R. Let ||A|| = \sum_{i,j} |a_{i,j}| be the matrix norm introduced in this previous exercise.

Given a matrix A over \mathbb{C}, we can construct a sequence of matrices by taking the Nth partial sum of G(A). That is, G_N(A) = \sum_{k = 0}^N \alpha_kA^k. This gives us n^2 sequences \{c_{i,j}^N\}, where c_{i,j}^N is the (i,j) entry of G_N(A). Suppose c_{i,j}^N converges to c_{i,j} for each (i,j), and let C = [c_{i,j}]. In this situation, we say that G_N(A) converges to C, and that G(A) = C. (In other words, G_N(A) converges precisely when each entrywise sequence G_N(A)_{i,j} converges.)

  1. Prove that if ||A|| < R, then G_N(A) converges.
  2. Deduce that for all matrices A, the following power series converge: \mathsf{sin}(A) = \sum_{k \in \mathbb{N}} \dfrac{(-1)^k}{(2k+1)!}A^{2k+1}, \mathsf{cos}(A) = \sum_{k \in \mathbb{N}} \dfrac{(-1)^k}{(2k)!}A^{2k}, and \mathsf{exp}(A) = \sum_{k \in \mathbb{N}} \dfrac{1}{k!} A^k.

[Disclaimer: My facility with even simple analytical concepts is laughably bad, but I’m going to give this a shot. Please let me know what’s wrong with this solution.]

We begin with a lemma.

Lemma: For all (i,j), |(A^k)_{i,j}| \leq ||A||^k, where the subscripts denote taking the (i,j) entry. Proof: By the definition of matrix multiplication, we have A^k = [\sum_t \prod_{s=0}^{k-1} a_{t(s), t(s+1)}], where t(0) = i and t(k) = j. (I’m abusing the notation a bit here; t is an element of \{1,\ldots,n\}^{k-1}, which we think of as a function on \{1,\ldots,k-1\} recording the intermediate indices.) Now |(A^k)_{i,j}| \leq \sum_t \prod_{s=0}^{k-1} |a_{t(s), t(s+1)}| by the triangle inequality. Note that ||A||^k is the sum of all possible k-fold products of (absolute values of) entries of A, and that we have |(A^k)_{i,j}| bounded above by a sum of some distinct k-fold products of (absolute values of) entries of A. In particular, |(A^k)_{i,j}| \leq ||A||^k since the missing terms are all nonnegative. Thus |(\alpha_k A^k)_{i,j}| = |\alpha_k| |(A^k)_{i,j}| \leq |\alpha_k| \cdot ||A||^k. \square

Let us define the formal power series |G|(x) by |G|(x) = \sum_k |\alpha_k| x^k. What we have shown (I think) is that \sum_{k=0}^N |(\alpha_k A^k)_{i,j}| \leq |G|_N(||A||), by summing the bound from the lemma over k.

Now recall, by the Cauchy-Hadamard theorem, that the radius of convergence of G(x) satisfies 1/R = \mathsf{lim\ sup}_{k \rightarrow \infty} |\alpha_k|^{1/k}. In particular, |G| has the same radius of convergence as G, so since ||A|| < R, the sequence |G|_N(||A||) converges. Now the sequence \sum_{k=0}^N |(\alpha_k A^k)_{i,j}| is bounded and monotone increasing, and so must also converge. That is, \sum_{k=0}^N (\alpha_k A^k)_{i,j} = G_N(A)_{i,j} is absolutely convergent, and so is convergent. Thus G(A) converges.

Now we will address the convergence of the series \mathsf{sin}. Recall that the radius of convergence R of \mathsf{sin} satisfies R^{-1} = \mathsf{lim\ sup}_{k \rightarrow \infty} |\alpha_k|^{1/k} = \mathsf{lim}_{k \rightarrow \infty} \mathsf{sup}_{n \geq k} |\alpha_n|^{1/n}. Now \alpha_k = 0 if k is even and (-1)^t/k! if k = 2t+1 is odd. So |\alpha_k|^{1/k} = 0 if k is even and |1/k!|^{1/k} = 1/(k!)^{1/k} > 0 if k is odd.

Brief aside: Suppose k \geq 1. Certainly 1 \leq k! < (k+1)^k, so that (k!)^{1/k} < k+1. Now 1 \leq (k!)^{1 + 1/k} < (k+1)!, so that (k!)^{1/k} < ((k+1)!)^{1/(k+1)}. In particular, if n > k, then (k!)^{1/k} < (n!)^{1/n}.

By our brief aside, we have that \mathsf{sup}_{n \geq k} |\alpha_n|^{1/n} = 1/(k!)^{1/k} if k is odd, and 1/((k+1)!)^{1/(k+1)} if k is even. So (skipping redundant terms) the radius of convergence of \mathsf{sin}(x) satisfies R^{-1} = \mathsf{lim}_{k \rightarrow \infty} 1/(k!)^{1/k}. We know from the brief aside that this sequence is monotone decreasing. Now let \varepsilon > 0, and consider the sequence \varepsilon, 2\varepsilon, 3\varepsilon, et cetera. Only finitely many terms of this sequence are less than or equal to 1, and the terms increase without bound, so there must exist some k such that the product \prod_{i=1}^k i\varepsilon is greater than 1. Now \varepsilon^k k! > 1, so that \varepsilon > 1/(k!)^{1/k}; since the sequence 1/(k!)^{1/k} is monotone decreasing, all later terms are less than \varepsilon as well. Thus the sequence 1/(k!)^{1/k} converges to 0, and so the radius of convergence of \mathsf{sin}(x) is \infty.

By a similar argument, the radii of convergence for \mathsf{cos}(x) and \mathsf{exp}(x) are also \infty.
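To see this convergence numerically, one can watch the partial sums G_N(A) stabilize entrywise; here is a small numpy sketch (the random 4 \times 4 matrix is an arbitrary sample).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    def exp_partial(A, N):
        """Nth partial sum of the exponential series for A."""
        term = np.eye(A.shape[0])
        total = term.copy()
        for k in range(1, N + 1):
            term = term @ A / k   # term is now A^k / k!
            total = total + term
        return total

    # successive partial sums agree to machine precision
    print(np.max(np.abs(exp_partial(A, 60) - exp_partial(A, 40))))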

Properties of a matrix norm

Given an n \times n matrix A over \mathbb{C}, define ||A|| = \sum_{i,j} |a_{i,j}|, where bars denote the complex modulus. Prove the following for all A,B \in \mathsf{Mat}_n(\mathbb{C}) and all \alpha \in \mathbb{C}.

  1. ||A+B|| \leq ||A|| + ||B||
  2. ||AB|| \leq ||A|| \cdot ||B||
  3. ||\alpha A|| = |\alpha| \cdot ||A||

Say A = [a_{i,j}] and B = [b_{i,j}].

We have ||A+B|| = \sum_{i,j} |a_{i,j} + b_{i,j}| \leq \sum_{i,j} |a_{i,j}| + |b_{i,j}| by the triangle inequality. Rearranging, we have ||A+B|| \leq \sum_{i,j} |a_{i,j}| + \sum_{i,j} |b_{i,j}| = ||A|| + ||B|| as desired.

Now ||AB|| = \sum_{i,j} |\sum_k a_{i,k}b_{k,j}| \leq \sum_{i,j} \sum_k |a_{i,k}||b_{k,j}| by the triangle inequality. Now ||AB|| \leq \sum_{i,j} \sum_{k,t} |a_{i,k}| |b_{t,j}| since all the new terms are nonnegative, and rearranging, we have ||AB|| \leq \sum_{i,k} \sum_{j,t} |a_{i,k}||b_{t,j}| = (\sum_{i,k} |a_{i,k}|)(\sum_{j,t} |b_{t,j}|) = ||A|| \cdot ||B||.

Finally, we have ||\alpha A|| = \sum_{i,j} |\alpha a_{i,j}| = |\alpha| \sum_{i,j} |a_{i,j}| = |\alpha| \cdot ||A||.
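All three properties are easy to spot-check numerically; a numpy sketch, with arbitrary random samples for A, B, and \alpha:

    import numpy as np

    def entry_norm(A):
        # ||A||: the sum of the moduli of all entries
        return np.sum(np.abs(A))

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    alpha = 2 - 3j

    assert entry_norm(A + B) <= entry_norm(A) + entry_norm(B)
    assert entry_norm(A @ B) <= entry_norm(A) * entry_norm(B)
    assert np.isclose(entry_norm(alpha * A), abs(alpha) * entry_norm(A))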

Matrices with square roots over fields of characteristic 2

Let F be a field of characteristic 2. Compute the Jordan canonical form of a Jordan block J of size n and eigenvalue \lambda over F. Characterize those matrices A over F which are squares; that is, characterize A such that A = B^2 for some matrix B.


Let J = [b_{i,j}] be the Jordan block with eigenvalue \lambda and size n. That is, b_{i,j} = \lambda if j = i, 1 if j = i+1, and 0 otherwise. Now J^2 = [\sum_k b_{i,k}b_{k,j}]; if k \neq i and k \neq i+1, then b_{i,k} = 0. Evidently then, noting that 2 = 0 in F, we have (J^2)_{i,j} = \lambda^2 if j = i, 1 if j = i+2, and 0 otherwise. So J^2 - \lambda^2 I = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix} in block form, where I is the (n-2) \times (n-2) identity matrix. Now let v = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}, where V_1 has dimension 2 \times 1. Then (J^2 - \lambda^2 I)v = \begin{bmatrix} V_2 \\ 0 \end{bmatrix}. That is, J^2 - \lambda^2 I ‘shifts’ the entries of v: e_{i+2} \mapsto e_i and e_1, e_2 \mapsto 0. In particular, the kernel of J^2 - \lambda^2 I has dimension 2, so that by this previous exercise, the Jordan canonical form of J^2 has two blocks (both with eigenvalue \lambda^2).

Now J^2 - \lambda^2 I = (J + \lambda I)(J - \lambda I) = (J - \lambda I)^2, since F has characteristic 2. Note that J - \lambda I is nilpotent of index n, since (evidently) we have (J - \lambda I)e_{i+1} = e_i and (J - \lambda I)e_1 = 0. So if n is even, then (J^2 - \lambda^2 I)^{n/2} = 0 while (J^2 - \lambda^2 I)^{n/2-1} \neq 0, and if n is odd, then (J^2-\lambda^2 I)^{(n+1)/2} = 0 while (J^2 - \lambda^2 I)^{(n+1)/2-1} \neq 0. So the minimal polynomial of J^2 is (x-\lambda^2)^{n/2} if n is even and (x-\lambda^2)^{(n+1)/2} if n is odd.

So the Jordan canonical form of J^2 has two Jordan blocks with eigenvalue \lambda^2. If n is even, these have size n/2,n/2, and if n is odd, they have size (n+1)/2, (n-1)/2.
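One can watch this happen concretely; here is a sympy sketch mimicking GF(2) by reducing integer entries mod 2, with the arbitrary samples n = 5 and \lambda = 1.

    import sympy as sp

    n, lam = 5, 1  # sample size and eigenvalue
    J = sp.Matrix(n, n, lambda i, j: lam if j == i else (1 if j == i + 1 else 0))

    # Over GF(2) the cross term 2*lambda*N vanishes, leaving lambda^2 I + N_2
    J2 = (J * J).applyfunc(lambda x: x % 2)
    print(J2)  # lambda^2 on the diagonal, 1 on the second superdiagonal

    # ker(J^2 - lambda^2 I) has dimension 2, so the JCF of J^2 has two blocks.
    # (For this particular 0-1 matrix the rank over Q agrees with the rank over GF(2).)
    K = (J2 - lam**2 * sp.eye(n)).applyfunc(lambda x: x % 2)
    print(n - K.rank())  # prints 2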

Now let A be an arbitrary n \times n matrix over F (with eigenvalues in F). We claim that A is a square if and only if the following hold.

  1. The eigenvalues of A are square in F
  2. For each eigenvalue \lambda of A, the Jordan blocks with eigenvalue \lambda can be paired up so that the sizes of the blocks in each pair differ by 0 or 1.

To see the ‘if’ part, suppose P^{-1}AP = \bigoplus H_i \oplus K_i is in Jordan canonical form, where H_i and K_i are Jordan blocks having the same eigenvalue \lambda_i and whose sizes differ by 0 or 1. By the first half of this exercise, H_i \oplus K_i is the Jordan canonical form of J_i^2, where J_i is the Jordan block of size \mathsf{dim}\ H_i + \mathsf{dim}\ K_i with eigenvalue \sqrt{\lambda_i}. Now A is similar to \bigoplus J_i^2 = J^2, where J = \bigoplus J_i; say Q^{-1}AQ = J^2. Then A = (QJQ^{-1})^2 = B^2 is a square.

Conversely, suppose A = B^2 is square, and say P^{-1}BP = J is in Jordan canonical form. So P^{-1}AP = J^2. Letting J_i denote the Jordan blocks of J, we have P^{-1}AP = \bigoplus J_i^2. The Jordan canonical form of J_i^2 has two blocks with eigenvalue \lambda_i^2 and whose sizes differ by 0 or 1, by the first half of this exercise. So the Jordan blocks of A all have eigenvalues which are square in F and can be paired so that the sizes in each pair differ by 0 or 1.

Matrices with square roots

Let F be a field whose characteristic is not 2. Characterize those matrices A over F which have square roots. That is, characterize A such that A = B^2 for some matrix B.


We claim that A has a square root if and only if the following hold.

  1. Every eigenvalue of A is square in F.
  2. The Jordan blocks of A having eigenvalue 0 can be paired up in such a way that the sizes of the blocks in each pair differ by 0 or 1.

First we tackle the ‘if’ part. Suppose P^{-1}AP = \bigoplus J_i \oplus \bigoplus (H_i \oplus K_i) is in Jordan canonical form (rearranging blocks if necessary), where the J_i are the Jordan blocks having nonzero eigenvalue \lambda_i and the H_i and K_i are the blocks with eigenvalue 0, paired so that the sizes of H_i and K_i differ by 0 or 1. As we showed in this previous exercise, J_i = Q_iM_i^2Q_i^{-1}, where M_i is the Jordan block of the same size with eigenvalue \sqrt{\lambda_i}, and H_i \oplus K_i = T_iN_i^2T_i^{-1}, where N_i is the Jordan block of size \mathsf{dim}\ H_i + \mathsf{dim}\ K_i with eigenvalue 0. Letting C = \bigoplus Q_iM_iQ_i^{-1} \oplus \bigoplus T_iN_iT_i^{-1}, we have P^{-1}AP = C^2, and so A = (PCP^{-1})^2 = B^2 is a square.

Conversely, suppose A = B^2 is a square, and say P^{-1}BP = J is in Jordan canonical form. Now P^{-1}AP = J^2. Say Q^{-1}J^2Q = K is in Jordan canonical form; then Q^{-1}P^{-1}APQ = K. By the previous exercise, the eigenvalues of K are squares in F, and the Jordan blocks having eigenvalue 0 can be paired so that the sizes in each pair differ by 0 or 1.
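To make the ‘if’ direction concrete in the nonzero-eigenvalue case, here is a small sympy sketch (the eigenvalue 4 and size 3 are arbitrary samples); the coefficients of B were found by solving (2I + aN + bN^2)^2 = 4I + N, which is the truncated series for the square root about 4.

    import sympy as sp

    N = sp.Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
    A = 4 * sp.eye(3) + N  # Jordan block of size 3 with eigenvalue 4

    # Square root: since N^3 = 0, a truncated series in N suffices
    B = 2 * sp.eye(3) + N / 4 - N**2 / 64
    assert B**2 == A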

On the square of a Jordan block

Let J be a Jordan block of size n and eigenvalue \lambda over a field K whose characteristic is not 2.

  1. Suppose \lambda \neq 0. Prove that the Jordan canonical form of J^2 is the Jordan block of size n with eigenvalue \lambda^2.
  2. Suppose \lambda = 0. Prove that the Jordan canonical form of J^2 has two blocks (with eigenvalue 0) of size n/2, n/2 if n is even and of size (n+1)/2, (n-1)/2 if n is odd.

First suppose \lambda \neq 0.

Lemma: Let e_i denote the ith standard basis element (i.e. e_i = [\delta_{i,1}\ \delta_{i,2}\ \ldots\ \delta_{i,n}]^\mathsf{T}). If 1 \leq i \leq n-2, then (J^2-\lambda^2I)e_{i+2} = 2\lambda e_{i+1} + e_i; moreover, (J^2-\lambda^2I)e_{2} = 2\lambda e_1 and (J^2-\lambda^2I)e_1 = 0. Proof: We have J^2-\lambda^2I = (J+\lambda I)(J-\lambda I). Evidently, (J-\lambda I)e_{i+2} = e_{i+1}. Now J+\lambda I = [b_{j,k}], where b_{j,k} = 2\lambda if j = k, 1 if k = j+1, and 0 otherwise. So (J+\lambda I)e_{i+1} = [\sum_k b_{j,k} \delta_{k,i+1}] = [b_{j,i+1}] = 2\lambda e_{i+1} + e_i if i \geq 1, and similarly (J^2-\lambda^2 I)e_2 = (J+\lambda I)e_1 = 2\lambda e_1. \square

In particular, note that (J^2 - \lambda^2 I)^{n-1} e_n = 2^{n-1}\lambda^{n-1} e_1 is nonzero (here we use \lambda \neq 0 and the noncharacteristictwoness of K), while (J^2 - \lambda^2 I)^n = 0. So the minimal polynomial of J^2 is (x-\lambda^2)^n. Thus the Jordan canonical form of J^2 has a single Jordan block, of size n, with eigenvalue \lambda^2.

Now suppose \lambda = 0. Evidently, we have Je_{i+1} = e_i and Je_1 = 0. Now J^n = 0 and J^{n-1} \neq 0. If n is even, we have (J^2)^{n/2} = 0 and (J^2)^{n/2-1} = J^{n-2} \neq 0, so that the minimal polynomial of J^2 is x^{n/2}. If n is odd, we have (J^2)^{(n+1)/2} = 0 while (J^2)^{(n+1)/2-1} = J^{n-1} \neq 0, so the minimal polynomial of J^2 is x^{(n+1)/2}.

Next, we claim that \mathsf{ker}\ J^2 has dimension 2. To this end, note that J^2e_{i+2} = e_i is nonzero, while J^2e_2 = J^2e_1 = 0. By this previous exercise, J^2 has 2 Jordan blocks, both with eigenvalue 0. Since the exponent of the minimal polynomial is the size of the largest block and the block sizes sum to n, the blocks are of size n/2,n/2 if n is even, and (n+1)/2, (n-1)/2 if n is odd.
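Both parts can be checked mechanically with sympy’s jordan_form; the size n = 5 and eigenvalue \lambda = 3 below are arbitrary samples.

    import sympy as sp

    n = 5

    # Part 1: nonzero eigenvalue, e.g. lambda = 3
    J = sp.Matrix(n, n, lambda i, j: 3 if j == i else (1 if j == i + 1 else 0))
    P, K = (J**2).jordan_form()
    print(K)  # a single Jordan block of size 5 with eigenvalue 9

    # Part 2: lambda = 0
    Z = sp.Matrix(n, n, lambda i, j: 1 if j == i + 1 else 0)
    P, K = (Z**2).jordan_form()
    print(K)  # two nilpotent blocks, of sizes 3 and 2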