Here's why:
Suppose the result is false, and consider a matrix A of smallest
size for which it fails. Then A has size n > 1, since any 1 by 1
matrix is already upper triangular.
Pick an eigenvector v = v_1 of A (one exists, because the characteristic polynomial of A has a root c in C) and extend v_1 to a basis for C^n. Then Av = cv, where c is the corresponding eigenvalue. Consider the matrix of the transformation given by A with respect to this new basis. Since Av = cv, its first column is
c
0
0
.
.
.
0
and so A is similar to a matrix A' in block form:
c B
0 D
where [c] is a 1 by 1 block, B is a 1 by n-1 block, 0 denotes an n-1 by 1 block, and D is an n-1 by n-1 block.
Because D has size n-1, which is smaller than n, the result holds for D: we can choose an invertible matrix Q (size n-1 by n-1) such that Q^{-1} D Q is upper triangular. Let S be the block matrix
1 0
0 Q
where the block of 0's in the first row is 1 by n-1. Then S^{-1} is the block matrix
1 0
0 Q^{-1}
and it is easy to check that
S^{-1} A' S =
c E
0 Q^{-1} D Q
where E = BQ. Since Q^{-1} D Q is upper triangular, this whole matrix is upper triangular. Thus, A is similar to an upper triangular matrix after all, contradicting the choice of A, and so there is no smallest size for which the result fails. QED
The elements on the diagonal of the similar upper triangular matrix will, of course, be the eigenvalues of the original matrix.
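This triangularization can be carried out numerically. A minimal sketch, assuming NumPy and SciPy are available: the Schur decomposition computes a unitary S with S^{-1} A S upper triangular over C (the matrix A below is just an illustrative choice).

```python
import numpy as np
from scipy.linalg import schur

# Illustrative 2 by 2 matrix; any complex square matrix works.
A = np.array([[3.0, 4.0],
              [7.0, -5.0]])

# Schur decomposition: A = S T S^H with T upper triangular, S unitary.
# output='complex' forces the triangular form over C.
T, S = schur(A, output='complex')

# The similarity holds, and the diagonal of T carries the eigenvalues of A.
print(np.allclose(S @ T @ S.conj().T, A))
print(np.diag(T))
```

Here S is unitary, so S^{-1} = S^H; the text's argument does not require S to be unitary, but the Schur form shows one can always arrange it.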
Fact: An n by n matrix is nilpotent (some power equal to 0) if and only if all eigenvalues are 0. In this case, A^n = 0.
Here's why:
the issue is unaffected by replacing the matrix by
a similar matrix. Therefore, we can assume that the matrix is
upper triangular. If any eigenvalue c is not 0, the matrix cannot
be nilpotent: c^t will be an eigenvalue of A^t, which means
that A^t cannot be zero. Therefore, we need only show that an
upper triangular matrix whose diagonal entries are all 0 has nth power
equal to 0. The condition implies that e_1 maps to 0, e_2 maps
into Span{e_1}, e_3 maps into Span{e_1, e_2}, etc. From
this one checks that the image of A^k is contained in the span
of e_1, e_2, ..., e_{n-k} for each k = 1, 2, 3, ..., n-1.
E.g., the image of A^{n-1} is contained in Span{e_1}, and
applying A one more time produces 0.
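The argument above is easy to check numerically. A small sketch, assuming NumPy: a strictly upper triangular matrix (zero diagonal) of size n has nth power exactly 0.

```python
import numpy as np

n = 4
# Random strictly upper triangular matrix: zeros on and below the diagonal.
N = np.triu(np.random.rand(n, n), k=1)

# Each power pushes the image down into a smaller span, so N^n = 0.
P = np.linalg.matrix_power(N, n)
print(np.allclose(P, 0))  # True
```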
A much more precise result, called Jordan canonical form, is known. A Jordan block of size k is an upper triangular matrix with the following properties:
(1) the main diagonal consists of a single constant c,
(2) the diagonal just above the main diagonal consists of k-1 1's,
(3) all other entries are 0.
For example, a Jordan block looks like

c        (k = 1)

or

c 1
0 c      (k = 2)

or

c 1 0 0 0
0 c 1 0 0
0 0 c 1 0
0 0 0 c 1
0 0 0 0 c    (k = 5)
The main theorem is that every square matrix over C is similar to a matrix which in block form looks like
A_1  0   0   0  ...  0
 0  A_2  0   0  ...  0
 0   0  A_3  0  ...  0
 .   .   .   .  ...  .
 .   .   .   .  ...  .
 0   0   0   0  ... A_h
where every A_s is a Jordan block. The Jordan form is unique except for the order in which the Jordan blocks are "strung out" along the main diagonal. Thus, every matrix is similar to a matrix in Jordan form, and two matrices in Jordan form are similar if and only if they have the same blocks, although not necessarily arranged in the same order.
E.g. if there are two blocks of size one, one of size 2, and one of size 3, the matrix would look like this
a 0 0 0 0 0 0
0 b 0 0 0 0 0
0 0 c 1 0 0 0
0 0 0 c 0 0 0
0 0 0 0 d 1 0
0 0 0 0 0 d 1
0 0 0 0 0 0 d
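A sketch of how such a matrix can be assembled, assuming NumPy and SciPy; the helper jordan_block and the values of a, b, c, d are ours, chosen for illustration.

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(c, k):
    # c on the main diagonal, 1's on the diagonal just above it.
    return c * np.eye(k) + np.eye(k, k=1)

a, b, c, d = 1.0, 2.0, 3.0, 4.0
# Two blocks of size 1, one of size 2, one of size 3, as in the example.
J = block_diag(jordan_block(a, 1), jordan_block(b, 1),
               jordan_block(c, 2), jordan_block(d, 3))
print(J)
```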
When a matrix is in Jordan form, it is upper triangular. Any nonzero entries not on the main diagonal must be 1's, and these must be just above the main diagonal.
One can say more. One can write a matrix in Jordan form as the sum of a diagonal matrix D with the same diagonal entries and an upper triangular matrix N with all 0's on the diagonal. If the size of the matrix is n, then N is nilpotent: in fact, N^n = 0. Moreover, D and N commute: DN = ND.
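These claims are easy to verify on an example. A minimal sketch, assuming NumPy, using a single 3 by 3 Jordan block with c = 3 (an illustrative choice):

```python
import numpy as np

J = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 3.0]])  # one Jordan block, c = 3, k = 3

D = np.diag(np.diag(J))  # diagonal part
N = J - D                # strictly upper triangular part

print(np.allclose(D @ N, N @ D))                     # D and N commute
print(np.allclose(np.linalg.matrix_power(N, 3), 0))  # N^3 = 0
```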
The punchline:
Every square matrix A over the complex numbers can be written
as S (D + N) S^{-1}, where D is diagonal, N is nilpotent,
and DN = ND.
It is important to note that when A is diagonalizable, the Jordan form of A is simply a diagonal matrix with the eigenvalues of A in some order on the diagonal. The Jordan blocks all have size 1 by 1, and N = 0.
There is an algorithmic process for finding a matrix S such that S^{-1} A S is in Jordan form, but we shall not give the details here: the method, along with an explanation of uniqueness, can be found in any text for a second course in linear algebra. However, we shall explain some applications.
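For instance, a computer algebra system can produce the Jordan form directly. A sketch, assuming SymPy: Matrix.jordan_form returns a pair (P, J) with A = P J P^{-1} and J in Jordan form. The matrix below is an illustrative defective matrix with the single eigenvalue 3.

```python
import sympy as sp

# (x - 3)^2 is the characteristic polynomial; A is not diagonalizable.
A = sp.Matrix([[5, 4],
               [-1, 1]])

P, J = A.jordan_form()  # A = P * J * P^{-1}
print(J)
```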
By computing the Jordan form, one can give a "closed" expression for the high powers of a matrix A, which is what one wants for studying a discrete dynamical system. The idea is that, using Jordan form, one has
A = S (D+N) S^{-1}
where D is diagonal, N is nilpotent, and DN = ND.
Then A^t = S (D+N)^t S^{-1}
To carry this further we illustrate what happens if, say, N^3 = 0. There are similar formulas no matter which power of N is 0. Because D and N commute, one can apply the binomial theorem.
Thus, (D + N)^t = D^t + t D^{t-1} N + (1/2)t(t-1) D^{t-2} N^2,
and all other terms are 0 because they involve N^3 or a higher power of N.
Because D is diagonal, it is easy to find powers of D.
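A sketch of this closed form, assuming NumPy, on a single 3 by 3 Jordan block (an illustrative choice with N^3 = 0); the binomial expression is checked against the directly computed power.

```python
import numpy as np
from math import comb

c, t = 2.0, 10
J = c * np.eye(3) + np.eye(3, k=1)  # Jordan block: J = D + N
D = c * np.eye(3)
N = J - D                           # N^3 = 0

# (D + N)^t = D^t + t D^{t-1} N + C(t,2) D^{t-2} N^2, all later terms vanish.
closed = (np.linalg.matrix_power(D, t)
          + t * np.linalg.matrix_power(D, t - 1) @ N
          + comb(t, 2) * np.linalg.matrix_power(D, t - 2) @ N @ N)

print(np.allclose(closed, np.linalg.matrix_power(J, t)))  # True
```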
Suppose we want to find e^A, the exponential of the square matrix A. As before, we can write
A = S (D + N) S^{-1}
with D diagonal, N nilpotent, and DN = ND.
Then e^{D + N} = e^D e^N, because D and N commute. Both factors can be calculated: e^D is diagonal (exponentiate each diagonal entry of D), and e^N is a polynomial in N, since all sufficiently high powers of N are 0.
Thus, e^A = S (e^D e^N) S^{-1} can be calculated. If u represents a variable taking on real (or complex) values, we have that uA = uD + uN as well, where uD is diagonal, uN is nilpotent, and the two still commute. Thus, we get a formula
e^{uA} = S (e^{uD} e^{uN}) S^{-1}
as well, valid for every value of the scalar u.
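A sketch of the exponential computation, assuming SciPy, again on a 3 by 3 Jordan block (our illustrative choice); the D + N splitting is compared with scipy.linalg.expm.

```python
import numpy as np
from scipy.linalg import expm

c = 2.0
J = c * np.eye(3) + np.eye(3, k=1)  # Jordan block, k = 3
D = np.diag(np.diag(J))
N = J - D                           # N^3 = 0

eD = np.diag(np.exp(np.diag(J)))    # exponentiate each diagonal entry
eN = np.eye(3) + N + (N @ N) / 2    # the series for e^N stops at N^2

print(np.allclose(eD @ eN, expm(J)))  # True: e^{D+N} = e^D e^N
```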
Let y_1, ..., y_n be functions of a variable x. For the moment we let n = 2.
We may have a linear system of differential equations like:
dy_1/dx = 3 y_1 + 4 y_2
dy_2/dx = 7 y_1 - 5 y_2
If we make the unknown functions into a vector
Y =
y_1
y_2
this can be rewritten as
dY/dx = AY
where A =
3 4
7 -5
More generally, we consider systems
dY/dx = AY
where Y =
y_1
y_2
.
.
.
y_n
and A is an n by n matrix of real scalars.
If n = 1, we just have the simple differential equation dy/dx = ay, and the solution is
y = e^{xa}c
where c is y(0).
It turns out that in the general case the linear system
dY/dx = AY
has the solution
Y = e^{xA}C
where C = Y(0) is a vector. Thus, the solution of such linear systems reduces to finding the Jordan form of A, which, when A is diagonalizable over C, comes down to diagonalizing A. This explains the crucial importance of eigenvalues in solving linear systems of differential equations.
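A sketch of this solution formula, assuming SciPy, on the 2 by 2 system from the text; Y = e^{xA} C is compared against a numerical ODE solver (the initial vector C and the value of x are illustrative choices).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[3.0, 4.0],
              [7.0, -5.0]])   # the system dY/dx = AY from the text
C = np.array([1.0, 2.0])      # Y(0)
x = 0.5

# Closed-form solution via the matrix exponential.
Y_exact = expm(x * A) @ C

# Numerical integration of the same system, for comparison.
sol = solve_ivp(lambda t, y: A @ y, (0.0, x), C, rtol=1e-10, atol=1e-12)
print(np.allclose(Y_exact, sol.y[:, -1], atol=1e-6))
```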