THE FUNDAMENTAL THEOREM OF ALGEBRA

Our object is to prove that every nonconstant polynomial f(z) in one variable z over the complex numbers C has a root, i.e. that there is a complex number r in C such that f(r) = 0. Suppose that

f(z) = a_n z^n + a_{n-1} z^{n-1} + ... + a_1 z + a_0

where n is at least 1, a_n is not 0 and the coefficients a_i are fixed complex numbers. The idea of the proof is as follows: we first show that as |z| approaches infinity , |f(z)| approaches infinity as well. Since |f(z)| is a continuous function of z, it follows that it has an absolute minimum. We shall prove that this minimum must be zero, which establishes the theorem.

The key point: one can get the absolute value of a nonconstant COMPLEX polynomial at a point where it does not vanish to decrease by moving along a line segment in a suitably chosen direction.

We first review some relevant facts from calculus.

Properties of real numbers and continuous functions

Fact 1. Every sequence of real numbers has a monotone (nondecreasing or nonincreasing) subsequence.

Proof. If the sequence has some term which occurs infinitely many times this is clear. Otherwise, we may choose a subsequence in which all the terms are distinct and work with that. Hence, assume that all terms are distinct. Call an element "good" if it is bigger than all the terms that follow it. If there are infinitely many good terms we are done: they will form a decreasing subsequence. If there are only finitely many pick any term beyond the last of them. It is not good, so pick a term after it that is bigger. That is not good, so pick a term after it that is bigger. Continuing in this way (officially, by mathematical induction) we get a strictly increasing subsequence. QED

Fact 2. A bounded monotone sequence of real numbers converges.

Proof. This is sometimes taken as an axiom about the reals. What is given here is an intuitive justification. We assume the sequence is nondecreasing: for the other case, take the negatives. The boundedness forces the integer parts of the terms in the sequence to stabilize, since a bounded montone sequence of integers is eventually constant. Thus, eventually, the terms have the form m + f_n where m is a fixed integer, 0 is less than or equal to f_n < 1, and the f_n are nondecreasing. The first digits of the f_n (after the decimal point) don't decrease and eventually stabilize: call the stable value a_1. For the f_n which begin with .a_1 ... the second digits increase and eventually stabilize: call the stable value a_2. Continuing in this fashion we construct a number f = .a_1 a_2 ... a_h ... with the property that eventually all the f_n agree with it to as many decimal places as we like. It follows that f_n approaches f as n approaches infinity, and that the original sequence converges to m + f. QED

Fact 3. A bounded sequence of real numbers has a convergent subsequence. If the sequence is in the closed interval [a,b], so is the limit.

Proof. Using Fact 1, it has a monotone subsequence, and this is also bounded. By Fact 2, the monotone subsequence converges. We leave it to the reader to check that if all terms are at most b (respectively, at least a) then so is the limit. QED

Fact 4. A sequence of points inside a closed rectangle (say a is less than or equal to x is less than or equal to b, c is less than or equal to y is less than or equal to d) has a convergent subsequence. The limit is also in the rectangle.

Proof. Using Fact 3 we can pick out a subsequence such that the x-coordinates converge to a point in [a,b]. Applying Fact 3 again, from this subsequence we can pick out a further subsequence such that the y-coordinates converge to a point in [c,d]. The x-coordinates of the points in this last subsequence still converge to a point in [a,b]. QED

Fact 5. A continuous real-valued function f defined on a closed rectangle in the plane is bounded and takes on an absolute minimum and an absolute maximum value.

Proof. We prove the result for the maximum: for the other case consider -f. For each integer n = 1, 2, 3, ... divide the rectangle into n^2 congruent subrectangles by drawing n-1 equally spaced vertical lines and the same number of equally spaced horizontal lines. Choose a point in each of these subrectangles (call these the special points at step n) and evaluate f at these points. From among these choose a point where f is biggest: call it P_n. The sequence P_1, P_2, P_3, ... has a convergent subsequence: call it Q_1, Q_2, Q_3, ..., where Q_k = P_{n_k}. Let Q be the limit of the sequence Q_k. It will suffice to show that f(Q) is bigger than or equal to f(P) for every point P in the rectangle. If not choose P in the rectangle such that f(P) > f(Q). For each k let P'_k be a special point at step n_k in a subrectangle (among the (n_k)^2) that contains P. It follows that P'_k approaches P as k approaches infinity, since both sides of the subrectangles approach zero as k approaches infinity. For every k, f(Q_k) is at least f(P'_k), by the choice of Q_k. Taking the limit as k approaches infinity gives a contradiction, since f(Q_k) approaches f(Q) and, by the continuity of f, f(P'_k) approaches f(P) as k approaches infinity. QED

Remark. The result is valid for a continuous real-valued function on any closed bounded set in R^2 (or R^n), where a set is closed if whenever a sequence of points in the set converges, the limit point is in the set.

Fact 6. Let f be a continuous real-valued function on the plane such that f(x,y) approaches infinity as (x,y) approaches infinity. (This means that given any real number B, no matter how large, there is a real number m > 0 such that if x^2 + y^2 is at least m then f(x,y) is at least B.) Then f takes on an absolute minimum value at a point in the plane.}

Proof. Let B = f(0,0). Choose m > 0 such that if x^2 + y^2 is at least m then f(x,y) is at least B. Choose a rectangle that contains the circle of radius m centered at the origin. Pick Q in the rectangle so that the minimum value of f on the rectangle occurs at Q. Since (0,0) is in the rectangle f(Q) is at most B. Since outside the rectangle all values of f are at least B, the value of f at Q is a minimum for the whole plane, not just the rectangle. QED

Fact 7. Let g be a continuous function of one real variable which takes on the values c and d on a certain interval. Then g takes on every value r between c and d on that interval.

Proof. Let g(a) = c and g(b) = d. We may assume without loss of generality that a < b. Replacing g by g - r we may assume that g is positive at one endpoint, negative at the other, and never vanishes. We shall get a contradiction. We shall construct a sequence of intervals I_1 = [a,b], I_2, ...., I_n, ... such that I_{n+1} is contained in I_n for each n, g has values of opposite sign at the endpoints of every I_n, and I_n has length (b-a)/2^{n-1}. I n fact if I = [a_n,b_n] has already been constructed and M is the midpoint of I_n, then g(M) has opposite sign from at least one of the numbers g(a_n), g(b_n), and so we can choose one of [a_n, M] or [M, b_n] for I_{n+1}. The a_n are nondecreasing, the b_n are nonincreasing, and a_n < b_n for all n. It follows that both sequences have limits. But b_n - a_n approaches 0 as n approaches infinity, so the two limits are the same. Call the limit h. Since a_n approaches h, g(a_n) approaches g(h). Similarly, g(b_n) approaches g(h). Since g(a_n) and g(b_n) have opposite signs for all n, this can only happen if g(h) = 0. QED

Remark. Consider a polynomial f(x) with real coefficients of odd degree. Then Fact 7 implies that f has at least one real root. To see this, we may assume that f has a positive leading coefficient (otherwise, replace f by -f). It is then easy to see that f(x) approaches +infinity as x approaches +infinity while f(x) approaches -infinity as x approaches -infinity. Since f(x) takes on both positive and negative values, Fact 7 implies that f takes on the value zero.

More about the complex numbers

We write <= for "less than or equal to", and >= for "greater than or equal to". We want to note that if u, u' are complex numbers then |u + u'| <= |u| + |u'|.

To see this note that, since both sides are nonnegative, it suffices to prove this after squaring both sides, i.e. to show that |u + u'|^2 <= |u|^2 + 2|uu'| +|u'|^2. Now, it is easy to see that for any complex number v, |v|^2 = v(v-), where v- denotes the complex conjugate of v. Using this the inequality above is equivalent to (u + u')(u + u')- <= u u- + 2|uu'| + u'(u')-. Multiplying out, and canceling the terms which occur on both sides, yields the equivalent inequality u(u')- + (u-)u' <= 2|uu'| = 2|u||u'| = 2|u||(u')-| = 2|u(u')-|. Let w = u(u')-. Then w- = (u-)(u'-)- = (u-) u'. Thus, what we want to show is that w + w- <= 2|w|. If w = a + bi where a, b are real this becomes the assertion that 2a <= 2 {a^2+b^2}^{1/2} or a <= \sqrt{a^2 + b^2}, which is clear. Moreover, equality holds if and only if a is nonnegative and b is zero, i.e., if and only if w = u(u')- is a nonnegative real number.

We also get that |u plus or minus u'| >= |u| - |u'|: replacing u' by -u' if necessary we can assume the sign is -, and we already know that |u| <= |u-u'| + |u'|, which is equivalent.

Finally, we want to justify carefully why, when n is a positive integer, every complex number has an n^{th} root. First note that the fact holds for nonnegative real numbers r using Fact 6 applied to the function F:R to R given by F(x) = x^n - r: the function is nonpositive at x = 0 and positive for all sufficiently large x, and so takes on the value 0. We can now construct an n^{th} root for r cis t, namely r^{1/n} cis (t/n), using DeMoivre's formula.

Proof of the fundamental theorem of algebra

Let f(z) = a_n z^n + ... + a_0, where the a_i are in C, n > 0, and a_n is not 0. If we let z = x + yi with x and y varying in R, we can think of f(z) as a C-valued function of x and y. In fact if we multiply out and collect terms we get a formula f(z) = P(x,y) + iQ(x,y) where P and Q are polynomials in two real variables x, y. We can therefore think of |f(x+yi)| = (P(x,y)^2 +Q(x,y)^2)^{1/2} as a continuous function of two real variables. We want to apply Fact 6 to conclude that it takes on an absolute minimum. Thus, we need to understand what happens as (x,y) approaches infinity. But we have |f(z)| = |z|^n |a_n||1 + (b_{n-1}/z + b_{n-2}/z^2 + ... + b_0/z^n )|, where b_i = a_i/a_n for 0 <= i <= n-1. Now

|1 + (b_{n-1}/z + b_{n-2}/z^2 + ... + b_0/z^n )| >
         |1| - |(b_{n-1}/z + b_{n-2}/z^2 + ... + b_0/z^n )|       (##)

The term that we are subtracting on the right is at most

|b_{n-1}|/|z| + |b _{n-2}|/|z|^2 + ... + |b_0|/|z|^n,

and this approaches 0 at |z| approaches infinity. Hence, for all sufficiently large |z|, the quantity on the left in the inequality labeled (##) is at least 1/2, and so |f(z)| is at least |z|^n |a_n|(1/2) for large |z|. Thus, |f(z)| approaches infinity as |z| approaches infinity. This implies, by Fact 6, that we can choose a point z = r = a + bi where |f(z)| is an absolute minimum.

The rest of the argument is devoted to showing that f(r) must be zero. We assume otherwise and get a contradiction. To simplify calculations we are going to make several changes of variable.

Simplification 1. Let g(z) = f(z+r), which is also a polynomial in z of degree n. g (resp. |g|) takes on the same set of values as f (resp. |f|). But |g| is minimum at z = 0, where the value is |f(0+r)| = |f(r)|. Thus, we may assume without loss of generality that |f| is minimum at z = 0. (We change back the notation and don't refer to g any more.) We are now assuming that a_0 = f(0) is not 0. Let a = |a_0|.

Simplification 2. Replace f by (1/a_0)f. All values of f are divided by a_0. All values of |f| are divided by a. The new function still has its minimum absolute value at z = 0. But now the minimum is 1. We still write f for the function. Thus, we can assume that f(0) is 1 (this means that a_0 = 1) and that 1 is the minimum of |f|.

Simplification 3. We know that a_n is not 0. Let k be the least positive integer such that a_k is not 0. (It might be 1 or n.) We can write f(z) = 1 + a_k z^k + ... + a_n z^n. We next observe that if we replace f(z) by f(cz) where c is a fixed nonzero complex number the set of values of f (and of |f|) does not change, nor does the constant term, and 0 = c(0) stays fixed. The new f has all the same properties as the old: its absolute value is still minimum at z = 0. It makes life simplest if we choose c to be a k^{th} root of (-1/a_k). The new f we get when we substitute cz for z then turns out to be 1 - z^k + a'_{k+1}z^{k+1} + ... + a'_n z^n. Thus, there is no loss of generality in assuming that a_k = -1. Therefore, we may assume that f(z) = 1 - z^k + a_{k+1}z^{k+1}+ ... +a_n z^n. If n = k then f(z) = 1 - z^n and we are done, since f(1) = 0. Assume from here on that k is less than n.

The main point. We are now ready to finish the proof. All we need to do is show with f(z) = 1 - z^k + a_{k+1}z^{k+1} + ...+ a_n z^n that the minimum absolute value of f is less than 1, contradicting the situation we have managed to set up by assuming that f(r) is not 0. We shall show that the absolute value is indeed less than one when z is a small positive real number (as we move in the complex plane away from the origin along the positive real axis or x-axis, the absolute value of f(z) drops from 1.) To see this assume that z is a positive real number with 0 < z < 1. Note that 1-z is then positive. We can then write

|f(z)|
= |1 - z^k + a_{k+1}z^{k+1} + ... + a_n z^n |
<= |1-z^k| + |a_{k+1}z^{k+1} + ... +a_n z^n|
= 1-z^k + |a_{k+1}z^{k+1} + ... + a_n z^n|
<= 1-z^k + |a_{k+1}|z^{k+1} + ... +|a_n|z^n
(keep in mind that z is a positive real number)
= 1 - z^k (1 - w_z), where
w_z = |a_{k+1}|z + ... +|a_n|z^{n-k}.

When z approaches 0 through positive values so does w_z. Hence, when z is a small positive real number, so is z (1-w_z), and so for z a small positive real number we have that

0 < 1 - z^k(1-w_z) < 1.

Since |f(z)| < 1 - z^k(1-w_z)

it follows that |f(z)| takes on values smaller than 1, and so |f(z)| is not minimum at z = 0 after all. This is a contradiction. It follows that the minimum value of |f(z)| must be 0, and so f has a root. QED