# On the location of the Ritz values in the Arnoldi process.

1. Introduction. Approximations to (a few of) the eigenvalues (and eigenvectors) of large sparse non-Hermitian matrices are often computed with (variants of) the Arnoldi process. One of the most popular software packages is ARPACK [13], which implements the Implicitly Restarted Arnoldi Method and is used, for instance, in the Matlab function eigs. In this paper we are concerned with the standard Arnoldi process, which, for a matrix A of order n and a starting vector v (assumed to be of unit norm), computes a unitary matrix V with columns [v.sub.i], where [v.sub.1] = V[e.sub.1] = v, and an upper Hessenberg matrix H with positive real subdiagonal entries [h.sub.j+1,j], j = 1, ..., n - 1, such that

AV = VH,

if it does not stop before iteration n, a situation that we assume throughout this paper. The approximations of the eigenvalues of A (called the Ritz values) at step k are the eigenvalues [[theta].sup.(k).sub.i] of [H.sub.k], the leading principal submatrix of order k of H. The approximate eigenvectors are [x.sub.i] = [V.sub.n,k][z.sup.(k).sub.i], where [z.sup.(k).sub.i] is the eigenvector associated with [[theta].sup.(k).sub.i] and [V.sub.n,k] is the matrix of the first k columns of V. In the sequel we will mainly consider the step k, so we will sometimes drop the superscript (k). The relation satisfied by [V.sub.n,k] is

A[V.sub.n,k] = [V.sub.n,k][H.sub.k] + [h.sub.k+1,k][v.sub.k + 1][e.sup.T.sub.k],

where [e.sub.k] is the last column of the identity matrix of order k. This equation indicates how to compute the next column [v.sub.k+1] of the matrix V and the kth column of H. When A is symmetric or Hermitian, the Arnoldi process reduces to the Lanczos algorithm, in which the matrix H is a symmetric tridiagonal matrix. There are many results on the convergence of the Lanczos Ritz values in the literature; see, for instance, [17, 18, 19, 21]. Most of them are based on the Cauchy interlacing theorem, which states that the Ritz values satisfy

$\theta^{(k+1)}_1 < \theta^{(k)}_1 < \theta^{(k+1)}_2 < \theta^{(k)}_2 < \cdots < \theta^{(k)}_k < \theta^{(k+1)}_{k+1},$

and they are related to the eigenvalues [[lambda].sub.j] of A by

$\lambda_j < \theta^{(k)}_j, \qquad \theta^{(k)}_{k+1-j} < \lambda_{n+1-j}, \qquad 1 \le j \le k.$
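The Arnoldi relation above is easy to check numerically. The following is a minimal sketch of ours (not from the paper), assuming NumPy and using the modified Gram-Schmidt variant of the process:

```python
import numpy as np

def arnoldi(A, v, k):
    """k steps of the Arnoldi process (modified Gram-Schmidt).

    Returns V of size n x (k+1) with orthonormal columns,
    V[:, 0] = v/||v||, and the (k+1) x k upper Hessenberg matrix H
    such that A @ V[:, :k] = V @ H.
    """
    n = A.shape[0]
    V = np.zeros((n, k + 1), dtype=complex)
    H = np.zeros((k + 1, k), dtype=complex)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):            # orthogonalize against v_1, ..., v_{j+1}
            H[i, j] = V[:, i].conj() @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)   # positive real subdiagonal entry
        V[:, j + 1] = w / H[j + 1, j]     # assumes no breakdown before step k
    return V, H
```

The Ritz values at step k are then the eigenvalues of the leading k x k block H[:k, :k].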

It is generally admitted that convergence of the Lanczos process for Hermitian matrices is well understood. Unfortunately, in the non-Hermitian case this is not true in general as far as the convergence of the Ritz values is concerned. However, some results are known about the eigenvectors; see, for instance, [1, 2]. In fact, the Arnoldi process may not converge at all before the very last iteration. One can construct matrices with a given spectrum and starting vectors such that the Ritz values at all iterations are prescribed at arbitrary locations in the complex plane; see [9]. This means that we can construct examples for which the Ritz values do not converge to the eigenvalues of A before the last step.

However, the matrices that can be built using this result generally have poor mathematical properties. In particular, they are not normal. In many practical cases, we do observe convergence of the Ritz values toward the eigenvalues. To understand the convergence when it occurs, an interesting problem is to determine the possible locations of the Ritz values for a given matrix, in particular, for matrices with special properties like (real) normal matrices. Of course, it is well known that they lie inside the field of values of A, which is defined as

$W(A) = \{\,\theta \mid \theta = v^* A v,\ v \in \mathbb{C}^n,\ \|v\| = 1\,\}.$

If the matrix A is normal, the field of values is the convex hull of the eigenvalues, and if the matrix is real, it is symmetric with respect to the real axis.

The inverse problem described in Carden's thesis [4] and the paper [7] is, given a matrix A and complex values [[theta].sub.1], ..., [[theta].sub.k], to know if there is a subspace of dimension k such that the values [[theta].sub.i] are the corresponding Ritz values. If we restrict ourselves to Krylov subspaces and the Arnoldi algorithm, this amounts to knowing if there is a unit vector v such that the values [[theta].sub.i] are Ritz values for the Krylov subspace

[K.sub.k](A, v) = span{v, Av, ..., [A.sup.k-1]v}.

A closely related problem has been considered for normal matrices by Bujanovic [3]. He was interested in the possible locations of the other Ritz values when some of the Ritz values are fixed in the field of values of A. He gave a necessary and sufficient condition that characterizes the sets of k complex values occurring as Ritz values of a given normal matrix. Carden and Hansen [7] also gave a condition that is equivalent to Bujanovic's. For normal matrices and k = n - 1, see [14], and for general matrices, see [6].

In this paper we first give a necessary and sufficient condition for a set of complex values [[theta].sub.1], ..., [[theta].sub.k] to be the Arnoldi Ritz values at iteration k for a given general diagonalizable matrix A. This generalizes Bujanovic's condition. Then we restrict ourselves to real normal matrices and real starting vectors. We particularly study the case k = 2, for which we characterize the boundary of the region of the complex plane, contained in W(A), where pairs of complex conjugate Ritz values are located. We give several examples with computations of the boundary for real normal matrices of order up to 8. Finally, after describing some numerical experiments with real random starting vectors, we state some conjectures and open problems for k > 2 for real normal matrices in Section 7. The aim of this section, which provides only numerical results, is to motivate other researchers to look at these problems.

The paper is organized as follows. In Section 2 we study the matrices [H.sub.k] and characterize the coefficients of their characteristic polynomial. Section 3 gives expressions for the entries of the matrix M = K * K, where K is the Krylov matrix, as a function of the eigenvalues and eigenvectors of A for diagonalizable matrices. This is used in Section 4, where we give a necessary and sufficient condition for a set of k complex numbers to be the Arnoldi Ritz values at iteration k for diagonalizable matrices. The particular case of normal matrices is studied in Section 5. The case of A being real normal and k = 2 is considered in Section 6, in which we characterize the boundary of the region where pairs of complex conjugate Ritz values are located. Open problems and conjectures for k > 2 and real normal matrices are described in Section 7. Finally we give some conclusions.

2. The matrix [H.sub.k] and the Ritz values. In this section, since the Ritz values are the eigenvalues of [H.sub.k], we are interested in characterizing the matrix [H.sub.k] and the coefficients of its characteristic polynomial. It is well known (see [9, 15]) that the matrix H can be written as H = [UCU.sup.-1], where U is a nonsingular upper triangular matrix such that K = VU with K = [v Av ... [A.sup.n-1]v] and C is the companion matrix corresponding to the eigenvalues of A. We have the following theorem which characterizes [H.sub.k] as a function of the entries of U.

Theorem 2.1 ([10]). For k < n, the Hessenberg matrix [H.sub.k] can be written as [H.sub.k] = [U.sub.k][C.sup.(k)][U.sup.-1.sub.k], with [U.sub.k], the principal submatrix of order k of U, being upper triangular and [C.sup.(k)] = [E.sub.k] + [0 [U.sup.-1.sub.k][U.sub.[1:k],k+1]], a companion matrix where [E.sub.k] is a square down-shift matrix of order k,

$E_k = \begin{pmatrix} 0 & & & \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{pmatrix}.$

Moreover, the subdiagonal entries of H are [h.sub.j+1,j] = [U.sub.j+1,j+1]/[U.sub.j,j], j = 1, ..., n - 1.
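Theorem 2.1 can be checked numerically. The following is a sketch of ours (NumPy assumed); it normalizes the QR factorization K = VU so that U has a positive diagonal, matching the convention [h.sub.j+1,j] > 0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

# Krylov matrix K = [v, Av, ..., A^{n-1} v] and its QR factorization K = V U
K = np.column_stack([np.linalg.matrix_power(A, j) @ v for j in range(n)])
V, U = np.linalg.qr(K)
D = np.diag(np.sign(np.diag(U)))   # flip signs so that diag(U) > 0
V, U = V @ D, D @ U

# full Hessenberg matrix of the process: A V = V H gives H = V^T A V
H = V.T @ A @ V
for j in range(n - 1):
    # subdiagonal entries h_{j+1,j} = U_{j+1,j+1} / U_{j,j} (Theorem 2.1)
    assert np.isclose(H[j + 1, j], U[j + 1, j + 1] / U[j, j])
```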

Clearly the Ritz values at step k are the eigenvalues of [C.sup.(k)]. We see that they only depend on the matrix U and its inverse. They are also the roots of the monic polynomial defined below. By considering the inverse of an upper triangular matrix, we note that the last column of [C.sup.(k)] can be written as

$U_k^{-1} U_{1:k,\,k+1} = -U_{k+1,k+1}\,[U^{-1}]_{1:k,\,k+1}.$

Hence, up to a multiplying coefficient, the last column of [C.sup.(k)] is obtained from the k first components of the (k + 1)st column of the inverse of U. The last column of [C.sup.(k)] gives the coefficients of the characteristic polynomial of [H.sub.k]. Let

$\bigl[\beta^{(k)}_0, \ldots, \beta^{(k)}_{k-1}\bigr]^T = -U_k^{-1} U_{1:k,\,k+1}.$

The Ritz values are the roots of the polynomial $q^{(k)}(\lambda) = \lambda^k + \sum_{j=0}^{k-1} \beta^{(k)}_j \lambda^j = \prod_{i=1}^{k} (\lambda - \theta^{(k)}_i)$.

Since the entries of U and [U.sup.-1] are intricate functions of the eigenvalues and eigenvectors of A, the following theorem provides a simpler characterization of the coefficients of the characteristic polynomial of [H.sub.k].

Theorem 2.2. Let M = K * K, where K is the Krylov matrix. The vector of the coefficients of the characteristic polynomial of [H.sub.k], denoted as [[[beta].sup.(k).sub.0], ..., [[beta].sup.(k).sub.k-1]], is the solution of the linear system,

(2.1) $M_k \bigl[\beta^{(k)}_0, \ldots, \beta^{(k)}_{k-1}\bigr]^T = -M_{1:k,\,k+1},$

where [M.sub.k] = [U*.sub.k][U.sub.k].

Proof. From what we have seen above, the proof is straightforward. We have

$U_k \bigl[\beta^{(k)}_0, \ldots, \beta^{(k)}_{k-1}\bigr]^T = -U_{1:k,\,k+1}.$

Multiplying by [U*.sub.k], we obtain

$M_k \bigl[\beta^{(k)}_0, \ldots, \beta^{(k)}_{k-1}\bigr]^T = U_k^* U_k \bigl[\beta^{(k)}_0, \ldots, \beta^{(k)}_{k-1}\bigr]^T = -U_k^* U_{1:k,\,k+1}.$

Clearly [U*.sub.k][U.sub.[1:k],k+1] = [M.sub.[1:k],k+1].

Therefore it is interesting to consider the matrix M = K*K = U*U and its principal submatrices. This is done in the next section.
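Theorem 2.2 can be verified numerically by comparing the solution of (2.1) with the characteristic polynomial of [H.sub.k]. A sketch of ours, assuming NumPy (np.poly returns the monic characteristic polynomial with its coefficients in decreasing degree order):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

K = np.column_stack([np.linalg.matrix_power(A, j) @ v for j in range(n)])
M = K.conj().T @ K
# (2.1): M_k [beta_0, ..., beta_{k-1}]^T = -M_{1:k,k+1}
beta = np.linalg.solve(M[:k, :k], -M[:k, k])

# compare with the characteristic polynomial of H_k = V_k^* A V_k
Q, _ = np.linalg.qr(K[:, :k + 1])
Hk = Q[:, :k].conj().T @ A @ Q[:, :k]
coeffs = np.poly(np.linalg.eigvals(Hk))   # [1, c_{k-1}, ..., c_0], monic
assert np.allclose(coeffs[::-1][:k], beta)
```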

3. The matrix M. In this section we characterize the entries of M = U* U = K*K as functions of the eigenvalues and eigenvectors of A and of the starting vector v for diagonalizable matrices A.

Theorem 3.1. Let the spectral decomposition of A be A = X[LAMBDA][X.sup.-1], where [LAMBDA] is the diagonal matrix of the eigenvalues [[lambda].sub.i], i = 1, ..., n. The entries of M = U*U are given by

$M_{l,m} = \sum_{i,j=1}^{n} \bar c_i c_j\, (X^* X)_{i,j}\, \bar\lambda_i^{\,l-1} \lambda_j^{\,m-1}, \qquad l, m = 1, \ldots, n,$

with c = [X.sup.-1]v and [[bar.[lambda]].sub.i] denoting the complex conjugate of [[lambda].sub.i]. If the matrix A is normal, we have the simpler expression,

$M_{l,m} = \sum_{i=1}^{n} |c_i|^2\, \bar\lambda_i^{\,l-1} \lambda_i^{\,m-1}, \qquad l, m = 1, \ldots, n,$

with c = X*v.

Proof. Since we assumed that the matrix A is diagonalizable with eigenvalues [[lambda].sub.i], we have

K = X [c [LAMBDA]c ... [[LAMBDA].sup.n-1]c],

where c = [X.sup.-1]v. Let [D.sub.c] be the diagonal matrix with diagonal entries [c.sub.j], j = 1, ..., n. The matrix K is

K = X[D.sub.c] V

with the Vandermonde matrix

$V = \begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-1} \end{pmatrix}.$

We note that this factorization of the Krylov matrix has been used in [11]; see also [22]. Therefore M = K*K = V*[D.sub.[bar.c]][X.sup.*]X[D.sub.c]V. If A is normal, [X.sup.*]X = I and M = [V.sup.*][D.sub.[omega]]V

with [[omega].sub.j] = [[absolute value of ([c.sub.j])].sup.2]. The entries of M can be obtained as functions of the eigenvalues and eigenvectors of A by

$M_{l,m} = \sum_{i,j=1}^{n} \bar c_i c_j\, (X^* X)_{i,j}\, \bar\lambda_i^{\,l-1} \lambda_j^{\,m-1}.$

If A is normal, we have X*X = I and

[M.sub.l,m] = [n.summation over (i=1)] [[absolute value of ([c.sub.i])].sup.2] [[bar.[lambda]].sup.l-1.sub.i][[lambda].sup.m-1.sub.i].

This last result is already known from [20].
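The normal-matrix formula for the entries of M is easy to confirm numerically. A sketch of ours (NumPy assumed), building a normal matrix as A = X diag([lambda]) X* with X unitary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
lam = rng.standard_normal(n) + 1j * rng.standard_normal(n)
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X, _ = np.linalg.qr(Z)                    # unitary eigenvector matrix
A = X @ np.diag(lam) @ X.conj().T         # a normal matrix
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

K = np.column_stack([np.linalg.matrix_power(A, j) @ v for j in range(n)])
M = K.conj().T @ K

c = X.conj().T @ v
w = np.abs(c) ** 2                        # the weights omega_i = |c_i|^2
# M_{l,m} = sum_i omega_i conj(lam_i)^{l-1} lam_i^{m-1} (0-based exponents below)
Mformula = np.array([[np.sum(w * lam.conj() ** l * lam ** m)
                      for m in range(n)] for l in range(n)])
assert np.allclose(M, Mformula)
```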

4. The inverse problem for diagonalizable matrices. For the first Arnoldi iteration (that is, k = 1) the inverse problem is always solvable. We have [h.sub.1,1] = [v.sup.*]Av. For [[theta].sup.(1)] [member of] W(A), there exists a vector v such that [[theta].sup.(1)] = [v.sup.*]Av. Algorithms for computing such vectors are given in [5, 8, 16]. We note that if A and v are real, the first Ritz value [[theta].sup.(1).sub.1] is real.

For the inverse problem at the Arnoldi iteration k > 1, we assume that we have a set of k given complex numbers [[theta].sub.1], ..., [[theta].sub.k] belonging to W(A), and we would like to find (if possible) a vector v of unit norm such that the values [[theta].sub.j] are the Ritz values at iteration k when running the Arnoldi algorithm with (A, v).

From (2.1) we have an equation relating the coefficients of the characteristic polynomial of [H.sub.k] and the entries of a submatrix of M. Since the Ritz values are zeros of the polynomial [[lambda].sup.k] + [[summation].sup.k-1.sub.j=0] [[beta].sup.(k).sub.j] [[lambda].sup.j] = [[PI].sup.k.sub.i=1]([lambda] - [[theta].sub.i]), the coefficients [[beta].sup.(k).sub.j] are (up to the sign) elementary symmetric functions of the numbers [[theta].sub.j]. Therefore,

(4.1) [[beta].sup.(k).sub.j] = [(-1).sup.k-j] [e.sub.(k-j)]([[theta].sub.1], ..., [[theta].sub.k]), j = 0, ..., k - 1,

with

$e_j(\theta_1, \ldots, \theta_k) = \sum_{1 \le i_1 < i_2 < \cdots < i_j \le k} \theta_{i_1} \theta_{i_2} \cdots \theta_{i_j}.$
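Relation (4.1) can be illustrated on a small example (a sketch of ours, assuming NumPy; np.poly expands the monic polynomial with the prescribed roots):

```python
import numpy as np

# three prescribed Ritz values: one conjugate pair and one real value
theta = np.array([1.0 + 2.0j, 1.0 - 2.0j, 3.0 + 0.0j])
k = len(theta)

# elementary symmetric functions e_1, e_2, e_3 of the theta_i
e = {1: theta.sum(),
     2: theta[0] * theta[1] + theta[0] * theta[2] + theta[1] * theta[2],
     3: theta.prod()}
# (4.1): beta_j = (-1)^{k-j} e_{k-j}(theta_1, ..., theta_k)
beta = np.array([(-1) ** (k - j) * e[k - j] for j in range(k)])

# np.poly gives the coefficients [1, c_{k-1}, ..., c_0] of prod (lambda - theta_i)
coeffs = np.poly(theta)
assert np.allclose(coeffs[:0:-1], beta)   # beta = [-15, 11, -5] here
```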

Thus, we have the following characterization of the existence of a starting vector.

Theorem 4.1. There exists a starting vector v = Xc of unit norm such that [[theta].sub.1], ..., [[theta].sub.k] are the Arnoldi Ritz values at iteration k if and only if the nonlinear system (2.1) with the unknowns [c.sub.j], j = 1, ..., n, (where the coefficients [[beta].sup.(k).sub.j] are defined by (4.1)), to which we add the equation

(4.2) [n.summation over (i,j=1)] [[bar.c].sub.i][c.sub.j][(X * X).sub.i,j] = 1,

has at least one solution vector c.

Proof. Let us assume that there exists a vector v such that [[theta].sub.1], ..., [[theta].sub.k] are the Arnoldi Ritz values at iteration k. They are the roots of the characteristic polynomial whose coefficients [[beta].sup.(k).sub.j] are given by (4.1). Hence, by Theorem 2.2, the coefficients are solution of the linear system (2.1) and the vector c is a solution of the nonlinear system defined by (2.1) plus (4.2) because the vector v is of unit norm.

Conversely, if there is a solution c to the nonlinear system (2.1)-(4.2), then there exists a solution of the linear system (2.1) with the unknowns [[beta].sup.(k).sub.j], which, by Theorem 2.2, are the coefficients of the characteristic polynomial of [H.sub.k], and the complex numbers defined as the roots of the polynomial are the Ritz values at Arnoldi iteration k.

To make things clear, let us consider the case k = 2 with [[theta].sub.1] = [[theta].sup.(2).sub.1], [[theta].sub.2] = [[theta].sup.(2).sub.2] given. Let p = [[theta].sub.1][[theta].sub.2] and s = [[theta].sub.1] + [[theta].sub.2] be known. We note that [M.sub.2] is a Hermitian matrix. Then (2.1) is

$\begin{pmatrix} 1 & M_{1,2} \\ \overline{M}_{1,2} & M_{2,2} \end{pmatrix} \begin{pmatrix} p \\ -s \end{pmatrix} = -\begin{pmatrix} M_{1,3} \\ M_{2,3} \end{pmatrix}.$

Therefore, we have the two equations,

p - s[M.sub.1,2] = -[M.sub.1,3], s[M.sub.2,2] = [M.sub.2,3] + p[bar.[M.sub.1,2]].

The equations to be satisfied are

$\sum_{i,j=1}^{n} \bar c_i c_j\, (X^* X)_{i,j}\, \bigl(\lambda_j^2 - s\lambda_j + p\bigr) = 0, \qquad \sum_{i,j=1}^{n} \bar c_i c_j\, (X^* X)_{i,j}\, \bar\lambda_i \bigl(\lambda_j^2 - s\lambda_j + p\bigr) = 0.$

Since we need to find a vector v of unit norm, we have to add the condition [[parallel]Xc[parallel].sup.2] = [c.sup.*][X.sup.*]Xc = 1, which yields the equation

[n.summation over (i,j=1)] [[bar.c].sub.i][c.sub.j][([X.sup.*]X).sub.i,j] = 1.

Because s and p are known, these are three nonlinear complex equations in n complex unknowns [c.sub.i], i = 1, ..., n. Whether or not this system has solutions determines if [[theta].sub.1] and [[theta].sub.2] are feasible values since, if a solution c exists, we can then find a vector v such that the two given values [[theta].sub.1] and [[theta].sub.2] are Ritz values for [K.sub.2](A, v).

We remark that this is in general not a polynomial system because of the conjugation in the expression [[bar.c].sub.i][c.sub.j]. However, we can convert it into a polynomial system by taking the real and imaginary parts of the components [c.sub.i] as unknowns; splitting the three complex equations into their real and imaginary parts then yields a polynomial system of six real equations in 2n real unknowns. The trouble then is that we have to know if there are real solutions. Unfortunately, there are not many results about this problem in the algebraic geometry literature. The situation is much simpler if we assume that the matrix A is normal. This case is considered in the next section.

5. The inverse problem for normal matrices. For a normal matrix and assuming that we know the coefficients [[beta].sup.(k).sub.0], ..., [[beta].sup.(k).sub.k-1], we obtain a (k + 1) x n linear system for the moduli squared, [[omega].sub.i] = [[absolute value of ([c.sub.i])].sup.2]. It yields a linear system

[C.sub.C][omega] = [f.sub.C].

Putting the normalizing equation [[summation].sup.n.sub.i=1] [[omega].sub.i] = 1 first, the entries of [C.sub.C] are all 1 in the first row. The entries of the second row are

[([C.sub.C]).sub.2,m] = [k-1.summation over (i=1)] [[beta].sup.(k).sub.i][[lambda].sup.i.sub.m] + [[lambda].sup.k.sub.m], m = 1, ..., n,

and the other entries are

$(C_C)_{l,m} = \bar\lambda_m^{\,l-2} \Bigl( \sum_{i=0}^{k-1} \beta^{(k)}_i \lambda_m^i + \lambda_m^k \Bigr), \qquad l = 3, \ldots, k+1,\ m = 1, \ldots, n.$

The right-hand side is all zero except for the first two components, [([f.sub.C]).sub.1] = 1, [([f.sub.C]).sub.2] = -[[beta].sup.(k).sub.0]. We can also turn this linear system of k + 1 complex equations in n real unknowns into a real linear system by taking the real and imaginary parts of rows 2 to k + 1. This gives a (2k + 1) x n matrix [C.sub.R], and the right-hand side is zero except for the first three components, [([f.sub.R]).sub.1] = 1, [([f.sub.R]).sub.2] = -Re([[beta].sup.(k).sub.0]), [([f.sub.R]).sub.3] = -Im([[beta].sup.(k).sub.0]).

Compared to the case of a general diagonalizable matrix studied in the previous section, there are good and bad things. The good thing is that we have a linear system for the unknowns [[omega].sub.i] instead of a nonlinear one. The bad thing is that we need to find a solution which is real and positive. Obtaining (if possible) a real solution is easy by solving [C.sub.R][omega] = [f.sub.R], but we still need a positive solution. The characterization of [[theta].sub.1], ..., [[theta].sub.k] being feasible is given in the following theorem.

Theorem 5.1. Let A be a normal matrix. There exists a starting vector v = Xc of unit norm such that [[theta].sub.1], ..., [[theta].sub.k] are the Arnoldi Ritz values at iteration k if and only if the linear system [C.sub.R][omega] = [f.sub.R], where the coefficients [[beta].sup.(k).sub.j] are defined by (4.1), has at least one solution vector [omega] with [[omega].sub.i] [greater than or equal to] 0, i = 1, ..., n. Then c is any vector such that [[absolute value of ([c.sub.i])].sup.2] = [[omega].sub.i].

Proof. The proof is similar to that of Theorem 4.1.

The condition given in Theorem 5.1 must be equivalent to the condition recently proposed by Bujanovic ([3, Theorem 4]).
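In practice, the condition of Theorem 5.1 can be tested numerically. The following sketch is ours (it assumes NumPy and SciPy and is not the approach used in the paper): it builds the complex moment system row by row, splits it into real and imaginary parts, and tests for a nonnegative solution with nonnegative least squares, the system being feasible exactly when the NNLS residual vanishes.

```python
import numpy as np
from scipy.optimize import nnls

def ritz_feasible(lam, theta, tol=1e-8):
    """lam: eigenvalues of a normal matrix A; theta: k candidate Ritz values.

    Builds the (k+1) x n complex system (normalization row plus the rows
    sum_i omega_i conj(lam_i)^l q(lam_i) = 0, where q is the monic
    polynomial with roots theta), and tests feasibility of omega >= 0
    with nonnegative least squares.
    """
    lam = np.asarray(lam, dtype=complex)
    k = len(theta)
    q = np.poly(theta)                          # monic polynomial with roots theta
    qlam = np.polyval(q, lam)                   # q(lambda_i)
    rows = [np.ones(len(lam), dtype=complex)]   # sum_i omega_i = 1
    for l in range(k):
        rows.append(lam.conj() ** l * qlam)
    C = np.array(rows)
    f = np.zeros(k + 1, dtype=complex)
    f[0] = 1.0
    CR = np.vstack([C.real, C.imag])            # real form of the system
    fR = np.concatenate([f.real, f.imag])
    omega, resid = nnls(CR, fR)
    return resid < tol, omega
```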

For further use let us write down the equations for k = 2. We have

$\sum_{i=1}^{n} \omega_i = 1, \qquad \sum_{i=1}^{n} \omega_i \bigl(\lambda_i^2 - s\lambda_i\bigr) = -p, \qquad \sum_{i=1}^{n} \omega_i\, \bar\lambda_i \bigl(\lambda_i^2 - s\lambda_i + p\bigr) = 0.$

The problem can be further simplified if the matrix A and the starting vector are real. To the best of our knowledge, this case has not been considered by other authors. The eigenvalues of A are then real or occur in complex conjugate pairs. If the starting vector v is real, all quantities computed in the Arnoldi algorithm are real (in particular the matrix H), and the Ritz values are real or appear as complex conjugate pairs, being the roots of a polynomial with real coefficients [[beta].sup.(k).sub.j]. The two eigenvectors of A corresponding to a complex conjugate pair are conjugate, and the eigenvectors corresponding to real eigenvalues are real. Then, with v being real, if c = X*v and [[lambda].sub.i] = [[bar.[lambda]].sub.j], we have [c.sub.i] = [[bar.c].sub.j]. This means that when the Ritz values are known, we have only one unknown for each pair of complex conjugate eigenvalues. Let us assume that the matrix A has [p.sub.C] pairs of complex conjugate eigenvalues (with 2[p.sub.C] [less than or equal to] n) that are listed first and n - 2[p.sub.C] real eigenvalues, [[lambda].sub.2[p.sub.C]+1], ..., [[lambda].sub.n]. Then, we have only n - [p.sub.C] unknowns that, to avoid some confusion, we denote by their initial indices ranging from 1 to n as usual for eigenvalues. That is, the unknowns are the components of the vector

(5.1) $\hat\omega = \bigl[\omega_1, \omega_3, \ldots, \omega_{2p_C-1}, \omega_{2p_C+1}, \ldots, \omega_n\bigr]^T.$

Then, in the equations derived from the matrix M, we have to group the terms containing [[lambda].sub.i] and [[bar.[lambda]].sub.i]. Since only real numbers are involved, we denote the matrix as [C.sub.R] even though it is different from the matrix described above. The first row of the matrix [C.sub.R] is now

(5.2) [([C.sub.R]).sub.1,m] = 2, m = 1, ..., [p.sub.C], and [([C.sub.R]).sub.1,m] = 1, m = [p.sub.C] + 1, ..., n - [p.sub.C].

The second row is

(5.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

and the other entries are

(5.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The right-hand side is all zero except for the first two components, [([f.sub.R]).sub.1] = 1, [([f.sub.R]).sub.2] = - [[beta].sup.(k).sub.0]. Therefore, the real matrix [C.sub.R] is only of size (k + 1) x (n - [p.sub.C]), and we have n - [p.sub.C] unknowns. For k = 2 and with s = [[theta].sub.1] + [[theta].sub.2], p = [[theta].sub.1][[theta].sub.2], the second row is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

and the other entries are

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

To find out if there exists a positive solution, we have to consider the cases k + 1 > n - [p.sub.C] (overdetermined system), k + 1 = n - [p.sub.C] (square system), and k + 1 < n - [p.sub.C] (underdetermined system). When we have a positive solution, we can find a real vector c by expanding the solution and taking square roots, and finally obtain a real starting vector v = Xc. The previous discussion is summarized in the following theorem.

Theorem 5.2. Let A be a real normal matrix. There exists a real starting vector v = Xc of unit norm such that [[theta].sub.1], ..., [[theta].sub.k], where these values are real or occur in complex conjugate pairs, are the Arnoldi Ritz values at iteration k if and only if the linear system $C_R \hat\omega = f_R$, where the coefficients [[beta].sub.j.sup.(k)] are defined by (4.1), the matrix [C.sub.R] is defined by (5.2)-(5.4), and $\hat\omega$ by (5.1), has at least one solution vector $\hat\omega$ with $\hat\omega_i \ge 0$ for i = 1, ..., n - [p.sub.C]. Then c is any real vector such that [[absolute value of ([c.sub.i])].sup.2] = [[omega].sub.i], where [omega] is given by the expansion of $\hat\omega$.

Let us now consider the problem of finding a positive solution in the case that the linear system $C_R \hat\omega = f_R$ is underdetermined, that is, k + 1 < n - [p.sub.C]. Solutions of such a system can be found by using the Singular Value Decomposition (SVD). Let us consider the generic case where [C.sub.R] has full rank k + 1. The matrix can be factorized as

$C_R = \hat U \begin{bmatrix} D & 0 \end{bmatrix} \hat Z^T.$

The orthogonal matrix $\hat U$ is of order k + 1, as is D, and $\hat Z$ is of order n - [p.sub.C]. The diagonal of D contains the singular values. Since all the singular values are nonzero, we can find solutions to

$\begin{bmatrix} D & 0 \end{bmatrix} \hat Z^T \hat\omega = \hat U^T f_R.$

Let $y = D^{-1} \hat U^T f_R$. The solutions are given by

$\hat Z^T \hat\omega = \begin{bmatrix} y \\ z \end{bmatrix},$

where z is an arbitrary real vector with n - [p.sub.C] - (k + 1) components. Let us decompose the matrix $\hat Z$ as $\hat Z = [\hat Z_1\ \hat Z_2]$ with $\hat Z_1$ having k + 1 columns. Then, we have a positive solution if and only if there exists a vector z such that

(5.5) $-\hat Z_2 z \le \hat Z_1 y$

and then

$\hat\omega = \hat Z_1 y + \hat Z_2 z.$

To check if there is a solution to the system of inequalities (5.5), we use the algorithm described in [12], which was intended to convert a system of linear inequalities into a representation using the vertices of the polyhedron defined by the inequalities. It relies on computing the rank of submatrices and tells us whether the system is feasible.
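A standard alternative to the vertex-based test of [12] is a phase-one linear program: a system of linear inequalities such as (5.5) is feasible exactly when an LP with zero objective over the same constraints has an optimal point. A sketch of ours (assuming SciPy; this is not the algorithm used in the paper):

```python
import numpy as np
from scipy.optimize import linprog

def inequalities_feasible(A_ub, b_ub):
    """Return True iff the set {z : A_ub @ z <= b_ub} is nonempty."""
    res = linprog(c=np.zeros(A_ub.shape[1]),   # zero objective: pure feasibility
                  A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * A_ub.shape[1])
    return res.status == 0                     # 0 means an optimal point was found
```

For (5.5) this would be called with the matrix $-\hat Z_2$ and right-hand side $\hat Z_1 y$.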

6. The case A real normal and k = 2. In this section we further simplify the problem and concentrate on the case k = 2 for a real normal matrix and a real starting vector. The matrix [H.sub.2] is real and has either two real eigenvalues or a pair of complex conjugate eigenvalues. We are interested in the latter case, for which [[theta].sub.2] = [[bar.[theta]].sub.1]. Hence, it is enough to look for the location of the complex Ritz value [[theta].sub.1], and this considerably simplifies the problem. We call the set of all complex values [[theta].sub.1] in the field of values yielding a positive solution the feasible region. To obtain a graphical representation of the feasible region, we can proceed as in Bujanovic's paper [3]. We set up a regular Cartesian mesh over the field of values of A (in fact, over the smallest rectangle containing its upper part, y [greater than or equal to] 0) for the values of [[theta].sub.1], and for each value [[theta].sub.1] = x + iy in the mesh we check whether there are positive solutions to the 3 x (n - [p.sub.C]) linear system $C_R \hat\omega = f_R$ by considering the system of inequalities (5.5). When the system is feasible for a given value of [[theta].sub.1] on the mesh, we flag this location. Hence, for each [[theta].sub.1] in the marked area, we can find a real vector v such that [[theta].sub.1], [[theta].sub.2] = [[bar.[theta]].sub.1] are the Ritz values at iteration 2. This gives an approximation of the feasible region. For [[theta].sub.1] outside the feasible region, there does not exist a real vector v that yields ([[theta].sub.1], [[bar.[theta]].sub.1]) as Arnoldi Ritz values at iteration 2. Of course, this way of detecting the feasible locations of [[theta].sub.1] by discretizing the field of values has some drawbacks, since some tiny feasible parts may be missed if the discretization is not fine enough.

Figures 6.1 and 6.2 display an example (Example 1) of a matrix of order 4 with two real eigenvalues on each side of the real part of a pair of complex conjugate eigenvalues. More precisely, in Example 1 the matrix A is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The eigenvalues of A are

[[[lambda].sub.1], [[bar.[lambda]].sub.1], [[lambda].sub.3], [[lambda].sub.4]] = [-0.432565 + 1.66558i, -0.432565 - 1.66558i, 0.187377, -1.20852] .

In this example the matrix [C.sub.R] is square of order 3 since n - [p.sub.C] = 4 -1 = 3 and nonsingular. The field of values is shown in red and the eigenvalues of A are the red circles. The feasible values of [[theta].sub.1] (respectively [[theta].sub.2]) are marked with blue (respectively red) crosses. In this example the feasible region is a surface in the complex plane. In this case it is connected and convex (if we add the real values inside the region), but we will see later that this is not always the case. Of course, we can also have two real Ritz values outside this region. In this example it seems that the real Ritz values can be anywhere in the interval defined by the two real eigenvalues of A. Figure 6.2 was obtained by using 700 random real starting vectors and running the Arnoldi algorithm. We see that we obtain approximately the same shape for the feasible region.

Since we may miss some parts of the feasible region due to a too coarse discretization, it is interesting to characterize its boundary. This can be done by explicitly writing down the inverse of the matrix [C.sub.R] and looking at the inequalities given by the positivity constraints on [[omega].sub.j]. This corresponds to eliminating the components of [omega] in the equations. For simplicity let us denote

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[FIGURE 6.1 OMITTED]

[FIGURE 6.2 OMITTED]

The inverse is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

We apply the inverse to the right-hand side (which, after a change of signs, is [[1 p 0].sup.T], p = [absolute value of [[[theta].sub.1]].sup.2]), and we get

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

We are interested in the components of [omega] being positive. The region outside the feasible region is characterized by the fact that at least one component [[omega].sub.j] is negative. Therefore the boundary must be given by some of the components of the solution being zero. Hence, we have to look at the three equations

[FIGURE 6.3 OMITTED]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The coefficients a, b, c, d, e, f are functions of the unknown quantities s = 2x = 2Re([[theta].sub.1]) and p = [x.sup.2] + [y.sup.2] = [[absolute value of [[theta].sub.1]].sup.2]. These equations define three curves in the (x, y) plane. Some (parts) of these curves yield the boundary of the feasible region for [[theta].sub.1]. However, we note that a component can be zero on one of the curves without changing sign. Therefore, not all the curves need be relevant for the boundary. We just know that the boundary is contained in the union of the curves. Moreover, we are only interested in the parts of the curves contained in the convex hull of the eigenvalues of A. For completeness, remember that we have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The first curve involves only the real eigenvalues of A. The two other curves pass through [[lambda].sub.1] and [[[bar.lambda]].sub.1].

Figure 6.3 displays the boundary curves that we obtain for Example 1 as well as the approximation of the feasible region using a smaller number of discretization points than before. These curves were obtained using a contour plot of the level 0 for the three functions of x and y. We see that we do get the boundary of the feasible region for [[theta].sub.1]. We have only two curves since the one which depends only on the real eigenvalues (corresponding to the first equation) does not have points in the window we are interested in (that is, the field of values).

Let us now consider n = 5. The matrix A has either two pairs of complex conjugate eigenvalues ([[lambda].sub.1], [[bar.[lambda]].sub.1]), ([[lambda].sub.3], [[bar.[lambda]].sub.3]) and a real eigenvalue [[lambda].sub.4] (which can be on the boundary or inside the field of values) or one pair of complex conjugate eigenvalues and three real eigenvalues (at least one of which is inside the field of values, possibly two). In the first case the matrix [C.sub.R] is of size 3 x 3, as it was for the previous example with n = 4. Then, we can apply the same technique of eliminating the components of [omega] to obtain the boundary of the feasible region for [[theta].sub.1]. The only differences are that [([C.sub.R]).sub.1,2] = 2 and the values of the coefficients. We have

[FIGURE 6.4 OMITTED]

[FIGURE 6.5 OMITTED]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The equations are

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

These equations differ from those above only by a multiplicative factor of 2 in some terms. Figure 6.4 displays the feasible region and the boundary for a case with two pairs of complex conjugate eigenvalues and a real eigenvalue inside the field of values (Example 2).

In this example the feasible region is neither connected nor convex. Figure 6.5 shows that random starting vectors do not always yield a good rendering of the feasible region. In Example 2 the matrix is

[FIGURE 6.6 OMITTED]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The eigenvalues of A are

[-0.178102 + 0.498599i, -0.178102 - 0.498599i, 1.5788 + 0.584391i, 1.5788 - 0.584391i, 1.00762].

Figure 6.6 displays an example with only one pair of complex conjugate eigenvalues and three real eigenvalues with two of them on the same side of the real part of the complex eigenvalues (Example 3). We see that we have one piece of the curve which is inside the feasible region. It can be considered a "spurious" curve (even though we will see later that these curves can also have some interest). The matrix of Example 3 is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The eigenvalues of A are

[-0.600433 + 2.06392i, -0.600433 - 2.06392i, -1.40594, 1.56985, 0.161981].

For n larger than 5, the problem of computing the boundary is more complicated. We generally have more than 3 unknowns (except for n = 6 with three pairs of complex conjugate eigenvalues) and therefore an underdetermined linear system for the unknowns [[omega].sub.j]. When prescribing a value of [[theta].sub.1] (with [[theta].sub.2] = [[[bar.theta]].sub.1]), as we have seen before, we can check the feasibility by using the SVD of the rectangular matrix [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
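This SVD-based feasibility check can be sketched as follows in Python (the paper's computations are done in Matlab; the function name and interface are illustrative). Note that a nonnegative minimum-norm solution is only a sufficient certificate: feasibility in general requires the existence of some nonnegative solution.

```python
import numpy as np

def feasible_omega(C, b, tol=1e-10):
    """Minimum-norm solution of the underdetermined system C w = b via the SVD.

    Returns (w, ok): ok is True when C w = b holds and all components of w
    are nonnegative, certifying that the prescribed Ritz values are feasible.
    """
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))      # numerical rank
    # Pseudo-inverse solution w = V_r diag(1/s_r) U_r^T b
    w = Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])
    ok = np.allclose(C @ w, b, atol=1e-8) and bool(np.all(w >= -tol))
    return w, ok
```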

Concerning the boundary of the feasible region, the pieces of the boundary correspond to some of the components of [omega] being zero. Therefore, we can apply the same elimination technique as before by considering the matrices of order 3 corresponding to all the triples of eigenvalues, a pair of complex conjugate eigenvalues counting only once. This corresponds to keeping only three components of [omega] and setting the other components to zero. We have to consider 3 x 3 matrices similar to the ones we had before, with the first row being (2,2,2), (2,1,1), or (1,1,1). The number of curves is three times the number of triples of eigenvalues.

Doing this corresponds to the handling of linear equality constraints in linear programming (LP), where the solution components must be nonnegative. Let us assume that we have linear equality constraints Cx = b defined by a real m x n matrix C of full rank with m < n. The procedure just amounts to taking m independent columns of C, setting the other components of the solution to zero, and solving. By possibly permuting columns, we can write C = [B E] with B nonsingular of order m. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is called a basic solution. It is degenerate if some components of [B.sup.-1]b are zero. A basic feasible solution (BFS) is a basic solution that satisfies the constraints of the LP. The feasible region defined by the constraints is convex, closed, and bounded from below. The feasible region is a polyhedron, and it can be shown that the BFS are extreme points (vertices or corners) of the feasible region.

This is similar to what we are doing. We have a polyhedron in the [omega]-space defined by the system with the matrix [C.sub.R]; we consider all the 3 x 3 matrices (provided they are nonsingular) and symbolically compute the basic solutions. The feasible ones (with [[omega].sub.j] [greater than or equal to] 0) correspond to some vertices of the polyhedron. Clearly the curves are located where components of [omega] may change sign as functions of x = Re([[theta].sub.1]) and y = Im([[theta].sub.1]). They also give a parametric description of the vertices of the polyhedron.
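The enumeration of basic solutions can be sketched as follows in Python; here C stands for the real system matrix [C.sub.R] and the sizes are kept generic, so the same sketch covers the 3 x 3 and the later 5 x 5 cases.

```python
import numpy as np
from itertools import combinations

def basic_solutions(C, b, tol=1e-12):
    """All basic solutions of C w = b for an m x n matrix C with m < n.

    Each choice of m columns forming a nonsingular block B gives the basic
    solution with B w_B = b on those columns and zeros elsewhere.  The basic
    feasible solutions (w >= 0) are vertices of the polyhedron.
    """
    m, n = C.shape
    all_basic, feasible = [], []
    for cols in combinations(range(n), m):
        cols = list(cols)
        B = C[:, cols]
        if abs(np.linalg.det(B)) < tol:
            continue                      # singular block: skip this column set
        w = np.zeros(n)
        w[cols] = np.linalg.solve(B, b)
        all_basic.append(w)
        if np.all(w >= -tol):
            feasible.append(w)
    return all_basic, feasible
```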

Figure 6.7 corresponds to an example with n = 6 and three pairs of complex conjugate eigenvalues (Example 4). In this example the matrix [C.sub.R] is square of order 3, [omega] has only three components, and there is no spurious curve. We see that the shape of the feasible region can be quite complicated. The matrix A of Example 4 is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The eigenvalues of A are

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Figure 6.8 corresponds to an example with two pairs, one real eigenvalue on the boundary of the field of values, and one real eigenvalue inside (Example 5). We have two spurious curves. The matrix A is

[FIGURE 6.7 OMITTED]

[FIGURE 6.8 OMITTED]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The eigenvalues of A are

[-0.432565 + 1.66558i, -0.432565 - 1.66558i, 1.19092 + 1.18916i, 1.19092 - 1.18916i, -1.20852, 0.187377].

To visualize the feasible region for [[theta].sub.1], it is useful to get rid of the "spurious" curves. This can be done approximately in the following way. We compute points on the curves by solving an equation in x for a given value of y (or vice versa) for each equation defining the boundary. When we get a point on a curve, we check points surrounding it in the complex plane. If at least one of those points is not feasible, then our given point is on the boundary and the piece of curve on which it is located is part of the boundary. This is not always easy because of rounding errors and because some curves may be almost tangent to each other. Of course this process is not foolproof, since the result depends on the choice of the surrounding points and on some thresholds, but in many cases it works fine. Figure 6.9 shows what we get for the previous example. The blue stars are the points computed on the boundary (using the Matlab function fzero). Note that we get rid of the two spurious curves, since we keep only the curves carrying at least one boundary point.
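The perturbation test described above can be sketched as follows, with is_feasible standing for whatever feasibility oracle is available (a hypothetical callback, not a routine from the paper):

```python
import numpy as np

def on_boundary(is_feasible, x, y, radius=1e-3, n_samples=16):
    """Perturbation test: (x, y) lies on the boundary when the surrounding
    circle of the given radius contains both feasible and infeasible points."""
    t = 2.0 * np.pi * np.arange(n_samples) / n_samples
    flags = [bool(is_feasible(x + radius * np.cos(a), y + radius * np.sin(a)))
             for a in t]
    return any(flags) and not all(flags)
```

As the text notes, the radius and the number of samples are thresholds that make the test approximate rather than foolproof.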

[FIGURE 6.9 OMITTED]

There is another way to visualize the boundary of the feasible region in Matlab. The contour function that we use evaluates the function on a grid and then finds the level-0 curve by interpolation. Therefore, we can set up a routine that, given x and y, computes a solution of the underdetermined system for the point (x + iy, x - iy) using the SVD. If the point is not feasible, we return a very small negative value. However, this process is expensive since the evaluation of the function cannot be vectorized. An example is given in Figure 6.11 for Example 6 below. Of course there are no spurious curves, and not even the parts of the curves that are not relevant; but the curve has some wiggles, because setting the values at non-feasible points to a small negative value introduces discontinuities in the function values.
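In Python, the analogue of this contour-based approach could look as follows; is_feasible is again a hypothetical signed feasibility margin, and clamping infeasible points to a small negative value is exactly what produces the wiggles mentioned above.

```python
import numpy as np

def boundary_indicator(is_feasible, xs, ys, neg=-1e-3):
    """Grid of values whose zero level set approximates the boundary.

    is_feasible(x, y) returns a signed margin (>= 0 inside the feasible
    region); infeasible points are clamped to a small negative value, which
    introduces the discontinuities responsible for the wiggles.
    """
    Z = np.empty((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):      # pointwise: cannot be vectorized
            v = is_feasible(x, y)
            Z[i, j] = v if v >= 0 else neg
    return Z
```

The grid Z would then be handed to a contour routine (Matlab's contour, or matplotlib's contour in Python) with the single level 0.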

Figure 6.10 displays an example with two pairs (one inside the field of values) and two real eigenvalues (Example 6). The feasible region has an interesting shape. Figure 6.11 shows the boundary for Example 6 computed using the SVD. The matrix is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The eigenvalues of A are

[-1.2413 + 3.27037i, -1.2413 - 3.27037i, 0.566382 + 0.768588i, 0.566382 - 0.768588i, 0.917215, 1.82382].

[FIGURE 6.10 OMITTED]

[FIGURE 6.11 OMITTED]

Figure 6.12 displays an example with n = 8 (Example 7). We can see that (unfortunately) we have many spurious curves that are useless for the boundary. On the right part of Figure 6.12, we got rid of some of these curves but not all of them. The matrix of Example 7 is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The eigenvalues of A are

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

We remark that, using the same technique as before, we can compute the boundary of the feasible region for [[theta].sub.2] when [[theta].sub.1] is prescribed for k = 2 and for complex normal matrices. Here we have to consider basic solutions for the real matrix which is of size 5 x n. Hence, we compute the solutions for all 5 x 5 matrices extracted from the system for [omega]. In this case we compute the solutions numerically and not symbolically for a given point (x, y). Then we check that the curves are indeed parts of the boundary using the same perturbation technique as before. We consider the problem of Bujanovic [3, Figure 1 (b)]. The eigenvalues of A are

[FIGURE 6.12 OMITTED]

[FIGURE 6.13 OMITTED]

[-5, -3 + 2i, -3 - 2i, 4 + i, 4 - i, 6].

We fix [[theta].sup.(2).sub.1] = [[theta].sub.1] = -4. The boundary of the feasible region for [[theta].sub.2] for this particular value of [[theta].sub.1] is displayed in Figure 6.13.

One can compare with [3] and see that we indeed find the boundary of the region for [[theta].sub.2]. However, such regions do not give a good idea of the location of the Ritz values, because we would have to move [[theta].sub.1] all over the field of values to see where the Ritz values can be located. Figure 6.14 displays the location of the Ritz values for k = 2 to 5 when running the Arnoldi method with a complex diagonal matrix with the given eigenvalues and random real starting vectors. We see that there are Ritz values almost everywhere. Things are strikingly different if we construct a real normal matrix with the given eigenvalues (which are real or occur in complex conjugate pairs) and run the Arnoldi method with real starting vectors. The Ritz values are shown in Figure 6.15. We see that they are constrained to two regions of the complex plane and to the real axis. Of course things would have been different had we used complex starting vectors; the Ritz values would then have looked more like those in Figure 6.14. There is much more structure in the feasible region when everything is real-valued.
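One standard way to build a real normal matrix with a prescribed spectrum of real eigenvalues and complex conjugate pairs is a block diagonal construction; this is only one possible choice, and the paper does not specify which construction was used.

```python
import numpy as np

def real_normal_from_spectrum(pairs, reals):
    """Block diagonal real normal matrix with the prescribed eigenvalues.

    Each complex conjugate pair a +/- b*i becomes the 2 x 2 block
    [[a, b], [-b, a]]; real eigenvalues go on the diagonal.  A block
    diagonal matrix of such blocks satisfies A A^T = A^T A, so it is normal.
    """
    blocks = [np.array([[a, b], [-b, a]]) for (a, b) in pairs]
    blocks += [np.array([[r]], dtype=float) for r in reals]
    n = sum(B.shape[0] for B in blocks)
    A = np.zeros((n, n))
    k = 0
    for B in blocks:
        m = B.shape[0]
        A[k:k + m, k:k + m] = B
        k += m
    return A
```

A dense example with the same spectrum can be obtained by conjugating the result with any real orthogonal matrix.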

[FIGURE 6.14 OMITTED]

[FIGURE 6.15 OMITTED]

7. Open problems and conjectures for k > 2 and real normal matrices. In this section we describe some numerical experiments with k > 2 for real normal matrices. We also state some open problems and formulate some conjectures. We are interested in the iterations k = 3 to k = n - 1. We would like to know where the Ritz values are located when using real starting vectors. Clearly we cannot do the same as for k = 2 because, for instance, for k = 3, we have either three real Ritz values or a pair of complex conjugate Ritz values and a real one. Of course, we can fix the location of the real Ritz values and look for the region where the pairs of complex conjugate Ritz values may be located, but this is not that informative since it is not practical to explore all the possible positions of the real Ritz values.

Let us do some numerical experiments with random starting vectors and first consider Example 6 of the previous section with n = 6. For each value of k = 2 to n - 1, we generate 700 random initial vectors of unit norm and run the Arnoldi algorithm, computing the Ritz values at iteration k. In Figure 7.1 we plot the complex conjugate pairs in blue and red and the real Ritz values in green for all the values of k, and we superimpose the boundary curves computed for k = 2. We observe that all the Ritz values belong to the feasible region that was computed for k = 2. We conjecture that this is true for any real normal matrix and any real starting vector. But there is more than that.
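The experiment can be reproduced with a basic Arnoldi implementation such as the following sketch (real arithmetic, modified Gram-Schmidt, no restarting or reorthogonalization); the paper's own code is in Matlab and may differ in details.

```python
import numpy as np

def arnoldi_ritz(A, v, k):
    """Run k steps of the Arnoldi process from the starting vector v and
    return the Ritz values, i.e. the eigenvalues of the leading k x k
    Hessenberg block H_k."""
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):           # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:          # breakdown: invariant subspace found
            return np.linalg.eigvals(H[:j + 1, :j + 1])
        V[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:k, :k])
```

Repeating this for many random unit vectors v and scattering the returned Ritz values reproduces plots like Figure 7.1.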

[FIGURE 7.1 OMITTED]

[FIGURE 7.2 OMITTED]

Figure 7.2 displays the Ritz values at iteration 4. We see that some of the Ritz values are contained in a region for which one part of the boundary is a piece of a curve that was considered "spurious" for k = 2. Figure 7.3 shows the Ritz values at iteration 5 (that is, the next to last one); there is an accumulation of some Ritz values on this spurious curve as well as close to the other pair of complex conjugate eigenvalues. It seems that some of the spurious curves act as "attractors" for the Ritz values, at least for random real starting vectors. It would be interesting to explain this phenomenon.

Figures 7.4-7.8 illustrate results for Example 7 with n = 8. Here again we observe that the Ritz values are inside the boundary for k = 2 and, at some iterations, Ritz values are located preferably on or close to some of the spurious curves.

Another open question is whether there exist real normal matrices for which the feasible region for k = 2 completely fills the field of values for real starting vectors. In this paper we concentrated on pairs of complex conjugate Ritz values, but an interesting problem is to locate the real Ritz values in the intersection of the field of values with the real axis.

[FIGURE 7.3 OMITTED]

[FIGURE 7.4 OMITTED]

[FIGURE 7.5 OMITTED]

[FIGURE 7.6 OMITTED]

[FIGURE 7.7 OMITTED]

[FIGURE 7.8 OMITTED]

Numerical experiments not reported here seem to show that the properties described above for the Arnoldi Ritz values are not restricted to the Arnoldi algorithm. For a real normal matrix A, if one constructs a real orthogonal matrix V and defines H = [V.sup.T]AV, the Ritz values, defined as the eigenvalues of [H.sub.k], the leading principal submatrix of order k of H, are also constrained to some regions inside the field of values of A. This deserves further study.
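This variant replaces the Krylov basis by an arbitrary real orthogonal matrix; a minimal sketch, with the random orthogonal factor obtained from a QR factorization of a Gaussian matrix (one common recipe, not necessarily the one used for the experiments):

```python
import numpy as np

def projected_ritz_values(A, k, seed=0):
    """Eigenvalues of the leading k x k principal submatrix of H = Q^T A Q
    for a random real orthogonal Q (QR factorization of a Gaussian matrix)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    Q = Q * np.sign(np.diag(R))          # column sign fix (optional)
    H = Q.T @ A @ Q
    return np.linalg.eigvals(H[:k, :k])
```

For a normal A, each such submatrix eigenvalue is still a Ritz value of A with respect to the subspace spanned by the first k columns of Q, hence lies in the field of values.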

8. Conclusion. In this paper we gave a necessary and sufficient condition for a set of complex values [[theta].sub.1], ..., [[theta].sub.k] to be the Arnoldi Ritz values at iteration k for a general diagonalizable matrix A. This generalizes previously known conditions. The condition stated in this paper simplifies for normal matrices, particularly for real normal matrices and real starting vectors. We studied the case k = 2 in detail, characterizing the boundary of the region in the complex plane, contained in W(A), where pairs of complex conjugate Ritz values are located. Several examples with a computation of the boundary of the feasible region were given. Finally, after describing some numerical experiments with random real starting vectors, we formulated some conjectures and open problems for k > 2 and real normal matrices.

Acknowledgments. The author thanks J. Duintjer Tebbens for some interesting comments and the referees for remarks that helped improve the presentation.

REFERENCES

[1] M. BELLALIJ, Y. SAAD, AND H. SADOK, On the convergence of the Arnoldi process for eigenvalue problems, Tech. Report umsi-2007-12, Minnesota Supercomputer Institute, University of Minnesota, 2007.

[2] --, Further analysis of the Arnoldi process for eigenvalue problems, SIAM J. Numer. Anal., 48 (2010), pp. 393-407.

[3] Z. BUJANOVIC, On the permissible arrangements of Ritz values for normal matrices in the complex plane, Linear Algebra Appl., 438 (2013), pp. 4606-4624.

[4] R. CARDEN, Ritz Values and Arnoldi Convergence for Non-Hermitian Matrices, PhD Thesis, Computational and Applied Mathematics, Rice University, Houston, 2011.

[5] --, A simple algorithm for the inverse field of values problem, Inverse Problems, 25 (2009), 115019 (9 pages).

[6] R. CARDEN AND M. EMBREE, Ritz value localization for non-Hermitian matrices, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 1320-1338.

[7] R. CARDEN AND D. J. HANSEN, Ritz values of normal matrices and Ceva's theorem, Linear Algebra Appl., 438 (2013), pp. 4114-4129.

[8] C. CHORIANOPOULOS, P. PSARRAKOS, AND F. UHLIG, A method for the inverse numerical range problem, Electron. J. Linear Algebra, 20 (2010), pp. 198-206.

[9] J. DUINTJER TEBBENS AND G. MEURANT, Any Ritz value behavior is possible for Arnoldi and for GMRES, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 958-978.

[10] --, On the convergence of QOR and QMR Krylov methods for solving linear systems, in preparation.

[11] I. C. F. IPSEN, Expressions and bounds for the GMRES residual, BIT, 40 (2000), pp. 524-535.

[12] A. KOVACEC AND B. RIBEIRO, Convex hull calculations: a Matlab implementation and correctness proofs for the lrs-algorithm, Tech. Report 03-26, Department of Mathematics, Coimbra University, Coimbra, Portugal, 2003.

[13] R. B. LEHOUCQ, D. C. SORENSEN, AND C. YANG, ARPACK Users' Guide, SIAM, Philadelphia, 1998.

[14] S. M. MALAMUD, Inverse spectral problem for normal matrices and the Gauss-Lucas theorem, Trans. Amer. Math. Soc., 357 (2005), pp. 4043-4064.

[15] G. MEURANT, GMRES and the Arioli, Ptak, and Strakos parametrization, BIT, 52 (2012), pp. 687-702.

[16] --, The computation of isotropic vectors, Numer. Algorithms, 60 (2012), pp. 193-204.

[17] C. C. PAIGE, The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices, PhD Thesis, Institute of Computer Science, University of London, London, 1971.

[18] --, Computational variants of the Lanczos method for the eigenproblem, J. Inst. Math. Appl., 10 (1972), pp. 373-381.

[19] --, Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem, Linear Algebra Appl., 34 (1980), pp. 235-258.

[20] B. N. PARLETT, Normal Hessenberg and moment matrices, Linear Algebra Appl., 6 (1973), pp. 37-43.

[21] --, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs, 1980.

[22] H. SADOK, Analysis of the convergence of the minimal and the orthogonal residual methods, Numer. Algorithms, 40 (2005), pp. 201-216.

GERARD MEURANT ([dagger])

* Received September 30, 2013. Accepted November 26, 2014. Published online on January 23, 2015. Recommended by H. Sadok.

([dagger]) 30 rue du sergent Bauchat, 75012 Paris, France (gerard.meurant@gmail.com).
COPYRIGHT 2014 Institute of Computational Mathematics