# THE EXTENDED GLOBAL LANCZOS METHOD FOR MATRIX FUNCTION APPROXIMATION.

1. Introduction. Let $A \in \mathbb{R}^{n\times n}$ be a large, symmetric matrix, and let $V \in \mathbb{R}^{n\times s}$ be a block vector with $1 \le s \ll n$. We are interested in approximating expressions of the form

(1.1) $I(f) := \operatorname{trace}(V^T f(A)V),$

where $f$ is a function that is defined on the convex hull of the spectrum of $A$. The need to evaluate bilinear forms (1.1) arises in various applications such as network analysis (where $f(t) := \exp(t)$ or $f(t) := t^3$) [3, 9], machine learning [17], electronic structure computation [2, 5, 19], and the solution of ill-posed problems [10, 11].

Bellalij et al. [3] observed that the right-hand side of (1.1) can be expressed as a Stieltjes integral as follows. Introduce the spectral factorization

(1.2) $A = U\Lambda U^T, \qquad \Lambda = \operatorname{diag}[\lambda_1, \dots, \lambda_n],$

where $\lambda_1, \dots, \lambda_n$ denote the eigenvalues of $A$ and the matrix $U \in \mathbb{R}^{n\times n}$ of eigenvectors is orthogonal. Substituting the spectral factorization (1.2) into the expression (1.1) and setting $\widetilde V = U^T V = [\widetilde v_{ij}]$ gives

(1.3) $I(f) = \operatorname{trace}\big(\widetilde V^T f(\Lambda)\widetilde V\big) = \sum_{i=1}^{n} f(\lambda_i)\sum_{j=1}^{s}\widetilde v_{ij}^{\,2} =: \int f(\lambda)\, d\alpha(\lambda),$

where $\alpha(\lambda)$ is a nondecreasing, real-valued, piecewise constant distribution function with possible discontinuities at the eigenvalues $\lambda_i$ of $A$; $d\alpha(\lambda)$ is the associated measure.
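For a small test problem, the agreement between (1.1) and the spectral sum in (1.3) can be checked numerically. The following NumPy sketch (the matrix sizes and the choice $f = \exp$ are illustrative assumptions, not taken from the paper) evaluates both expressions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 50, 3                      # illustrative sizes with s << n

B = rng.standard_normal((n, n))
A = (B + B.T) / 2                 # symmetric test matrix
V = rng.standard_normal((n, s))   # block vector
f = np.exp

# Direct evaluation of I(f) = trace(V^T f(A) V) via the spectral
# factorization (1.2): f(A) = U diag(f(lambda_i)) U^T.
lam, U = np.linalg.eigh(A)
I_f = np.trace(V.T @ (U * f(lam)) @ U.T @ V)

# Stieltjes-sum form (1.3): with Vt = U^T V = [vt_ij],
# I(f) = sum_i f(lambda_i) * sum_j vt_ij^2.
Vt = U.T @ V
I_sum = np.sum(f(lam) * np.sum(Vt**2, axis=1))

print(I_f, I_sum)   # the two values agree up to rounding
```

The jump of $\alpha$ at $\lambda_i$ is $\sum_{j=1}^s \widetilde v_{ij}^2$, which is exactly the weight multiplying $f(\lambda_i)$ in the sum above.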

Consider the global Krylov subspace

(1.4) $\mathcal{K}^m_g(A, V) = \operatorname{span}\{V, AV, \dots, A^{m-1}V\} = \{p(A)V : p \in \Pi_{m-1}\},$

where $\Pi_{m-1}$ denotes the set of polynomials of degree at most $m-1$. We tacitly assume here that the block vectors $V, AV, \dots, A^{m-1}V$ are linearly independent; this is the generic situation. An application of $m$ steps of the global Lanczos method to $A$ with initial block vector $V$ then determines the decomposition

(1.5) $A\mathbb{V}_m = \mathbb{V}_m(T_m \otimes I_s) + \beta_{m+1}V_{m+1}E_m^T,$

where the $n\times s$ block columns of the matrix $\mathbb{V}_m = [V_1, \dots, V_m] \in \mathbb{R}^{n\times ms}$ form an F-orthonormal basis for the global Krylov subspace (1.4), the initial block vector $V_1$ is proportional to $V$, and we assume that $ms \ll n$. The block columns $V_1, \dots, V_{m+1}$ are said to be F-orthonormal if

(1.6) $(V_i, V_j)_F := \operatorname{trace}(V_i^T V_j) = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$

The inner product $(\cdot\,,\cdot)_F$ is associated with the Frobenius norm; we have $\|M\|_F := (M, M)_F^{1/2}$ for any matrix $M$. Moreover, the matrix $T_m \in \mathbb{R}^{m\times m}$ in (1.5) is symmetric and tridiagonal, $I_s \in \mathbb{R}^{s\times s}$ denotes the identity matrix, $\beta_{m+1} \ge 0$, and $E_m \in \mathbb{R}^{ms\times s}$ is the $m$th block axis vector, i.e., $E_m$ is made up of the columns $(m-1)s+1, \dots, ms$ of $I_{ms}$. The superscript $^T$ denotes transposition, and $\otimes$ stands for the Kronecker product. We refer to [8, 15] for derivations of the global Lanczos decomposition (1.5). When the matrix $A$ is large, the dominating cost of computing the decomposition (1.5) is the evaluation of the $m$ matrix-block-vector products with the matrix $A$.

It is attractive to approximate the expression (1.1) by

(1.7) $\mathcal{G}_m(f) := \|V\|_F^2\, e_1^T f(T_m)\, e_1,$

because this approximation can be interpreted as an m-point Gauss quadrature rule for the Stieltjes integral (1.3). We have

(1.8) $I(p) = \mathcal{G}_m(p) \quad \forall\, p \in \Pi_{2m-1};$

see Bellalij et al. [3] for details. The right-hand side of (1.7) can be written as a sum of $m$ terms by substituting the spectral factorization of $T_m$. This substitution shows that the Gaussian nodes are the eigenvalues of $T_m$ and that the Gaussian weights are the squares of the first components of the normalized eigenvectors multiplied by the squared Frobenius norm of $V$; see [3] for details. We remark that the main reason for using the approximation (1.7) of (1.1) is that it obviates the need to evaluate $f(A)$, which may be prohibitively expensive when $A$ is large. Instead, only the evaluation of $f(T_m)$ is required, where $T_m$ typically is a fairly small matrix. Numerical methods for the accurate evaluation of $f$ for a small matrix are described by Higham [13].
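The approximation (1.7) and the exactness property (1.8) are easy to exercise numerically. The sketch below (a minimal global Lanczos implementation written for this illustration, with arbitrary problem sizes) runs $m = 2$ steps and checks that the Gauss rule reproduces $I(p)$ for $p(t) = t^3$, whose degree equals $2m-1$:

```python
import numpy as np

def global_lanczos(A, V, m):
    """m steps of the (standard) global Lanczos method; returns the
    symmetric tridiagonal matrix T_m (a sketch, no reorthogonalization)."""
    alphas, betas = [], []
    Vj = V / np.linalg.norm(V, 'fro')
    Vprev, beta = np.zeros_like(V), 0.0
    for _ in range(m):
        W = A @ Vj - beta * Vprev
        alpha = np.sum(W * Vj)          # F-inner product (W, V_j)_F
        W = W - alpha * Vj
        beta = np.linalg.norm(W, 'fro')
        alphas.append(alpha); betas.append(beta)
        Vprev, Vj = Vj, W / beta
    return np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)

rng = np.random.default_rng(1)
n, s, m = 40, 2, 2
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
V = rng.standard_normal((n, s))

T = global_lanczos(A, V, m)
e1 = np.zeros(m); e1[0] = 1.0

# Gauss rule (1.7) for f(t) = t^3; exact by (1.8) since deg = 2m - 1.
G = np.linalg.norm(V, 'fro')**2 * (e1 @ np.linalg.matrix_power(T, 3) @ e1)
I = np.trace(V.T @ np.linalg.matrix_power(A, 3) @ V)
print(abs(G - I))   # agrees up to rounding
```

Replacing $t^3$ by, say, $t^4$ makes the rule inexact, which is the gap the extended method discussed next aims to narrow for functions with significant "negative power" behavior.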

In the context of approximating expressions of the form $f(A)b$ for a vector $b \in \mathbb{R}^n$, Druskin, Knizhnerman, and Simoncini [7, 16] have shown that it may be possible to approximate such an expression more accurately when using an approximant from an extended Krylov subspace

$\mathcal{K}^{m+1,m+1}(A, b) = \operatorname{span}\{A^{-m}b, \dots, A^{-1}b, b, Ab, \dots, A^{m}b\}$

than when using an approximant in the standard Krylov subspace

$\mathcal{K}^{2m+1}(A, b) = \operatorname{span}\{b, Ab, \dots, A^{2m}b\}$

of the same dimension $2m+1$. We are interested in exploring whether approximations of (1.1) from an extended global Krylov subspace

(1.9) $\mathcal{K}_g^{m+1,m+1}(A, V) = \operatorname{span}\{V, A^{-1}V, AV, \dots, A^{-m}V, A^{m}V\} = \{q(A)V : q(x) = p(x)/x^{m},\ p \in \Pi_{2m}\}$

give higher accuracy than an approximant from the standard global Krylov subspace $\mathcal{K}_g^{2m+1}(A, V)$ of the same dimension.

A possible drawback of the extended global Lanczos method is that the generation of an F-orthonormal basis for the subspace (1.9) requires the solution of $m$ linear systems of equations with the matrix $A$ and block-vector right-hand sides; see Algorithm 1 for details. When $A$ is positive definite and banded with a small bandwidth, or when the systems can be solved efficiently by a multigrid method, generating an F-orthonormal basis for the subspace (1.9) may not be much more demanding than computing an F-orthonormal basis for the (standard) global Krylov subspace $\mathcal{K}_g^{2m+1}(A, V)$ with the (standard) global Lanczos method; the latter method does not require the solution of linear systems of equations with $A$; see [3, 8, 15]. We illustrate in Section 4 that the extended global Lanczos method may give approximations of (1.1) of higher accuracy than the (standard) global Lanczos method when both methods use subspaces of the same dimension.

This paper is organized as follows. Section 2 introduces the extended global Lanczos process. The orthonormal basis generated by this method is expressed with the aid of orthonormal Laurent polynomials. Section 3 describes the application of the extended global Lanczos process to the approximation of matrix functions (1.1). We also discuss the analogue of property (1.8) for the approximation by Laurent polynomials. Section 4 presents a few numerical experiments that illustrate the quality of the computed approximations of (1.1). Concluding remarks can be found in Section 5.

2. The extended global Lanczos method. We describe the extended global Lanczos method for generating an F-orthonormal basis for the extended global Krylov subspace (1.9). We assume that the symmetric matrix $A \in \mathbb{R}^{n\times n}$ and the block vector $V \in \mathbb{R}^{n\times s}$ are such that the subspace (1.9) is of full dimension $2m+1$. The extended global Lanczos method uses short recursion relations for generating an orthonormal basis for (1.9). The existence of short recursion relations follows from the fact that orthogonal Laurent polynomials with respect to an inner product determined by a nonnegative measure with support on (part of) the real axis satisfy short recursion relations; this was first observed by Njastad and Thron [18]. Applications in linear algebra have been described by Simoncini [20] and in [14]; both of these references are concerned with the situation when $V$ in (1.9) is a single vector. Simoncini [20] also discusses the generation of an orthonormal basis for an extended Krylov subspace when the matrix $A$ is not symmetric and $V$ is a single vector; then an orthonormal basis cannot be generated with short recurrence relations.

Heyouni [12] derives an extended global Arnoldi method for generating an orthonormal basis for an extended global Krylov subspace (1.9) when $A \in \mathbb{R}^{n\times n}$ is nonsymmetric and applies this method to the solution of continuous-time algebraic Riccati equations. The generation of the basis requires "long" recursion relations. We first describe properties and recursion formulas for this method and subsequently explore the simplifications that arise when $A$ is symmetric.

Following Heyouni [12], the recursion formulas for the extended global Arnoldi method can be written as

(2.1) $w_{1,1}V_1 = V, \qquad \widehat V_2 = A^{-1}V - w_{1,2}V_1, \qquad w_{2,2}V_2 = \widehat V_2,$

where $w_{1,1} = \|V\|_F$, $w_{1,2} = (A^{-1}V, V_1)_F$, $w_{2,2} = \|\widehat V_2\|_F$, and

(2.2) $h_{2j+1,2j-1}V_{2j+1} = AV_{2j-1} - \sum_{i=1}^{2j} h_{i,2j-1}V_i, \qquad h_{2j+2,2j}V_{2j+2} = A^{-1}V_{2j} - \sum_{i=1}^{2j+1} h_{i,2j}V_i, \qquad j = 1, \dots, m.$

The coefficients $h_{i,j}$ are determined so that the block vectors $V_1, V_2, \dots, V_{2m+2}$ are F-orthonormal.

Before describing our proposed algorithm based on pairs of three-term recurrence relations, we provide some useful properties of the extended global Krylov subspaces generated. We will tacitly assume that a breakdown does not occur.

PROPERTIES 1. Let the matrix $A \in \mathbb{R}^{n\times n}$ be nonsingular, and let $V \in \mathbb{R}^{n\times s}$. Then

$\mathcal{K}_g^{m+1,m}(A, V) = \mathcal{K}^{2m}_g(A, A^{-m}V),$
$A\,\mathcal{K}_g^{m+1,m}(A, V) \subseteq \mathcal{K}^{2m}_g(A, A^{-m+1}V) \subseteq \mathcal{K}_g^{m+2,m+1}(A, V),$
$A^{-1}\mathcal{K}_g^{m+1,m}(A, V) \subseteq \mathcal{K}^{2m}_g(A, A^{-m-1}V) \subseteq \mathcal{K}_g^{m+2,m+1}(A, V).$

Proof. The properties follow straightforwardly from the definition of the global Krylov subspace (1.4) and the extended global Krylov subspace (1.9).

PROPERTIES 2. Let the matrix $A \in \mathbb{R}^{n\times n}$ be nonsingular, and let $V \in \mathbb{R}^{n\times s}$. The F-orthonormal basis $V_1, V_2, \dots, V_{2m+2}$ determined by the recursion formulas (2.1) and (2.2) satisfies, for $j = 1, 2, \dots, m$,

(2.3) $AV_{2j-1} \in \operatorname{span}\{V_1, \dots, V_{2j+1}\},$

(2.4) $AV_{2j} \in \operatorname{span}\{V_1, \dots, V_{2j+1}\}, \qquad A^{-1}V_{2j-1} \in \operatorname{span}\{V_1, \dots, V_{2j}\}, \qquad A^{-1}V_{2j} \in \operatorname{span}\{V_1, \dots, V_{2j+2}\}.$

Proof. These properties follow from equations (2.1), (2.2), and the formulas of Properties 1.

The existence of three-term recurrence relations for the block vectors $V_1, \dots, V_{2m}$ is a consequence of the following properties.

PROPERTIES 3. Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular symmetric matrix, and let $V \in \mathbb{R}^{n\times s}$. Then, for $j = 1, 2, \dots, m$,

(2.5) $AV_{2j-1} \in \operatorname{span}\{V_{2j-3}, V_{2j-2}, V_{2j-1}, V_{2j}, V_{2j+1}\},$

(2.6) $AV_{2j} \in \operatorname{span}\{V_{2j-1}, V_{2j}, V_{2j+1}\},$

(2.7) $A^{-1}V_{2j-1} \in \operatorname{span}\{V_{2j-2}, V_{2j-1}, V_{2j}\},$

(2.8) $A^{-1}V_{2j} \in \operatorname{span}\{V_{2j-2}, V_{2j-1}, V_{2j}, V_{2j+1}, V_{2j+2}\}.$

Proof. The assertions require $A$ to be symmetric, i.e., that $(AV, W)_F = (AW, V)_F$ for $V, W \in \mathbb{R}^{n\times s}$. We first show (2.6). It follows from (2.4) that $AV_{2j}$ can be written as

$AV_{2j} = \alpha_{1,2j}V_1 + \alpha_{2,2j}V_2 + \cdots + \alpha_{2j+1,2j}V_{2j+1},$

where $\alpha_{1,2j}, \alpha_{2,2j}, \dots, \alpha_{2j+1,2j}$ are scalars. It suffices to show that the coefficients $\alpha_{1,2j}, \alpha_{2,2j}, \dots, \alpha_{2j-2,2j}$ vanish. Since $A$ is a symmetric matrix, we have

$\alpha_{2i-1,2j} = (V_{2i-1}, AV_{2j})_F = (AV_{2i-1}, V_{2j})_F, \qquad \alpha_{2i,2j} = (V_{2i}, AV_{2j})_F = (AV_{2i}, V_{2j})_F,$

and using (2.3) and (2.4), we obtain

$AV_{2i-1} \in \operatorname{span}\{V_1, \dots, V_{2i+1}\} \quad \text{and} \quad AV_{2i} \in \operatorname{span}\{V_1, \dots, V_{2i+1}\}.$

Due to the F-orthonormality of $V_1, \dots, V_{2j}$, the coefficients $\alpha_{1,2j}, \dots, \alpha_{2j-2,2j}$ vanish. It follows that

$AV_{2j} \in \operatorname{span}\{V_{2j-1}, V_{2j}, V_{2j+1}\}.$

The properties (2.5), (2.7), and (2.8) can be shown in a similar fashion.

We are now in a position to describe the extended global Lanczos method. Notice that the algorithm that implements this method is based on Properties 2. To link the method to a sequence of orthonormal Laurent polynomials, we use (2.7) and (2.6). The properties (2.5) and (2.8) will be used implicitly in Propositions 3.1 and 3.2 below to determine the projected pentadiagonal matrices $T_{2m}$ and $S_{2m}$. The algorithm for the proposed extended global Lanczos method computes an F-orthonormal basis $\{V_1, \dots, V_{2m}\}$ by pairs of three-term recurrence relations. In order to be able to solve all required linear systems of equations with the matrix $A$ in the algorithm, we require that $A$ satisfies

$\operatorname{trace}(V^T A V) \neq 0 \quad \forall\, V \in \mathbb{R}^{n\times s}, \ V \neq 0.$

A sufficient condition for this to hold is that A is symmetric positive definite. We will assume this to be the case.

The extended global Lanczos algorithm can be expressed as follows. After initialization

$V_1 = V/\delta_1 \ \text{ with } \ \delta_1 = \|V\|_F, \qquad \beta_0 V_0 = 0,$

the algorithm is determined by the recursion relations, which follow from (2.7) and (2.6), for $j = 1, \dots, m$:

(2.9) $\delta_{2j}V_{2j} = (A^{-1} - \beta_{2j-1}I_n)V_{2j-1} - \beta_{2j-2}V_{2j-2}, \qquad \delta_{2j+1}V_{2j+1} = (A - \alpha_{2j}I_n)V_{2j} - \alpha_{2j-1}V_{2j-1},$

with

$\beta_i = (A^{-1}V_{2j-1}, V_i)_F \ \text{ for } i \in \{2j-2,\, 2j-1\}, \qquad \alpha_i = (AV_{2j}, V_i)_F \ \text{ for } i \in \{2j-1,\, 2j\}.$

The $\delta_i > 0$ are normalization coefficients such that $\|V_i\|_F = 1$ for $i \in \{2j,\, 2j+1\}$. Algorithm 1 summarizes the computations.

We note that Algorithm 1 may break down when some $\delta_i$ becomes (numerically) zero, because then the next block vector $V_i$ cannot be computed. Breakdown is very rare, and we will assume that all $\delta_i$ determined by the algorithm are numerically positive. Then Algorithm 1 computes an F-orthonormal basis $V_1, \dots, V_{2m+2}$ of block vectors $V_j \in \mathbb{R}^{n\times s}$ of the extended global Krylov subspace $\mathcal{K}_g^{m+2,m+1}(A, V)$.
Algorithm 1 The extended global Lanczos algorithm.

Input: symmetric positive definite matrix $A \in \mathbb{R}^{n\times n}$, initial block vector $V \in \mathbb{R}^{n\times s}$.
Output: F-orthonormal basis $V_1, \dots, V_{2(m+1)}$ of $\mathcal{K}_g^{m+2,m+1}(A, V)$.

1: $\delta_1 = \|V\|_F$; $V_1 = V/\delta_1$; $V_0 = 0$;
2: for $j = 1$ to $m$
3: $\quad \widehat V = A^{-1}V_{2j-1}$;
4: $\quad \beta_{2j-2} = (\widehat V, V_{2j-2})_F$; $\widehat V = \widehat V - \beta_{2j-2}V_{2j-2}$;
5: $\quad \beta_{2j-1} = (\widehat V, V_{2j-1})_F$; $\widehat V = \widehat V - \beta_{2j-1}V_{2j-1}$;
6: $\quad \delta_{2j} = \|\widehat V\|_F$; $V_{2j} = \widehat V/\delta_{2j}$;
7: $\quad \widehat V = AV_{2j}$;
8: $\quad \alpha_{2j-1} = (\widehat V, V_{2j-1})_F$; $\widehat V = \widehat V - \alpha_{2j-1}V_{2j-1}$;
9: $\quad \alpha_{2j} = (\widehat V, V_{2j})_F$; $\widehat V = \widehat V - \alpha_{2j}V_{2j}$;
10: $\quad \delta_{2j+1} = \|\widehat V\|_F$; $V_{2j+1} = \widehat V/\delta_{2j+1}$;
11: end for
12: $\widehat V = A^{-1}V_{2m+1}$;
13: $\beta_{2m} = (\widehat V, V_{2m})_F$; $\widehat V = \widehat V - \beta_{2m}V_{2m}$;
14: $\beta_{2m+1} = (\widehat V, V_{2m+1})_F$; $\widehat V = \widehat V - \beta_{2m+1}V_{2m+1}$;
15: $\delta_{2m+2} = \|\widehat V\|_F$; $V_{2m+2} = \widehat V/\delta_{2m+2}$;

Below we show some properties of the basis $V_1, \dots, V_{2m+2}$ determined by Algorithm 1. In particular, we show how the block vectors $V_j$ are related to orthogonal Laurent polynomials. The following spaces of Laurent polynomials of degree $2m-1$ and $2m$ will be used:

$\Delta_{-m,m-1} = \operatorname{span}\{1, x^{-1}, x, \dots, x^{m-1}, x^{-m}\}, \qquad \Delta_{-m,m} = \operatorname{span}\{1, x^{-1}, x, \dots, x^{m-1}, x^{-m}, x^{m}\}.$

Elements of these spaces are known as Laurent polynomials.

PROPOSITION 2.1. Let $V_1, V_2, \dots, V_{2m+1}, V_{2m+2}$ be the F-orthonormal basis of $\mathcal{K}_g^{m+2,m+1}(A, V)$ determined by Algorithm 1. Then

$V_{2j-1} = p_{j-1}(A)V + q_{j-1}(A^{-1})V = R_{2j-2}(A)V, \qquad V_{2j} = r_{j-1}(A^{-1})A^{-1}V + s_{j-1}(A)V = R_{2j-1}(A)V,$

where $\deg(p_{j-1}) = \deg(r_{j-1}) = j-1$, $\deg(q_{j-1}) \le j-1$, and $\deg(s_{j-1}) \le j-1$. Moreover, the Laurent polynomials $R_{2j-1}$ and $R_{2j-2}$ live in the spaces $\Delta_{-j,j-1}$ and $\Delta_{-(j-1),j-1}$, respectively.

Proof. The result follows from the recursion formulas of Algorithm 1.

The following proposition shows that the Laurent polynomials of Proposition 2.1 satisfy pairs of three-term recursion relations.

PROPOSITION 2.2. Let $A$ be a symmetric positive definite matrix. Then there is a sequence of Laurent polynomials $R_0, R_1, \dots, R_{2m}$ that are orthonormal with respect to the bilinear form

(2.10) $(P, Q)_F := (P(A)V, Q(A)V)_F = \int P(\lambda)Q(\lambda)\, d\alpha(\lambda),$

where $d\alpha$ is the measure in (1.3); its support lies on the positive real axis. Moreover, these Laurent polynomials satisfy a pair of three-term recurrence relations of the form

(2.11) $\delta_{2j}R_{2j-1}(x) = (x^{-1} - \beta_{2j-1})R_{2j-2}(x) - \beta_{2j-2}R_{2j-3}(x), \qquad \delta_{2j+1}R_{2j}(x) = (x - \alpha_{2j})R_{2j-1}(x) - \alpha_{2j-1}R_{2j-2}(x), \qquad j = 1, \dots, m,$

where $R_0 = 1/\delta_1$ and $R_{-1} = 0$.

Proof. According to Proposition 2.1, there exists a sequence of Laurent polynomials $R_0, R_1, \dots, R_{2m+1}$ such that $V_i = R_{i-1}(A)V$ for $i = 1, \dots, 2m+2$. Let $R_0 = 1/\delta_1$. It follows from the spectral factorization (1.2) that

$(V_i, V_j)_F = (R_{i-1}(A)V, R_{j-1}(A)V)_F = \operatorname{trace}\big(\widetilde V^T R_{i-1}(\Lambda)R_{j-1}(\Lambda)\widetilde V\big) = \int R_{i-1}(\lambda)R_{j-1}(\lambda)\, d\alpha(\lambda).$

Thus, the Laurent polynomials $R_j$ are orthonormal with respect to the bilinear form (2.10). Substituting $V_i = R_{i-1}(A)V$ into (2.9) gives (2.11).

Next we express the Laurent polynomials $R_j$ in terms of the power basis,

(2.12) $R_{2j-1}(x) = \sum_{i=-j}^{j-1} r_{2j-1,i}\, x^{i}, \qquad R_{2j}(x) = \sum_{i=-j}^{j} r_{2j,i}\, x^{i}.$

The coefficients $r_{2j-1,j-1}$ and $r_{2j,j}$ are the leading coefficients of $R_{2j-1}$ and $R_{2j}$, respectively, and $r_{2j-1,-j}$ and $r_{2j,-j}$ are the trailing coefficients of $R_{2j-1}$ and $R_{2j}$, respectively. The sequence $R_0, R_1, \dots, R_{2m}$ of Laurent polynomials is said to be regular if the leading and trailing coefficients of each Laurent polynomial are nonvanishing. The following result shows the relation between the leading and trailing coefficients and the coefficients in the recursion formulas of Proposition 2.2.

PROPOSITION 2.3. Let the sequence of orthonormal Laurent polynomials be defined by the recursion formulas of Proposition 2.2, and let the $\alpha_j$, $\beta_j$, and $\delta_j$ be the recursion coefficients. Then the leading and trailing coefficients of the Laurent polynomials can be computed recursively. Let $\alpha_0 = 1$ and $r_{0,0} = 1/\delta_1$. Then the leading and trailing coefficients are given by, for $j = 1, 2, \dots, m$,

(2.13) $r_{2j-1,j-1} = (-1)^j\,\dfrac{\beta_1\beta_3\cdots\beta_{2j-1}}{\delta_1\delta_2\cdots\delta_{2j}},$

(2.14) $r_{2j-1,-j} = (-1)^{j-1}\,\dfrac{\alpha_0\alpha_2\cdots\alpha_{2j-2}}{\delta_1\delta_2\cdots\delta_{2j}},$

(2.15) $r_{2j,j} = (-1)^j\,\dfrac{\beta_1\beta_3\cdots\beta_{2j-1}}{\delta_1\delta_2\cdots\delta_{2j+1}},$

(2.16) $r_{2j,-j} = (-1)^j\,\dfrac{\alpha_0\alpha_2\cdots\alpha_{2j}}{\delta_1\delta_2\cdots\delta_{2j+1}}.$

Proof. Equation (2.11) with $j = 1$ and the fact that $R_0$ is a constant yield that the coefficient of the trailing reciprocal power $x^{-1}$ of $R_1$ is $r_{1,-1} = 1/(\delta_1\delta_2)$ and that its constant term is $r_{1,0} = -\beta_1/(\delta_1\delta_2)$. The result follows by observing how the leading positive powers and trailing reciprocal powers are incremented in the recursion formulas. The leading coefficient $r_{2j,j}$ of $R_{2j}$ is the leading coefficient of $R_{2j-1}$ divided by $\delta_{2j+1}$; its trailing coefficient is the trailing coefficient of $R_{2j-1}$ multiplied by $-\alpha_{2j}$ and divided by $\delta_{2j+1}$. The leading and trailing coefficients of $R_{2j+1}$ are found in a similar way. The right-hand sides of (2.13)-(2.16) follow by substitution.

PROPOSITION 2.4. Let $A$ be a symmetric positive definite matrix. Then the Laurent polynomials defined by (2.12) form a regular sequence.

Proof. The bilinear form $(U, AW)_F$ with $U, W \in \mathbb{R}^{n\times s}$ defines an inner product when $A$ is a symmetric positive definite matrix. Hence, the coefficients $\alpha_{2i} = (V_{2i}, AV_{2i})_F$ and $\beta_{2i-1} = (V_{2i-1}, A^{-1}V_{2i-1})_F$ are nonvanishing for $i = 1, 2, \dots, m$. An application of Proposition 2.3 shows that the sequence of Laurent polynomials $\{R_i\}_{i=0}^{2m}$ is regular.

The following proposition allows us to express the leading and trailing coefficients in terms of inner products involving Laurent polynomials.

PROPOSITION 2.5. The leading and trailing coefficients of the Laurent polynomials $R_0, R_1, \dots, R_{2m}$ defined by (2.12) can be expressed as

(2.17) $r_{2j-1,-j} = \dfrac{1}{(R_{2j-1}, x^{-j})_F},$

(2.18) $r_{2j-1,j-1} = \dfrac{(xR_{2j-1}, R_{2j-1})_F}{(R_{2j-1}, x^{j})_F},$

(2.19) $r_{2j,j} = \dfrac{1}{(R_{2j}, x^{j})_F},$

(2.20) $r_{2j,-j} = \dfrac{(x^{-1}R_{2j}, R_{2j})_F}{(R_{2j}, x^{-(j+1)})_F}.$

Due to the F-orthogonality of $R_0, R_1, \dots, R_{2m}$, we have $(R_i, x^{-(\lfloor i/2\rfloor + 1)})_F \neq 0$ and $(R_i, x^{\lfloor (i+1)/2\rfloor})_F \neq 0$ for all $i = 0, 1, \dots, 2m$, where $\lfloor t\rfloor$ denotes the integer part of $t \ge 0$. This secures that the denominators in the expressions (2.17)-(2.20) are nonvanishing.

Proof. It follows from (2.12) that

$(R_{2j-1}, R_{2j-1})_F = \sum_{i=-j}^{j-1} r_{2j-1,i}\,(R_{2j-1}, x^{i})_F.$

The orthonormality of the Laurent polynomials $R_0, \dots, R_{2m}$ yields

$(R_{2j-1}, R_{2j-1})_F = r_{2j-1,-j}\,(R_{2j-1}, x^{-j})_F,$

which shows (2.17). To show (2.18), we multiply the first equation in (2.12) by x. Then

$xR_{2j-1}(x) = \sum_{i=-j}^{j-1} r_{2j-1,i}\, x^{i+1}.$

Again using the orthonormality of the $R_i$, we obtain

$(xR_{2j-1}, R_{2j-1})_F = r_{2j-1,j-1}\,(R_{2j-1}, x^{j})_F,$

which gives (2.18). The relations (2.19) and (2.20) can be shown similarly.

PROPOSITION 2.6. Let the sequence of orthonormal Laurent polynomials be determined by Proposition 2.2. The recursion coefficients $\{\alpha_j\}_{j=1}^{2m}$, $\{\beta_j\}_{j=1}^{2m+1}$, and $\{\delta_j\}_{j=2}^{2m+2}$ can be expressed as

[mathematical expression not reproducible]

Proof. First consider the coefficient $\alpha_{2j-1}$. We have

$\alpha_{2j-1} = (xR_{2j-1}, R_{2j-2})_F = (R_{2j-1}, xR_{2j-2})_F.$

It follows from (2.12) that

$xR_{2j-2}(x) = \sum_{i=-(j-1)}^{j-1} r_{2j-2,i}\, x^{i+1}.$

The orthonormality of $R_0, \dots, R_{2m}$ now yields

$\alpha_{2j-1} = r_{2j-2,j-1}\,(R_{2j-1}, x^{j})_F.$

The expression for $\beta_{2j}$ can be shown similarly. It follows from the fact that

$\beta_{2j} = (x^{-1}R_{2j}, R_{2j-1})_F = (R_{2j}, x^{-1}R_{2j-1})_F.$

We turn to $\alpha_{2j}$. It follows from (2.14) that

[mathematical expression not reproducible].

Now using (2.18) and (2.19) yields

(2.21) [mathematical expression not reproducible].

On the other hand, we have

[mathematical expression not reproducible]

and using (2.17) and (2.20), we obtain

(2.22) [mathematical expression not reproducible].

Combining (2.21) and (2.22) gives the expression for $\alpha_{2j}$. To show the expression for $\beta_1$, we use the fact that $R_1(x) = r_{1,-1}x^{-1} + r_{1,0}$ together with the expressions for $r_{1,0}$ and $r_{1,-1}$ in (2.13) and (2.14). This yields

[mathematical expression not reproducible].

The expression for $\beta_{2j+1}$ follows from applications of Propositions 2.3 and 2.5. We obtain

(2.23) [mathematical expression not reproducible],

(2.24) [mathematical expression not reproducible].

Combining (2.23) and (2.24) gives the expression for $\beta_{2j+1}$. Finally, the expressions for $\delta_{2j}$ and $\delta_{2j+1}$ can be shown by using (2.21) and (2.23).

We summarize the above results in the following theorem.

THEOREM 2.7. Let $A$ be a symmetric positive definite matrix, let $V \in \mathbb{R}^{n\times s}$ be a block vector, and let $d\alpha$ denote the measure defined in (1.3). Then there is a sequence of regular Laurent polynomials $R_0, R_1, \dots, R_{2m}$ that are orthonormal with respect to the bilinear form (2.10). The Laurent polynomials can be computed recursively by pairs of three-term recurrence relations of the form

(2.25) $\delta_{2j}R_{2j-1}(x) = (x^{-1} - \beta_{2j-1})R_{2j-2}(x) - \beta_{2j-2}R_{2j-3}(x), \qquad \delta_{2j+1}R_{2j}(x) = (x - \alpha_{2j})R_{2j-1}(x) - \alpha_{2j-1}R_{2j-2}(x), \qquad j = 1, 2, \dots, m,$

with $R_0 = 1/\mu_0$, $R_{-1} = 0$, and $\mu_0 = \|V\|_F$, where

[mathematical expression not reproducible]

Moreover,

$\mu_n = \int x^n\, d\alpha(x), \qquad \rho_n = (R_n, x^{\lfloor (n+1)/2\rfloor})_F, \qquad \sigma_n = (R_n, x^{-(\lfloor n/2\rfloor + 1)})_F.$

We assume that $ms \ll n$ is sufficiently small to avoid breakdown of the recursion relations.

Proof. The result follows from Propositions 2.2, 2.4, and 2.6.

3. Application to the approximation of matrix functions. We begin by introducing some notation. Partition the matrices $M = [M_1, \dots, M_p] \in \mathbb{R}^{n\times ps}$ and $N = [N_1, \dots, N_l] \in \mathbb{R}^{n\times ls}$ into block columns $M_i, N_j \in \mathbb{R}^{n\times s}$, and define the $\diamond$-product of the matrices $M$ and $N$ as

$M^T \diamond N = \big[\operatorname{trace}(M_i^T N_j)\big]_{i=1,\dots,p}^{j=1,\dots,l} \in \mathbb{R}^{p\times l}.$

The matrix $M$ is said to be F-orthonormal if $M^T \diamond M = I_p$. The $\diamond$-product satisfies

$M^T \diamond \big(B(L \otimes I_s)\big) = (M^T \diamond B)L,$

where $L \in \mathbb{R}^{l\times l}$; see [1, 3, 4] for further details.
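The $\diamond$-product and the identity above are easy to verify numerically. The sketch below (the sizes are illustrative assumptions) implements $M^T \diamond N$ by slicing out the $s$-column blocks and checks that $M^T \diamond (B(L \otimes I_s)) = (M^T \diamond B)L$:

```python
import numpy as np

def diamond(M, N, s):
    """M^T ⋄ N for M in R^{n x ps}, N in R^{n x ls}: the p-by-l matrix
    whose (i, j) entry is trace(M_i^T N_j) for the s-column blocks."""
    p, l = M.shape[1] // s, N.shape[1] // s
    out = np.empty((p, l))
    for i in range(p):
        for j in range(l):
            out[i, j] = np.trace(M[:, i*s:(i+1)*s].T @ N[:, j*s:(j+1)*s])
    return out

rng = np.random.default_rng(4)
n, s, p, l = 12, 2, 3, 4
M = rng.standard_normal((n, p*s))
B = rng.standard_normal((n, l*s))
L = rng.standard_normal((l, l))

lhs = diamond(M, B @ np.kron(L, np.eye(s)), s)   # M^T ⋄ (B (L ⊗ I_s))
rhs = diamond(M, B, s) @ L                       # (M^T ⋄ B) L
print(np.max(np.abs(lhs - rhs)))                 # agrees up to rounding
```

The identity holds because the $j$th block column of $B(L \otimes I_s)$ is $\sum_k B_k L_{kj}$, and the trace is linear in each block.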

Let the block columns of the matrix $\mathbb{V}_{2m} = [V_1, V_2, \dots, V_{2m}] \in \mathbb{R}^{n\times 2ms}$ form an F-orthonormal basis for the extended global Krylov subspace $\mathcal{K}_g^{m+1,m}(A, V)$; thus, the block columns satisfy (1.6). Introduce the matrices

(3.1) $T_{2m} = \mathbb{V}_{2m}^T \diamond A\mathbb{V}_{2m} \in \mathbb{R}^{2m\times 2m}, \qquad S_{2m} = \mathbb{V}_{2m}^T \diamond A^{-1}\mathbb{V}_{2m} \in \mathbb{R}^{2m\times 2m}.$

The expression (1.1) can be approximated by

(3.2) $\mathcal{G}^e_{2m}(f) = \|V\|_F^2\, e_1^T f(T_{2m})\, e_1.$

We will see that this approximation is analogous to (1.7). Define the spectral factorization

(3.3) $T_{2m} = U_{2m}\Theta_{2m}U_{2m}^T,$

where

$U_{2m} = [u_1, \dots, u_{2m}] \in \mathbb{R}^{2m\times 2m}, \qquad U_{2m}^T U_{2m} = I_{2m}, \qquad \Theta_{2m} = \operatorname{diag}[\theta_1, \dots, \theta_{2m}].$

Substituting (3.3) into (3.2) yields

(3.4) $\mathcal{G}^e_{2m}(f) = \|V\|_F^2 \sum_{i=1}^{2m} f(\theta_i)\, u_{i,1}^2 = \sum_{i=1}^{2m} f(\theta_i)\, w_i,$

where $w_i = \|V\|_F^2\, u_{i,1}^2$. Here $u_{i,1}$ denotes the first component of the normalized eigenvector $u_i$ of $T_{2m}$. We will see below that (3.4) is a Gauss-Laurent quadrature rule. An adaptation of the Christoffel-Darboux formula to the extended global Lanczos method shows that the weights $w_i$ satisfy

[mathematical expression not reproducible];

see, e.g., Bultheel et al. [6] for details. The next two propositions express the entries of $T_{2m}$ and $S_{2m}$ in terms of the recursion coefficients. This allows us to compute the entries quite efficiently.
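Since the F-inner product of two $n\times s$ blocks equals the Euclidean inner product of their vectorizations, an F-orthonormal basis of $\mathcal{K}_g^{m+1,m}(A, V)$ and the projected matrix $T_{2m}$ in (3.1) can be computed for a small example by a thin QR factorization of the vectorized Krylov blocks. The sketch below is a brute-force construction written for checking purposes only (it is not Algorithm 1, and the sizes are illustrative); it verifies that $T_{2m}$ is pentadiagonal and that the rule (3.2) reproduces (1.1) for the Laurent polynomial $f(x) = 3x + 2/x$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, s, m = 30, 2, 3
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)              # symmetric positive definite
V = rng.standard_normal((n, s))
Ainv = np.linalg.inv(A)                  # dense inverse is fine at this size

# Krylov blocks in the order the basis vectors are generated:
# V, A^{-1}V, AV, A^{-2}V, A^{2}V, ..., A^{-m}V  (2m blocks in total).
blocks = [V]
for j in range(1, m):
    blocks += [np.linalg.matrix_power(Ainv, j) @ V,
               np.linalg.matrix_power(A, j) @ V]
blocks.append(np.linalg.matrix_power(Ainv, m) @ V)

# A thin QR of the vectorized blocks yields (up to signs, which do not
# affect T_2m or the quadrature rule) the F-orthonormal basis V_1,...,V_2m.
K = np.column_stack([W.reshape(-1, order='F') for W in blocks])
Q, _ = np.linalg.qr(K)

unvec = lambda q: q.reshape(n, s, order='F')
AQ = np.column_stack([(A @ unvec(q)).reshape(-1, order='F') for q in Q.T])
T = Q.T @ AQ                             # T_2m = V_2m^T ⋄ (A V_2m), cf. (3.1)

# T_2m is symmetric and pentadiagonal.
off = max(abs(T[i, j]) for i in range(2*m) for j in range(2*m) if abs(i - j) > 2)
print(off)

# The rule (3.2) reproduces (1.1) for the Laurent polynomial 3x + 2/x.
e1 = np.zeros(2*m); e1[0] = 1.0
G = np.linalg.norm(V, 'fro')**2 * (e1 @ (3*T + 2*np.linalg.inv(T)) @ e1)
I = np.trace(V.T @ (3*A + 2*Ainv) @ V)
print(abs(G - I))
```

The exactness for $1/x$ holds because $A^{-1}V_1$ already lies in the subspace, so the projected inverse reproduces the true one in the first coordinate.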

PROPOSITION 3.1. Let the matrix $A$ be symmetric positive definite, and let $\{\alpha_j\}_{j=1}^{2m}$, $\{\beta_j\}_{j=1}^{2m-1}$, and $\{\delta_j\}_{j=2}^{2m+2}$ be the recursion coefficients for the orthonormal Laurent polynomials; cf. Proposition 2.2. The matrix $T_{2m} = [t_{i,j}]$ in (3.1) is symmetric and pentadiagonal with the nontrivial entries, for $j = 1, 2, \dots, m$,

(3.5) $t_{i,2j} = \alpha_i$ for $i \in \{2j-1,\, 2j\},$

(3.6) $t_{2j+1,2j} = \delta_{2j+1},$

(3.7) $t_{2j-3,2j-1} = -\dfrac{\beta_{2j-2}\,\alpha_{2j-3}}{\beta_{2j-1}},$

(3.8) $t_{2j-2,2j-1} = \delta_{2j-1},$

(3.9) $t_{2j-1,2j-1} = \dfrac{1 - \beta_{2j-2}\delta_{2j-1} - \delta_{2j}\alpha_{2j-1}}{\beta_{2j-1}},$

(3.10) $t_{2j,2j-1} = \alpha_{2j-1},$

(3.11) $t_{2j+1,2j-1} = -\dfrac{\delta_{2j}\,\delta_{2j+1}}{\beta_{2j-1}}.$

Proof. We have $t_{i,j} = \operatorname{trace}(V_i^T A V_j)$. Therefore, (3.5) and (3.6) follow from the definitions of $\alpha_{2j-1}$, $\alpha_{2j}$, and $\delta_{2j+1}$ in (2.9). Using the symmetry of $T_{2m}$ and the definitions of $\alpha_{2j-1}$ and $\delta_{2j-1}$, the formulas (3.8) and (3.10) are obtained from

$t_{2j-2,2j-1} = t_{2j-1,2j-2} = \operatorname{trace}(V_{2j-1}^T A V_{2j-2}) = \delta_{2j-1}, \qquad t_{2j,2j-1} = t_{2j-1,2j} = \operatorname{trace}(V_{2j-1}^T A V_{2j}) = \alpha_{2j-1}.$

The formulas (3.7), (3.9), and (3.11) are obtained from the expressions for $AV_{2j-1}$ for $j = 1, 2, \dots, m$. Thus, multiplying the first equality in (2.9) by $A$ from the left gives

$\beta_{2j-1}AV_{2j-1} = V_{2j-1} - \beta_{2j-2}AV_{2j-2} - \delta_{2j}AV_{2j},$

and using the second equation in (2.9), we see that

(3.12) $\beta_{2j-1}AV_{2j-1} = -\beta_{2j-2}\alpha_{2j-3}V_{2j-3} - \beta_{2j-2}\alpha_{2j-2}V_{2j-2} + (1 - \beta_{2j-2}\delta_{2j-1} - \delta_{2j}\alpha_{2j-1})V_{2j-1} - \delta_{2j}\alpha_{2j}V_{2j} - \delta_{2j}\delta_{2j+1}V_{2j+1}.$

Since $t_{2j-1,2j-1} = \operatorname{trace}(V_{2j-1}^T A V_{2j-1})$ and $t_{2j+1,2j-1} = \operatorname{trace}(V_{2j+1}^T A V_{2j-1})$, the F-orthonormality of the block vectors $\{V_i\}_{i=1}^{2(m+1)}$ yields (3.9) and (3.11). Finally,

$t_{2j-3,2j-1} = \operatorname{trace}(V_{2j-3}^T A V_{2j-1}) = -\frac{\beta_{2j-2}\,\alpha_{2j-3}}{\beta_{2j-1}},$

which shows (3.7). This concludes the proof of the proposition.

PROPOSITION 3.2. Let the matrix $A$ be symmetric positive definite, and let $\{\alpha_j\}_{j=1}^{2m}$, $\{\beta_j\}_{j=1}^{2m-1}$, and $\{\delta_j\}_{j=2}^{2m+2}$ be the recursion coefficients for the orthonormal Laurent polynomials; cf. Proposition 2.2. The matrix $S_{2m} = [s_{i,j}]$ in (3.1) is symmetric and pentadiagonal with the nontrivial entries, for $j = 1, 2, \dots, m$,

$s_{i,2j-1} = \beta_i$ for $i \in \{2j-2,\, 2j-1\}, \qquad s_{2j,2j-1} = \delta_{2j},$
$s_{2j-2,2j} = -\dfrac{\alpha_{2j-1}\,\beta_{2j-2}}{\alpha_{2j}}, \qquad s_{2j,2j} = \dfrac{1 - \alpha_{2j-1}\delta_{2j} - \delta_{2j+1}\beta_{2j}}{\alpha_{2j}}, \qquad s_{2j+2,2j} = -\dfrac{\delta_{2j+1}\,\delta_{2j+2}}{\alpha_{2j}}.$

Proof. The proof is analogous to the proof of Proposition 3.1. The relation

(3.13) $\alpha_{2j}A^{-1}V_{2j} = -\alpha_{2j-1}\beta_{2j-2}V_{2j-2} - \alpha_{2j-1}\beta_{2j-1}V_{2j-1} + (1 - \alpha_{2j-1}\delta_{2j} - \delta_{2j+1}\beta_{2j})V_{2j} - \delta_{2j+1}\beta_{2j+1}V_{2j+1} - \delta_{2j+1}\delta_{2j+2}V_{2j+2}$

is used.

The following relations can be shown by expressing equations (2.9), (3.12), and (3.13) in matrix form:

(3.14) $A\mathbb{V}_{2m} = \mathbb{V}_{2m}(T_{2m} \otimes I_s) + V_{2m+1}\,[t_{2m+1,2m-1},\, t_{2m+1,2m}]\,(E_{2m-1}^T \otimes I_s),$

(3.15) $A^{-1}\mathbb{V}_{2m} = \mathbb{V}_{2m}(S_{2m} \otimes I_s) + \big(s_{2m+1,2m}V_{2m+1} + s_{2m+2,2m}V_{2m+2}\big)\big(e_{2m}^T \otimes I_s\big),$

where the matrix $E_{2m-1} = [e_{2m-1}, e_{2m}] \in \mathbb{R}^{2m\times 2}$ is made up of the last two columns of the identity matrix $I_{2m} \in \mathbb{R}^{2m\times 2m}$.

Algorithm 1 breaks down when $\delta_j = 0$. The next result addresses this situation.

PROPOSITION 3.3. Assume that the coefficients $\delta_j$ generated by Algorithm 1 satisfy $\delta_j > 0$ for $2 \le j \le 2m$, and $\delta_{2m+1} = 0$. Then

trace([V.sup.T] f(A)V) = [||V||.sup.2.sub.F] [e.sup.T.sub.1] f([T.sub.2m])[e.sub.1]

for any function f that is continuous on the convex hull of the spectrum of A.

Proof. Under the conditions of the proposition, the decomposition (3.14) simplifies to

A[V.sub.2m] = [V.sub.2m]([T.sub.2m] [cross product] [I.sub.s]), [V.sub.2m] = [[V.sub.1],... , [V.sub.2m]].

Then an application of [3, Proposition 2.1] gives

f(A)[V.sub.2m] = [V.sub.2m]f([T.sub.2m] [cross product] [I.sub.s]) = [V.sub.2m](f([T.sub.2m]) [cross product] [I.sub.s]).

Multiplying the above expression by [E.sub.1] = [e.sub.1] [cross product] [I.sub.s] from the right yields

f(A)[V.sub.1] = [V.sub.2m](f([T.sub.2m]) [cross product] [I.sub.s])[E.sub.1] = [V.sub.2m](f([T.sub.2m])[e.sub.1] [cross product] [I.sub.s])

and

trace([V.sup.T.sub.1]f(A)[V.sub.1]) = [V.sup.T.sub.1] [diamond] [V.sub.2m](f([T.sub.2m])[e.sub.1] [cross product] [I.sub.s]) = ([V.sup.T.sub.1] [diamond] [V.sub.2m])f([T.sub.2m])[e.sub.1] = [e.sup.T.sub.1]f([T.sub.2m])[e.sub.1].

Substituting V = [||V||.sub.F][V.sub.1] into trace([V.sup.T]f(A)V) completes the proof.
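Proposition 3.3 can be illustrated numerically. The sketch below is a hedged, dense-arithmetic stand-in for Algorithm 1: it builds an F-orthonormal block basis by reorthogonalized modified Gram-Schmidt rather than by short recurrences, and applies A^{-1} through an explicit inverse (for illustration only). A test matrix with only four distinct eigenvalues forces the extended Krylov subspace to become invariant after four blocks, i.e., breakdown occurs, and the trace is then reproduced exactly up to rounding.

```python
import numpy as np

# Hedged illustration of Proposition 3.3: after breakdown (the extended
# Krylov subspace is invariant), trace(V^T f(A) V) equals
# ||V||_F^2 e_1^T f(T) e_1 exactly, up to round-off.
rng = np.random.default_rng(0)
n, s, m = 24, 2, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.kron(np.array([1.0, 2.0, 3.0, 4.0]), np.ones(n // 4))
A = Q @ np.diag(eigs) @ Q.T
A = 0.5 * (A + A.T)                      # SPD, four distinct eigenvalues
Ainv = np.linalg.inv(A)                  # illustration only; solve in practice
V = rng.uniform(0.0, 1.0, (n, s))

# candidates V, A^{-1}V, AV, A^{-2}V, ...; dependent ones are dropped
cands, X, Y = [V], V, V
for _ in range(m):
    Y = Ainv @ Y
    X = A @ X
    cands += [Y, X]
blocks = []
for W in cands:
    W = W / np.linalg.norm(W, 'fro')
    for _ in range(2):                   # MGS with one reorthogonalization
        for B in blocks:
            W = W - np.sum(B * W) * B    # np.sum(B*W) = trace(B^T W)
    nrm = np.linalg.norm(W, 'fro')
    if nrm > 1e-8:                       # else: breakdown, drop candidate
        blocks.append(W / nrm)

# projection T with entries trace(V_i^T A V_j), then f(T) via eigh
T = np.array([[np.sum(Bi * (A @ Bj)) for Bj in blocks] for Bi in blocks])
w, Qt = np.linalg.eigh(0.5 * (T + T.T))
approx = np.linalg.norm(V, 'fro')**2 * (Qt[0, :]**2 @ np.exp(w))

wA, U = np.linalg.eigh(A)
exact = np.trace(V.T @ U @ np.diag(np.exp(wA)) @ U.T @ V)
rel_err = abs(approx - exact) / abs(exact)
```

With four distinct eigenvalues only four F-orthonormal blocks survive, and the small relative error confirms the exactness asserted by the proposition.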

PROPOSITION 3.4. Let the Laurent polynomials [R.sub.0], [R.sub.1],..., [R.sub.2m] be determined by the recursion relations (2.2). Then the 2m zeros of the Laurent polynomial [R.sub.2m] are the eigenvalues of the matrix [T.sub.2m] in (3.1).

Proof. Consider the vector of Laurent polynomials

[[[right arrow].R].sub.2m](x) = [[[R.sub.0](x), [R.sub.1](x),... , [R.sub.2m-1](x)].sup.T].

Then the recurrence relation (2.25) can be expressed as

x[[[right arrow].R].sup.T.sub.2m](x) = [[[right arrow].R].sup.T.sub.2m](x)[T.sub.2m] + [R.sub.2m](x)[[t.sub.2m+1,2m-1], [t.sub.2m+1,2m]][E.sup.T.sub.2m-1],

where [E.sub.2m-1] is defined after (3.15). Let [x.sub.j] be a zero of [R.sub.2m]. Since [T.sub.2m] is symmetric, it follows that

[T.sub.2m][[[right arrow].R].sub.2m]([x.sub.j]) = [x.sub.j][[[right arrow].R].sub.2m]([x.sub.j]).

Thus each zero [x.sub.j] of [R.sub.2m] is an eigenvalue of [T.sub.2m] with eigenvector [[[right arrow].R].sub.2m]([x.sub.j]). This completes the proof.

Proposition 3.3 shows that in case of a breakdown of Algorithm 1, the trace (1.1) can be computed exactly after m steps of the algorithm. However, breakdown is rare. We are therefore interested in how well (1.1) is approximated when no breakdown occurs. It is shown in [3] that when the (standard) global Lanczos method is applied, we have

trace([V.sup.T]p(A)V) = [||V||.sup.2.sub.F][e.sup.T.sub.1]p([T.sub.m])[e.sub.1], [for all] p [member of] [[PI].sub.2m-1].

The following result is analogous for Laurent polynomials. We show that the approximation (3.2) corresponds to a 2m-point Gauss-Laurent quadrature rule, i.e., the rule is exact for Laurent polynomials of positive degree at most 2m - 1 and negative degree at most 2m.

THEOREM 3.5. Let A be a symmetric positive definite matrix. Carry out m steps of Algorithm 1 with an initial block vector V [member of] [R.sup.nxs]. Assume that no breakdown occurs. The algorithm then determines the entries of the symmetric pentadiagonal matrix [T.sub.2m] defined by (3.1), and

trace([V.sup.T] p(A)V) = [||V||.sup.2.sub.F] [e.sup.T.sub.1]p([T.sub.2m])[e.sub.1], [for all] p [member of] [[DELTA].sub.-2m,2m-1].

To show this theorem, we require some auxiliary results on the properties of the matrices [T.sub.2m] and [S.sub.2m] in (3.1).

LEMMA 3.6. Let the matrices [T.sub.2m] and [S.sub.2m] be defined by (3.1), and let [E.sub.2m-1] = [[e.sub.2m-1], [e.sub.2m]] [member of] [R.sup.2mx2]. Then

[E.sup.T.sub.2m-1][T.sup.j.sub.2m][e.sub.1] = [[0 0].sup.T], j = 0, 1,... , m - 2, [E.sup.T.sub.2m-1][S.sup.j.sub.2m][e.sub.1] = [[0 0].sup.T], j = 0, 1,... , m - 1.

Proof. The result follows by exploiting the structure of the symmetric pentadiagonal matrices [T.sub.2m] and [S.sub.2m]; see Propositions 3.1 and 3.2.

The following result relates positive powers of [S.sub.2m] to negative powers of [T.sub.2m].

LEMMA 3.7. Let [T.sub.2m] and [S.sub.2m] be given by (3.1), and let [e.sub.1] = [[1,0,..., 0].sup.T] [member of] [R.sup.2m].

Then

(3.16) [S.sup.j.sub.2m][e.sub.1] = [T.sup.-j.sub.2m][e.sub.1] for j = 1, 2,... , m.

Proof. We first show by induction that

[T.sup.j.sub.2m][S.sup.j.sub.2m] = [T.sup.j-1.sub.2m][S.sup.j-1.sub.2m] - [T.sup.j-1.sub.2m]([V.sup.T.sub.2m] [diamond] A[[V.sub.2m+1], [V.sub.2m+2]])[[~.S].sub.m+1,m][E.sup.T.sub.2m-1][S.sup.j-1.sub.2m], j = 1,..., m,

with [mathematical expression not reproducible]. Using the decomposition (3.14), (3.15), and the properties of the Kronecker product, we obtain

[I.sub.2m] = [T.sub.2m][S.sub.2m] + ([V.sup.T.sub.2m] [diamond] A[[V.sub.2m+1], [V.sub.2m+2]])[[~.S].sub.m+1,m][E.sup.T.sub.2m-1].

Let j = 2,3,..., m, and assume that

[T.sup.k.sub.2m][S.sup.k.sub.2m] = [T.sup.k-1.sub.2m][S.sup.k-1.sub.2m] - [T.sup.k-1.sub.2m]([V.sup.T.sub.2m] [diamond] A[[V.sub.2m+1], [V.sub.2m+2]])[[~.S].sub.m+1,m][E.sup.T.sub.2m-1][S.sup.k-1.sub.2m], k = 1,..., j - 1.

By induction, we have

[T.sup.j.sub.2m][S.sup.j.sub.2m] = [T.sub.2m][T.sup.j-1.sub.2m][S.sup.j-1.sub.2m][S.sub.2m] = [T.sup.j-1.sub.2m][S.sup.j-1.sub.2m] - [T.sup.j-1.sub.2m]([V.sup.T.sub.2m] [diamond] A[[V.sub.2m+1], [V.sub.2m+2]])[[~.S].sub.m+1,m][E.sup.T.sub.2m-1][S.sup.j-1.sub.2m].

Multiplying this equation by [e.sub.1] from the right gives

[T.sup.j.sub.2m][S.sup.j.sub.2m][e.sub.1] = [T.sup.j-1.sub.2m][S.sup.j-1.sub.2m][e.sub.1] - [T.sup.j-1.sub.2m]([V.sup.T.sub.2m] [diamond] A[[V.sub.2m+1], [V.sub.2m+2]])[[~.S].sub.m+1,m][E.sup.T.sub.2m-1][S.sup.j-1.sub.2m][e.sub.1].

An application of Lemma 3.6 yields

[T.sup.j.sub.2m][S.sup.j.sub.2m][e.sub.1] = [T.sup.j-1.sub.2m] [S.sup.j-1.sub.2m] [e.sub.1], j = 1, 2,... , m.

This completes the proof.
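Lemma 3.7 lends itself to a quick numerical check. The sketch below is an illustration under stated assumptions, not the paper's implementation: it builds an F-orthonormal basis of span{[A.sup.-m]V,..., [A.sup.m-1]V} by reorthogonalized modified Gram-Schmidt instead of Algorithm 1, forms T and S as the projections with entries trace([V.sup.T.sub.i]A[V.sub.j]) and trace([V.sup.T.sub.i][A.sup.-1][V.sub.j]), and verifies [S.sup.j.sub.2m][e.sub.1] = [T.sup.-j.sub.2m][e.sub.1] for j = 1,..., m. The identity is invariant under the orthogonal change of basis (fixing [e.sub.1]) that relates this basis to the one produced by Algorithm 1, so the check is meaningful.

```python
import numpy as np

# Numerical check of Lemma 3.7: S^j e_1 = T^{-j} e_1 for j = 1,...,m.
rng = np.random.default_rng(1)
n, s, m = 40, 3, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 9.0, n)) @ Q.T
A = 0.5 * (A + A.T)                        # SPD test matrix
Ainv = np.linalg.inv(A)                    # illustration only
V = rng.uniform(0.0, 1.0, (n, s))

# 2m candidates spanning {A^{-m}V, ..., A^{m-1}V}
cands, X, Y = [V], V, V
for k in range(1, m + 1):
    Y = Ainv @ Y
    cands.append(Y)                        # A^{-k} V
    if k < m:
        X = A @ X
        cands.append(X)                    # A^{k} V
blocks = []
for W in cands:
    W = W / np.linalg.norm(W, 'fro')
    for _ in range(2):                     # MGS with reorthogonalization
        for B in blocks:
            W = W - np.sum(B * W) * B      # Frobenius inner product
    blocks.append(W / np.linalg.norm(W, 'fro'))

T = np.array([[np.sum(Bi * (A @ Bj)) for Bj in blocks] for Bi in blocks])
S = np.array([[np.sum(Bi * (Ainv @ Bj)) for Bj in blocks] for Bi in blocks])
e1 = np.zeros(2 * m)
e1[0] = 1.0
err = max(np.linalg.norm(np.linalg.matrix_power(S, j) @ e1
                         - np.linalg.solve(np.linalg.matrix_power(T, j), e1))
          for j in range(1, m + 1))
```

For a well-conditioned A the residual `err` is at the level of rounding errors.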

LEMMA 3.8. Let the matrix A be symmetric positive definite, and let the matrices [T.sub.2m] and [S.sub.2m] be defined by (3.1). Let [V.sub.2m] = [[V.sub.1], [V.sub.2],... , [V.sub.2m]] be the matrix in (3.14), i.e., the block columns [V.sub.j] [member of] [R.sup.nxs] are F-orthonormal, and the first block column is given by [V.sub.1] = V/||V[||.sub.F]. Then

(3.17) [A.sup.j][V.sub.1] = [V.sub.2m]([T.sup.j.sub.2m][e.sub.1] [cross product] [I.sub.s]), j = 0, 1,..., m - 1,

(3.18) [A.sup.-j][V.sub.1] = [V.sub.2m]([S.sup.j.sub.2m][e.sub.1] [cross product] [I.sub.s]), j = 0,1,..., m,

(3.19) [A.sup.-j][V.sub.1] = [V.sub.2m]([T.sup.-j.sub.2m][e.sub.1] [cross product] [I.sub.s]), j = 0,1,..., m.

Proof. We obtain from (3.14) that

A[V.sub.2m] = [V.sub.2m]([T.sub.2m] [cross product] [I.sub.s]) + [V.sub.2m+1] ([[t.sub.2m+1,2m-1], [t.sub.2m+1,2m]][E.sup.T.sub.2m-1] [cross product] [I.sub.s]).

Multiplying this equation by [E.sub.1] = [e.sub.1] [cross product] [I.sub.s] from the right gives

A[V.sub.1] = A[V.sub.2m][E.sub.1] =[V.sub.2m]([T.sub.2m] [cross product] [I.sub.s])[E.sub.1]+[V.sub.2m+1] ([[t.sub.2m+1,2m-1], [t.sub.2m+1,2m]] [E.sup.T.sub.2m-1] [cross product] [I.sub.s]) [E.sub.1].

Using properties of the Kronecker product and the fact that [E.sup.T.sub.2m-1][e.sub.1] = [[0, 0].sup.T], we obtain

A[V.sub.1] = [V.sub.2m]([T.sub.2m][e.sub.1] [cross product] [I.sub.s]).

Let j = 2,3,..., m - 1, and assume that

[A.sup.k] [V.sub.1] = [V.sub.2m]([T.sup.k.sub.2m][e.sub.1] [cross product] [I.sub.s]), k = 0, 1,... , j - 1.

We will show the identity

[A.sup.j][V.sub.1] = [V.sub.2m]([T.sup.j.sub.2m][e.sub.1] [cross product] [I.sub.s])

by induction. We have

[A.sup.j][V.sub.1] = A * [A.sup.j-1] [V.sub.1] = A[V.sub.2m]([T.sup.j-1.sub.2m] [e.sub.1] [cross product] [I.sub.s]).

Using the decomposition (3.14), we obtain

[A.sup.j][V.sub.1] = [[V.sub.2m]([T.sub.2m] [cross product] [I.sub.s]) + [V.sub.2m+1]([[t.sub.2m+1,2m-1], [t.sub.2m+1,2m]][E.sup.T.sub.2m-1] [cross product] [I.sub.s])]([T.sup.j-1.sub.2m][e.sub.1] [cross product] [I.sub.s]) = [V.sub.2m]([T.sup.j.sub.2m][e.sub.1] [cross product] [I.sub.s]) + [V.sub.2m+1]([[t.sub.2m+1,2m-1], [t.sub.2m+1,2m]][E.sup.T.sub.2m-1][T.sup.j-1.sub.2m][e.sub.1] [cross product] [I.sub.s]).

An application of Lemma 3.6 gives

[A.sup.j][V.sub.1] = [V.sub.2m]([T.sup.j.sub.2m][e.sub.1] [cross product] [I.sub.s]).

According to (3.11) and Lemma 3.6 and by using the same techniques as above, we find (3.18). Finally, (3.19) follows from (3.17) and Lemma 3.7.

LEMMA 3.9. Let the matrix A be symmetric positive definite, and let the matrices [T.sub.2m] and [S.sub.2m] be defined by (3.1). Let [V.sub.1] be the initial block vector computed in Algorithm 1. Then

(3.20) trace([V.sup.T.sub.1][A.sup.j][V.sub.1]) = [e.sup.T.sub.1][T.sup.j.sub.2m][e.sub.1], for j = 0, 1,..., 2m - 1,

(3.21) trace([V.sup.T.sub.1][A.sup.-j][V.sub.1]) = [e.sup.T.sub.1][S.sup.j.sub.2m][e.sub.1], for j = 0, 1,..., 2m + 1,

(3.22) trace([V.sup.T.sub.1][A.sup.-j][V.sub.1]) = [e.sup.T.sub.1][T.sup.-j.sub.2m][e.sub.1], for j = 0, 1,..., 2m.

Proof. The proof is based on an application of Lemma 3.8. Let j = [j.sub.1] + [j.sub.2] with 0 [less than or equal to] [j.sub.1],[j.sub.2] < m. Consider

trace([V.sup.T.sub.1][A.sup.j][V.sub.1]) = trace([V.sup.T.sub.1][A.sup.j1][A.sup.j2][V.sub.1]) = trace([V.sup.T.sub.1][A.sup.j1][V.sub.2m]([T.sup.j2.sub.2m][e.sub.1] [cross product] [I.sub.s])) = ([V.sup.T.sub.1][A.sup.j1]) [diamond] [V.sub.2m]([T.sup.j2.sub.2m][e.sub.1] [cross product] [I.sub.s]) = ([V.sup.T.sub.1][A.sup.j1] [diamond] [V.sub.2m])[T.sup.j2.sub.2m][e.sub.1] = [e.sup.T.sub.1][T.sup.j1.sub.2m][T.sup.j2.sub.2m][e.sub.1] = [e.sup.T.sub.1][T.sup.j.sub.2m][e.sub.1].

For the power 2m - 1 of A, we have

trace([V.sup.T.sub.1][A.sup.2m-1][V.sub.1]) = trace([V.sup.T.sub.1][A.sup.m][A.sup.m-1][V.sub.1]) = [V.sup.T.sub.1][A.sup.m] [diamond] [V.sub.2m]([T.sup.m-1.sub.2m][e.sub.1] [cross product] [I.sub.s]) = ([V.sup.T.sub.1][A.sup.m-1]A [diamond] [V.sub.2m])[T.sup.m-1.sub.2m][e.sub.1] = [([e.sup.T.sub.1][T.sup.m-1.sub.2m] [cross product] [I.sub.s])[V.sup.T.sub.2m]A [diamond] [V.sub.2m]][T.sup.m-1.sub.2m][e.sub.1] = [e.sup.T.sub.1][T.sup.m-1.sub.2m]([V.sup.T.sub.2m]A [diamond] [V.sub.2m])[T.sup.m-1.sub.2m][e.sub.1] = [e.sup.T.sub.1][T.sup.2m-1.sub.2m][e.sub.1].

This shows (3.20). The same techniques can be used to show (3.21). Finally, (3.22) can be established by substituting (3.16) into (3.21).

We are now in a position to prove Theorem 3.5.

Proof of Theorem 3.5. Let p [member of] [[DELTA].sub.-2m,2m-1]. Then [mathematical expression not reproducible] for certain coefficients [c.sub.i]. Let V = [||V||.sub.F][V.sub.1]. Then

trace([V.sup.T]p(A)V) = [||V||.sup.2.sub.F] trace([V.sup.T.sub.1]p(A)[V.sub.1]).

It suffices to show that

trace([V.sup.T.sub.1] p(A)[V.sub.1]) = [e.sup.T.sub.1]p([T.sub.2m])[e.sub.1].

We have

[mathematical expression not reproducible].

Then Lemma 3.9 gives

[mathematical expression not reproducible].

This concludes the proof of Theorem 3.5.
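As a numerical sanity check of Theorem 3.5, the following sketch (with the same caveats as before: a dense Gram-Schmidt stand-in for Algorithm 1, equivalent to it up to an orthogonal change of basis fixing [e.sub.1], with [A.sup.-1] applied via an explicit inverse for illustration) verifies that with m = 3, i.e., 2m = 6 nodes, the rule reproduces trace([V.sup.T]p(A)V) for the Laurent polynomial p(x) = [x.sup.-6] + [x.sup.5] [member of] [[DELTA].sub.-6,5].

```python
import numpy as np

# Check of Theorem 3.5: ||V||_F^2 e_1^T p(T_2m) e_1 matches
# trace(V^T p(A) V) exactly for p in Delta_{-2m,2m-1}; here m = 3 and
# p(x) = x^{-6} + x^5.
rng = np.random.default_rng(2)
n, s, m = 40, 3, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 9.0, n)) @ Q.T
A = 0.5 * (A + A.T)                        # SPD test matrix
Ainv = np.linalg.inv(A)                    # illustration only
V = rng.uniform(0.0, 1.0, (n, s))
p = lambda x: x**-6.0 + x**5

# F-orthonormal basis of {A^{-3}V, ..., A^2 V} by modified Gram-Schmidt
cands, X, Y = [V], V, V
for k in range(1, m + 1):
    Y = Ainv @ Y
    cands.append(Y)
    if k < m:
        X = A @ X
        cands.append(X)
blocks = []
for W in cands:
    W = W / np.linalg.norm(W, 'fro')
    for _ in range(2):
        for B in blocks:
            W = W - np.sum(B * W) * B
    blocks.append(W / np.linalg.norm(W, 'fro'))

T = np.array([[np.sum(Bi * (A @ Bj)) for Bj in blocks] for Bi in blocks])
wT, QT = np.linalg.eigh(0.5 * (T + T.T))
approx = np.linalg.norm(V, 'fro')**2 * (QT[0, :]**2 @ p(wT))

wA, U = np.linalg.eigh(A)
exact = np.trace(V.T @ U @ np.diag(p(wA)) @ U.T @ V)
rel_err = abs(approx - exact) / abs(exact)
```

The relative error is at the level of round-off, as the degree of exactness predicts.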

Algorithm 2 describes how approximations of trace([V.sup.T]f(A)V) are computed by the extended global Lanczos method.
Algorithm 2: Approximation of trace([V.sup.T]f(A)V) by the extended global Lanczos method.

Input: symmetric positive definite matrix A [member of] [R.sup.nxn], function f, initial block vector V [member of] [R.sup.nxs]
Output: approximation [G.sub.2m] of trace([V.sup.T]f(A)V)

1: Choose a tolerance [epsilon] > 0 and the maximal number of iterations [I.sub.max]. Set [G.sub.0] = 1.
2: for m = 1 to [I.sub.max] do
3:   Compute the block vectors [V.sub.2m-1], [V.sub.2m] to update [V.sub.2m] by Algorithm 1.
4:   Compute the matrix [T.sub.2m] [member of] [R.sup.2mx2m] using Proposition 3.1.
5:   Compute the approximation [G.sub.2m] given by (3.4).
6:   [mathematical expression not reproducible]
7:   if [mathematical expression not reproducible] then exit
8: end for
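In dense arithmetic, the loop above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the function name `eg_lanczos_trace` is ours, the F-orthonormal blocks are formed by reorthogonalized modified Gram-Schmidt rather than by the short recurrences of Algorithm 1, [A.sup.-1] is applied via an explicit inverse rather than a factorization, and the stopping rule |[G.sub.2m] - [G.sub.2m-2]| [less than or equal to] [epsilon]|[G.sub.2m]| is an assumption, since the exact criterion in steps 6-7 is not reproduced above.

```python
import numpy as np

def eg_lanczos_trace(A, f, V, tol=1e-7, max_it=50):
    """Hedged dense-arithmetic sketch of Algorithm 2.

    f must act elementwise on an array of eigenvalues (e.g. np.exp).
    Returns the approximation G_2m of trace(V^T f(A) V).
    """
    Ainv = np.linalg.inv(A)          # in practice: factor A once and solve
    normV2 = np.linalg.norm(V, 'fro')**2
    blocks, G_prev = [], None
    X, Y = V, V                      # X = A^{m-1} V and Y = A^{-m} V below
    for m in range(1, max_it + 1):
        Y = Ainv @ Y
        for W in [X, Y]:             # candidate blocks V_{2m-1}, V_{2m}
            W = W / np.linalg.norm(W, 'fro')
            for _ in range(2):       # MGS with one reorthogonalization
                for B in blocks:
                    W = W - np.sum(B * W) * B
            nrm = np.linalg.norm(W, 'fro')
            if nrm < 1e-10:          # breakdown: G_2m is exact (Prop. 3.3)
                break
            blocks.append(W / nrm)
        X = A @ X
        # projected matrix with entries trace(V_i^T A V_j)
        T = np.array([[np.sum(Bi * (A @ Bj)) for Bj in blocks]
                      for Bi in blocks])
        w, Qt = np.linalg.eigh(0.5 * (T + T.T))
        G = normV2 * (Qt[0, :]**2 @ f(w))     # ||V||_F^2 e_1^T f(T) e_1
        if G_prev is not None and abs(G - G_prev) <= tol * abs(G):
            break
        G_prev = G
    return G
```

For a symmetric positive definite A with a moderate spectrum and f = exp, this sketch agrees with trace([V.sup.T] exp(A)V) computed from a full eigendecomposition to close to machine precision.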

4. Numerical experiments. We illustrate the performance of the extended global Lanczos method when applied to the approximation of expressions of the form (1.1). All experiments are carried out in MATLAB R2015a on a computer with an Intel Core i3 processor and 3.89 GBytes of RAM. The computations are done with about 15 significant decimal digits. In all the experiments, the initial block vector V is generated randomly with uniformly distributed entries in the interval [0,1]. The parameter [epsilon] in Algorithm 2 is set to [10.sup.-7] unless specified otherwise.

EXAMPLE 4.1. The purpose of this example is to illustrate the degree of exactness of (3.2) (4m - 1 moments are matched) shown in Theorem 3.5. Consider the symmetric positive definite tridiagonal matrix A = tridiag(-1,2,-1) [member of] [R.sup.nxn] and the symmetric positive definite Toeplitz matrix B = [[[b.sub.i,j]].sup.n.sub.i,j=1], where [b.sub.i,j] = 1/(1 + |i - j|), with n = 1000. We choose the block size s = 6 and f(x) to be the Laurent polynomial p(x) = [x.sup.-6] + [x.sup.5]. Figure 4.1 shows that the approximation of trace([V.sup.T]p(A)V) determined by the extended global Lanczos method is the exact value of (1.1) after m = 3 iterations, up to round-off errors, in agreement with the theory. The standard global Lanczos method requires many more steps to give an accurate approximation of the expression (1.1).

In the following three examples, we compare the performance of Algorithm 2 with the performance of the standard global Lanczos algorithm. The speedup factor shows the ratio of execution times for these methods, i.e., SF := [time.sub.standard]/[time.sub.extended].

EXAMPLE 4.2. Let A [member of] [R.sup.nxn] with n = 1000 be a symmetric positive definite matrix, whose eigenvalues are log-uniformly distributed in the interval [[10.sup.-1],[10.sup.6]]. The block size is s = 6. Results for several functions f are reported in Table 4.1.

EXAMPLE 4.3. The matrix A is the discretization of the negative Laplacian [L.sub.A](u) = -[u.sub.xx] - [u.sub.yy] on the unit square [[0, 1].sup.2] with homogeneous Dirichlet boundary conditions. The number of inner grid points in each coordinate direction is 100. A discretization by the standard 5-point stencil gives a symmetric positive definite matrix A [member of] [R.sup.nxn] with n = 10000. The block size is chosen to be s = 20. Computed results are shown in Table 4.2.

EXAMPLE 4.4. Let A = [n.sup.2]tridiag(-1, 2,-1) [member of] [R.sup.nxn] with n = 50000. The block size is set to s = 50. Table 4.3 reports the performance of the methods. A dash '-' signifies that the maximum allowed number of iterations [I.sub.max] is reached before the stopping criterion is satisfied.

5. Conclusion. This paper describes an extended global Lanczos method for the approximation of expressions of the form trace([V.sup.T]f(A)V). An F-orthonormal basis of block vectors for extended Krylov subspaces is computed by short recurrence formulas. The numerical results show the extended global Lanczos method to require fewer iterations than the (standard) global Lanczos method. Timings show the former method to be faster than the latter for the functions and matrices considered in Examples 4.2-4.4. The extended global Lanczos method is particularly attractive to use when linear systems of equations with the matrix A can be solved efficiently.

Acknowledgment. Research of LR was supported in part by NSF grants DMS-1729509 and DMS-1720259.

REFERENCES

[1] O. ABIDI, M. HEYOUNI, AND K. JBILOU, On some properties of the extended block and global Arnoldi methods with applications to model reduction, Numer. Algorithms, 75 (2017), pp. 285-304.

[2] S. BARONI, R. GEBAUER, O. B. MALCIOGLU, Y. SAAD, P. UMARI, AND J. XIAN, Harnessing molecular excited states with Lanczos chains, J. Phys. Condens. Mat., 22 (2010), Art. Id. 074204, 8 pages.

[3] M. BELLALIJ, L. REICHEL, G. RODRIGUEZ, AND H. SADOK, Bounding matrix functionals via partial global block Lanczos decomposition, Appl. Numer. Math., 94 (2015), pp. 127-139.

[4] R. BOUYOULI, K. JBILOU, R. SADAKA, AND H. SADOK, Convergence properties of some block Krylov subspace methods for multiple linear systems, J. Comput. Appl. Math., 196 (2006), pp. 498-511.

[5] D. R. BOWLER AND T. MIYAZAKI, O(n) methods in electronic structure calculations, Rep. Prog. Phys., 75 (2012), Art. Id. 036503, 43 pages.

[6] A. BULTHEEL, P. GONZALEZ-VERA, AND R. ORIVE, Quadrature on the half-line and two-point Pade approximants to Stieltjes functions. Part I: Algebraic aspects, J. Comput. Appl. Math., 65 (1995), pp. 57-72.

[7] V. DRUSKIN AND L. KNIZHNERMAN, Extended Krylov subspaces: approximation of the matrix square root and related functions, SIAM J. Matrix Anal. Appl., 19 (1998), pp. 755-771.

[8] L. ELBOUYAHYAOUI, A. MESSAOUDI, AND H. SADOK, Algebraic properties of the block GMRES and block Arnoldi methods, Electron. Trans. Numer. Anal., 33 (2009), pp. 207-220. http://etna.ricam.oeaw.ac.at/vol.33.2008-2009/pp207-220.dir/pp207-220.pdf

[9] E. ESTRADA, The Structure of Complex Networks: Theory and Applications, Oxford University Press, Oxford, 2011.

[10] C. FENU, L. REICHEL, G. RODRIGUEZ, AND H. SADOK, GCV for Tikhonov regularization by partial SVD, BIT, 57 (2017), pp. 1019-1039.

[11] P. C. HANSEN, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, 1998.

[12] M. HEYOUNI, Extended Arnoldi methods for large low-rank Sylvester matrix equations, Appl. Numer. Math., 60 (2010), pp. 1171-1182.

[13] N. J. HIGHAM, Functions of Matrices, SIAM, Philadelphia, 2008.

[14] C. JAGELS AND L. REICHEL, The extended Krylov subspace method and orthogonal Laurent polynomials, Linear Algebra Appl., 431 (2009), pp. 441-458.

[15] K. JBILOU, A. MESSAOUDI, AND H. SADOK, Global FOM and GMRES algorithms for matrix equations, Appl. Numer. Math., 31 (1999), pp. 49-63.

[16] L. KNIZHNERMAN AND V. SIMONCINI, A new investigation of the extended Krylov subspace method for matrix function evaluations, Numer. Linear Algebra Appl., 17 (2010), pp. 615-638.

[17] T. T. NGO, M. BELLALIJ, AND Y. SAAD, The trace ratio optimization problem, SIAM Rev., 54 (2012), pp. 545-569.

[18] O. NJASTAD AND W. J. THRON, The theory of sequences of orthogonal L-polynomials, in Pade Approximants and Continued Fractions, H. Waadeland and H. Wallin, eds., Det Kongelige Norske Videnskabers Selskab Skrifter 1, DKNVS, Trondheim, 1983, pp. 54-91.

[19] Y. SAAD, J. CHELIKOWSKY, AND S. SHONTZ, Numerical methods for electronic structure calculations of materials, SIAM Rev., 52 (2010), pp. 3-54.

[20] V. SIMONCINI, A new iterative method for solving large-scale Lyapunov matrix equations, SIAM J. Sci. Comput., 29 (2007), pp. 1268-1288.

A. H. BENTBIB ([dagger]), M. EL GHOMARI ([dagger]), C. JAGELS ([double dagger]), K. JBILOU ([section]), AND L. REICHEL ([paragraph])

Dedicated to Walter Gautschi on the occasion of his 90th birthday

(*) Received November 24, 2018. Accepted November 28, 2018. Published online on January 14, 2019. Recommended by G. Milovanovic.

([dagger]) Laboratory LAMAI, Faculty of Sciences and Technologies, Cadi Ayyad University, Marrakech, Morocco ({a.bentbib@uca.ac.ma, m.elghomari10@gmail.com}).

([double dagger]) Department of Mathematics, Hanover College, Hanover, IN 47243, USA (jagels@hanover.edu).

([section]) Laboratory LMPA, 50 Rue F. Buisson, ULCO Calais cedex, France (Khalide.Jbilou@univ-littoral.fr).

([paragraph]) Department of Mathematical Sciences, Kent State University, Kent, OH 44242, USA (reichel@math.kent.edu).

DOI: 10.1553/etna_vol50s144
TABLE 4.1
Example 4.2: The symmetric matrix A [member of] [R.sup.nxn] has eigenvalues distributed in the interval [[10.sup.-1], [10.sup.6]] and a random orthogonal eigenvector matrix. The block size is s = 6.

                               Extended global Lanczos [K.sub.g.sup.m+1,m]      Standard global Lanczos [K.sup.m.sub.g]
f(x)                           (m+1,m)  Error               Time (sec)          m    Error              SF

[e.sup.-x]                     (25,24)  2.2 x [10.sup.-7]    1.4                470  8.1 x [10.sup.-5]  17.3
[square root of (x)]           (59,58)  9.2 x [10.sup.-7]    3.9                599  9.9 x [10.sup.-7]   8.7
[x.sup.-1/4]                   (49,48)  8.9 x [10.sup.-7]    3.0                607  9.1 x [10.sup.-5]  19.0
ln(x)                          (77,76)  9.5 x [10.sup.-7]    7.9                612  1.7 x [10.sup.-3]   7.1
[e.sup.-[square root of (x)]]  (19,18)  2.3 x [10.sup.-7]    0.6                607  5.7 x [10.sup.-5]  61.0
[x.sup.-4]                     (3,2)    1.6 x [10.sup.-11]   0.2                308  7.8 x [10.sup.-3]  76.3

TABLE 4.2
Example 4.3: The matrix A [member of] [R.sup.nxn] is a discretized negative Laplacian with n = 10000. The block size is s = 20.

                               Extended global Lanczos [K.sub.g.sup.m+1,m]      Standard global Lanczos [K.sup.m.sub.g]
f(x)                           (m+1,m)  Error               Time (sec)          m    Error              SF

[e.sup.-x]                     (5,4)    1.1 x [10.sup.-7]    1.7                143  9.4 x [10.sup.-7]  38.3
[square root of (x)]           (9,8)    9.4 x [10.sup.-7]    4.1                 67  8.2 x [10.sup.-7]   3.4
[x.sup.-1/4]                   (9,8)    3.0 x [10.sup.-7]    4.1                 84  9.5 x [10.sup.-7]   6.2
ln(x)                          (9,8)    5.6 x [10.sup.-7]    3.6                 73  8.7 x [10.sup.-7]   5.1
[e.sup.-[square root of (x)]]  (4,3)    3.0 x [10.sup.-7]    1.6                105  9.4 x [10.sup.-7]  28.0
[x.sup.-4]                     (3,2)    1.5 x [10.sup.-13]   0.9                111  9.9 x [10.sup.-7]  40.0

TABLE 4.3
Example 4.4: The matrix A [member of] [R.sup.nxn] is positive definite tridiagonal with n = 50000. The block size is s = 50.

                               Extended global Lanczos [K.sub.g.sup.m+1,m]      Standard global Lanczos [K.sup.m.sub.g]
f(x)                           (m+1,m)  Error              Time (sec)           m  Error  SF

[e.sup.-x]                     (4,3)    2.5 x [10.sup.-8]    6.0                -  -      -
[square root of (x)]           (9,8)    8.1 x [10.sup.-4]   21.9                -  -      -
[x.sup.-1/4]                   (10,9)   9.1 x [10.sup.-5]   24.5                -  -      -
ln(x)                          (19,18)  9.9 x [10.sup.-5]   65.4                -  -      -
[e.sup.-[square root of (x)]]  (4,3)    3.1 x [10.sup.-7]    6.1                -  -      -
[x.sup.-4]                     (3,2)    4.0 x [10.sup.-8]    3.7                -  -      -