# A rank-one updating approach for solving systems of linear equations in the least squares sense.

1. Introduction. We consider the problem of solving a rectangular
system of linear equations Ax = b in the least squares sense. Note that
x solves the associated normal equations

[A.sup.T] Ax = [A.sup.T] b.

Here we suppose that the matrix A [member of] [R.sup.mxn] has maximal rank [mu] := min (m, n). In the case m = n this amounts to solving a nonsingular system Ax = b with a perhaps nonsymmetric matrix A by means of solving [A.sup.T] Ax = [A.sup.T]b. When A is symmetric positive definite (s.p.d.), the classical conjugate gradient method (cg) of Hestenes and Stiefel [13] belongs to the most powerful iterative methods. For solving general rectangular systems, a number of cg-type methods have been developed, see for instance [23]. In general solving such systems is more difficult than in the case of s.p.d. A. In this paper, we use rank-one updates to find approximations of the pseudoinverse of A, which, for instance, may be used as preconditioners for solving such systems with multiple right hand sides.

The use of a rank-one update to solve square systems with A [member of] [R.sup.nxn] was studied by Eirola and Nevanlinna [8]. They invoked a conjugate transposed secant condition without line search. They proved that their algorithm terminates after at most n iteration steps, and if n steps are needed, then [A.sup.-1] is obtained.

The problem was recently considered by Ichim [14] and by Mohsen and Stoer [17]. In [17], starting from an approximation Ho for the pseudoinverse [A.sup.+], or from [A.sup.T] when such an approximation is not available, their method uses rank-2 updates to generate a sequence of n x m-matrices [H.sub.k] approximating [A.sup.+]. Two rank-2 updates were proposed. One is related to the DFP update and the other to the BFGS update. Numerical results showed that both methods of [17], when used for the case m = n, give a better accuracy than the other rank-one update methods.

On the other hand, rank-one updating has attractive features in terms of computational requirements (lower number of operations, less storage). In this paper, rank-one updates (of Broyden's type) for matrices [H.sub.k] are computed so that, in rather general situations, the sequence [H.sub.k] terminates with the pseudoinverse in no more than [mu] steps. The properties of the algorithm are established. For square systems, m = n, also the new method is compared with the methods CGN, CGS and GMRES, which try to solve Ax = b directly.

An alternative algorithm utilizing the s.p.d. matrices Dk := [AH.sub.k] is proposed. It has the advantage that it can monitor the accuracy as [D.sub.k] [right arrow] [I.sub.m]. However, we then have to update two matrices (but note that the [D.sub.k] are symmetric).

2. Rank-one updating. We consider the solution of the linear system

(2.1) Ax =b

in the least squares sense: [bar.x] "solves" (2.1) if [parallel]b - A[bar.x][parallel]b = [min.sub.x] [parallel]b - Ax[parallel] that is if [A.sup.T] (b - A[bar.x]) = 0. We assume that A [member of] [R.sup.mxn] has maximal rank, rk A = [mu] := min (m, n). Here and in what follows, [parallel]x[parallel] := [(x, x).sup.1/2] is the Euclidean norm and (x, y) := [X.sup.T]y the standard scalar product in [R.sup.m]. The vector space spanned by the vectors [u.sub.i] [member of] [R.sup.m], i = 1,2, ..., k, is denoted by

[[[u.sub.1], [u.sub.2], ..., [u.sub.k]]]

We note already at this point that all results of this paper can be extended to complex linear systems with a complex matrix A [member of] [C.sup.mxn]. One only has to define the inner product in [C.sup.m] by

(x, y) := [x.sup.H]y.

Also the operator [sup.T] then has to be replaced by [sup.H] and the word "symmetric" by "Hermitian".

We call an n x m-matrix H A-related if AH is symmetric positive semidefinite (s.p.s.d.) and [x.sup.T] AH x = 0 implies that [x.sup.T] A = 0 and Hx = 0. Clearly, H := [A.sup.T] is A-related, the pseudoinverse [A.sup.+] (in the Moore Penrose sense) is A-related (this can be shown using the singular value decomposition of A and [A.sup.+]); if A is s.p.d. then H = I is A-related. It is easily verified that any matrix H of the form H = U[A.sup.T], where U [member of] [R.sup.nxn] is s.p.d., is A-related.

This concept will be central since our algorithms aim to generate A-related matrices [H.sub.k] which approximate [A.sup.+]. It also derives its interest from the special case when A is a nonsingular, perhaps nonsymmetric, square matrix. It is easily verified that a matrix H is A-related if and only if AH is symmetric positive definite. This means that H can be used as a right preconditioner to solve Ax = b by means of solving the equivalent positive definite system AH y = b, x = Hy, using the cg-algorithm. Then the preconditioner H will be the better the smaller the condition number of AH is. So the algorithms of this paper to find A-related matrices [H.sub.k] with good upper and lower bounds on the nonzero eigenvalues of [AH.sub.k] can be viewed as algorithms for computing good preconditioners [H.sub.k] in the case of a full-rank A.

The following proposition gives another useful characterization of A-related matrices: PROPOSITION 2.1. Let P be any unitary m x m-matrix satisfying

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [??] is a [mu] x n-matrix of rank [mu] = min (m, n) = rk A. Define for an n x m- matrix H the n x [mu]-matrix [??] and [??] by

[[??][??]] := [HP.sup.T].

Then H is A-related if and only if the following holds

[??] = 0 and [??][??] is s.p.d.,

and then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof. 1. Suppose that H is A-related. Then the matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is s.p.s.d. . Hence [??][??] = 0 and [??][??] is s.p.s.d.

Suppose [??] [not equal to] 0. Then there is a vector [y.sub.2] with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [not equal to] 0. Define x := [P.sup.T] y, where [Y.sup.T] = [[y.sup.T.sub.1] [y.sup.T.sub.2]], [y.sub.1] := 0. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

but

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

contradicting the A-relatedness of H. Hence [??] = 0. Now suppose that [??][??] is only positive semidefinite but not positive definite. Then there is a vector [y.sub.1] [not equal to] 0 such that [??][??][y.sub.1] = 0. But then [y.sup.T.sub.1] [??] [not equal to] 0, because A has full row rank, and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence the corresponding

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

satisfies

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

but

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

again contradicting the A-relatedness of H.

2. Conversely, suppose that P, A and H satisfy the conditions of the proposition and assume [x.sup.T] AH x = 0. Then with [y.sup.T] = [[y.sup.T.sub.1], [y.sup.T.sub.2]] := [x.sup.T][P.sup.T],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

so that [y.sub.1] = 0. But then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and likewise

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence H is A-related.

In the case [mu] = m [less than or equal to] n, we can choose P := I and the proposition reduces to the following corollary, which has a simple direct proof.

COROLLARY 2.2. If the matrix A has full row rank, then H is A-related iff AH is a symmetric positive definite matrix.

Proof. If AH is s.p.d., then [x.sup.T] AHx = 0 gives x = 0. Conversely, if H is A-related, then [x.sup.T] AHx = 0 implies [A.sup.T]x = 0 and, therefore, x = 0 since [A.sup.T] has full column rank. Hence the s.p.s.d. matrix AH is s.p.d..

We noted above that any matrix H of the form H = U[A.sup.T], where U [member of] [R.sup.nxn] is s.p.d., is A-related. The converse is also true:

PROPOSITION 2.3. A matrix H [member of] [R.sup.nxm] is A-related iff it has the form H = U[A.sup.T], where U [member of] R111 is s.p.d..

Proof. We prove the result only for the most important case of a full column rank matrix A,m[greater than or equal to] n = [mu].

Let H be A-related. Since [x.sup.T] AHx = 0 implies [A.sup.T] x = 0 and Hx = 0, [A.sup.T] x = 0 is equivalent to H x = 0, that is H and [A.sup.T] have the form H = U[A.sup.T], [A.sup.T] = VH for some matrices U, V [member of] [R.sup.nxn]. As [A.sup.T] has full column rank n = [mu], so has H. This implies that U is nonsingular and V = [U.sup.-1]. Since AH = AU[A.sup.T] is s.p.s.d. and AU[A.sup.T] = [(AU[A.sup.T]).sup.T] = [AU.sup.T] [A.sup.T], and therefore by the nonsingularity of [AA.sup.T]

[A.sup.T] AU [A.sup.T] A= [A.sup.T] [AU.sup.T] [A.sup.T]A [??] U = [U.sup.T],

showing that U is symmetric, nonsingular and positive semidefinite, and therefore a s.p.d. matrix.

The methods for solving (2.1) will be iterative. The new iterate [x.sub.k+1] is related to the previous iterate by

[x.sub.k+1] = [x.sub.k] + [[alpha].sub.k][p.sub.k] = [x.sub.k] + [y.sub.k],

where the search direction [p.sub.k] is determined by the residual [r.sub.k] = b - [Ax.sub.k] via an A-related n x m matrix [H.sub.k] of rank [mu] by

[p.sub.k] = [H.sub.k][r.sub.k].

We will assume [p.sub.k] [not equal to] 0, because otherwise

[Ap.sub.k] = A[H.sub.k][r.sub.k] = 0 [??] (A[H.sub.k][r.sub.k], [r.sub.k]) = 0,

so that, by the A-relatedness of [H.sub.k], [A.sup.T] [r.sub.k] = 0, that is, [x.sub.k] is a (least-squares) solution of (2.1).

We also note that [p.sub.k] [not equal to] 0 implies [Ap.sub.k] [not equal to] 0, because otherwise, again by A-relatedness,

0 = ([Ap.sub.k], [r.sub.k]) = ([AH.sub.k][r.sub.k], [r.sub.k]) [??] [H.sub.k][r.sub.k] = [p.sub.k] = 0.

Hence the scalar products ([Ap.sub.k], [r.sub.k]), ([Ap.sub.k], [Ap.sub.k]) will be positive.

Now, the stepsize [[alpha].sub.k] is chosen to minimize the Euclidean norm [parallel][r.sub.k+1][parallel]. Thus,

[r.sub.k]+1 = [r.sub.k] - akAPk = [r.sub.k] - [z.sub.k],

where

[[alpha].sub.k] = ([Ap.sub.k], [r.sub.k]), ([Ap.sub.k] [Ap.sub.k]) > 0.

Hence [[alpha].sub.k] is well defined and

([r.sub.k]+l,[z.sub.k]) = 0 = ([r.sub.k]+1, [AH.sub.k][r.sub.k]), ([r.sub.k]+l,[r.sub.k+1]) = ([r.sub.k],[r.sub.k]) - [[absolute value of ([r.sub.k],[Ap.sub.k])].sup.2]/([Ap.sub.k],[Ap.sub.k]) < ([r.sub.k],[r.sub.k])

We update [H.sub.k] using a rank-one correction

[H.sub.k+1] = [H.sub.k] + [u.sub.k][v.sup.T.sub.k], [u.sub.k] [member of] [R.sup.n], [v.sub.k] [member of] [R.sup.m].

Upon invoking the secant condition

[H.sub.k+1] [x.sub.k] = [y.sub.k],

we get the update formula of Broyden type [3],

(2.2) [H.sub.k+1] = [H.sub.k] + ([y.sub.k] - [H.sub.k][z.sub.k])[v.sup.T.sub.k]/([v.sub.k], [z.sub.k]) = [H.sub.k] + [u.sub.k][v.sup.T.sub.k]([v.sub.k], [z.sub.k]), uk := [y.sub.k] - [H.sub.k][z.sub.k],

provided that ([v.sub.k], [z.sub.k]) [not equal to] 0. The updates (2.2) are invariant with respect to a scaling of [v.sub.k]. It can be shown (see [10]) that for m = n the matrix [H.sub.k+1] is nonsingular iff [H.sub.k] is nonsingular and ([v.sub.k], [H.sup.-1.sub.k] [y.sub.k]) [not equal to] 0. If m = n and A is s.p.d., then it is usually recommended to start the method with [H.sub.0] := I and to use [v.sub.k] := [u.sub.k] = [y.sub.k] - [H.sub.k][z.sub.k], which results in the symmetric rank-one method (SRK1) of Broyden [21] and leads to symmetric matrices [H.sub.k]. However, it is known that SRK1 may not preserve the positive definitness of the updates [H.sub.k]. This stimulated the work to overcome this difficulty [22], [16], [24], [15].

When A is nonsymmetric, the good Broyden (GB) updates result from the choice [v.sub.k] := [H.sub.k] yk, while the bad Broyden (BB) updates take [v.sub.k] := [z.sub.k]. For [[alpha].sub.k] := 1 and solving linear systems, Broyden [4] proved the local R-superlinear convergence of GB (that is, the errors [[epsilon].sub.k] := [parallel][x.sub.k] - x* [parallel] [less than or equal to] [[theta].sub.k] are bounded by a superlinearly convergent sequence [[theta].sub.k] [down arrow] 0). In More and Trangenstein [18], global Q-superlinear convergence (i.e. [lim.sub.k[right arrow][infinity]], [[epsilon].sub.k+1]/[[epsilon].sub.k] 0) for linear systems is achieved using a modified form of Broyden's method. For A [member of] [R.sup.nxn], Gay [10] proved that the solution using update (2.2) is found in at most 2n steps. The generation of {[v.sub.k]} as a set of orthogonal vectors was considered by Gay and Schnabel [11]. They proved termination of the method after at most n iterations.

The application of Broyden updates to rectangular systems was first considered by Gerber and Luk [12]. The use of the Broyden updates (2.2) for solving non-Hermitian square systems was considered by Deuflhard et al. [7]. For both GB and BB, they introduced an error matrix and showed the reduction of its Euclidean norm. For GB they provided a condition on the choice of Ho ensuring that all [H.sub.k] remain nonsingular. They also discussed the choice [v.sub.k] := [H.sup.T.sub.k] [H.sub.k][z.sub.k], But in this case, no natural measure for the quality of [H.sub.k] was available, which makes the method not competitive with GB and BB. For GB and BB, they showed the importance of using an appropriate line search to speed up the convergence.

In this paper, we start with an A-related matrix Ho and wish to make sure that all [H.sub.k] remain A-related. Hence the recursion (2.2) should at least preserve the symmetry of the [AH.sub.k]. This is ensured iff

[v.sub.k] = [Au.sub.k]

or equivalently

(2.3) [u.sub.k] = [y.sub.k] - [H.sub.k][z.sub.k] = (I - [H.sub.k]A)[y.sub.k] = [H.sub.k]([[alpha].sub.k][r.sub.k] - [z.sub.k]) = [H.sub.k][t.sub,k], [v.sub.k] - A([y.sub.k] - [H.sub.k][z.sub.k]) = (I - [AH.sub.k])[z.sub.k] = [AH.sub.k][t.sub.k],

where [t.sub.k] is the vector [t.sub.k] := [[alpha].sub.k][r.sub.k] - [z.sub.k].

However, the scheme (2.2), (2.3) may break down when ([v.sub.k], [z.sub.k]) = 0 and, in addition, it may not ensure that [AH.sub.k]+l is again A-related. We will see that both difficulties can be overcome.

3. Ensuring positive definiteness and A-relatedness. For simplicity, we drop the iteration index k and replace k + 1 by a star *. We assume that H is A-related and p = Hr [not equal to] 0, so that (AHr, r) > 0 and (Ap, Ap) > 0. Following Kleinmichel [16] and Spedicato [24], we scale the matrix H by a scaling factor 7 > 0 before updating. This leads to

H* = [gamma]H + [uv.sup.T] /(v, z),

where u = y - [gamma]Hz = H([alpha]r - [gamma]z) = Ht, v = Au = AM and t = [alpha]r - [gamma]z. Therefore,

(3.1) H* = [gamma]H + Ht[(AHt).sup.T]/(AHt, z),

(3.2) AH* = [gamma]AH + AHt[(AHt).sup.T] /(AHt, z).

Then, introducing the abbreviations

[[beta].sub.1] := (AHr, r), [[beta].sub.2] := (AHz, z), [beta]* := (AHr*, r*),

we find

[[beta].sub.1] > 0, [beta]* > 0, [[beta].sub.2] [greater than or equal to] [[beta].sub.1] > 0,

[alpha] (Ap, r)/(Ap, Ap) = (AHr, r)/(AHr, AHr) > 0,

and an easy calculation, using (r*, z) = 0, z = r - r*, shows

[[beta].sub.2] = [[beta].sub.1] + [beta]*

and the following formula for the denominator in (3.2),

(3.3) (v, z) = (AM, z) = ([alpha]AHr - [gamma]AHz, z) = [alpha][[beta].sub.1] - [gamma][[beta].sub.2] = (a - [gamma]) [[beta].sub.1] - [gamma][beta]*.

Since H is A-related, Proposition 2.1 can be applied. Hence, there exist a unitary m x m-matrix P and [mu] x n-matrices [??], [[??].sup.T], such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[??][??] is symmetric positive definite.

Then (3.1), (3.2) preserve the block structure of H[P.sup.T] and PAH[P.sup.T]. Replacing t by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

we find

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence, by Proposition 2.1, AH* is A-related if and only if

(3.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is positive definite.

Since [??][??] is s.p.d. and [gamma] > 0, [([gamma][??][??]).sup.1/2] is well defined and [??][??] satisfies

(3.5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The matrix U has the eigenvalues 1 (with multiplicity [mu] - 1) and, with multiplicity 1, the Eigenvalue

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

But as is easily seen, ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) = (AHt, t), so that by (3.3),

(3.6) [phi](gamma]) = 1/[gamma] [alpha]([alpha] - [gamma])[[beta].sub.1]/ [alpha] [[beta].sub.1] - [gamma][[beta].sub.2].

Hence, H* will be A-related, iff [gamma] > 0 satisfies [alpha][[beta].sub.1] - [gamma][[beta].sub.2] [not equal to] 0 and

[alpha]([alpha] - [gamma])[[beta].sub.1]/[alpha][[beta].sub.1] - [gamma][[beta].sub.2] > 0,

that is iff

(3.7) 0 < [gamma] < [alpha][[beta].sub.1]/[[beta].sub.2] = [alpha] [[beta].sub.1]/[[beta].sub.1] + [beta]* or [gamma]>a.

In view of the remark before Corollary 4.5 (see below), the choice 7 = 1 seems to be favorable. By (3.7), this choice secures that H* is A-related unless a, [[beta].sub.1], [beta]* satisfy

(3.8) 1[less than or equal to] [alpha] [less than or equal to] 1 + [beta]*/[[beta].sub.1].

Next we derive bounds for the condition number of AH* and follow the reasoning of Oren and Spedicato [20]. Let [??] := [??][??] and [??]* := [??][??]. Then by (3.4),

[??]* = [gamma][??] + [??][[??].sub.1][([??][[??].sub.1]).sup.T]/(AHt, z),

where we assume that [??] is s.p.d. and 7 satisfies (3.7), so that also [??]* is s.p.d. Then the condition number [kappa]([??]) with respect to the Euclidean norm is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then by (3.5) for any x [not equal to] 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and, similarly,

[x.sup.T][??]*x/[x.sup.T]x [greater than or equal to] [[lambda].sub.min] (U) [gamma] [[lambda].sub.min] ([??])

Now, the eigenvalues of U are 1 and [phi]([gamma]) is given by (3.6). Hence,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

so that, finally,

[kappa]([??]*) [less than or equal to] [PHI])([gamma])[kappa]([??]),

Where

[PHI]([gamma]): = max(1, [phi]([gamma]))/min (1, [phi]([gamma])).

Note that [??]* depends on [gamma], [??]* = [??]*([gamma]). Unfortunately, the upper bound for [kappa]([??]*) just proved is presumably too crude; in particular it does not allow us to conclude that [kappa]([??]*) < [kappa]([??]) for certain values of 7. Nevertheless, one can try to minimize the upper bound by minimizing [PHI]([gamma]) with respect to 7 on the set r of all 7 satisfying (3.6). The calculation is tedious though elementary. With the help of MATHEMATICA one finds for the case [beta]* > 0, that is [[beta].sub.2] > [[beta].sub.1], that 4) has exactly two local minima on [GAMMA] namely

(3.9) [[gamma].sub.+] := [alpha] (1 + [square root of [beta]*/[beta].sub.2]) > [alpha], [[gamma].sub.-] := [alpha] (1 + [square root of [beta]*/[beta].sub.2]) < [alpha], [[beta].sub.1]/ [[beta].sub.2].

Both minima are also global minima, and they satisfy

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

4. The algorithm and its main properties. Now, let [H.sub.0] be any A-related starting matrix, and [x.sub.0] a starting vector. Then the algorithm for solving Ax = b in the least squares sense and computing a sequence of A-related matrices [H.sub.k] is given by:

Algorithm: Let A be a m x n-matrix of maximal rank, b [member of] [R.sup.m], [H.sub.0] be A-related, and [x.sub.0] [member of] [R.sup.n] a given vector.

For k = 0, 1, ...

1. Compute the vectors [r.sub.k] = b - [Ax.sub.k], [p.sub.k] := [H.sub.k][r.sub.k]. If [p.sub.k] = 0 then stop.

2. Otherwise, compute

[[alpha].sub.k] := ([Ap.sub.k], [r.sub.k])/([Ap.sub.k], [Ap.sub.k]), [x.sub.k+1] := [x.sub.k] + [[alpha].sub.k][p.sub.k], [y.sub.k] := [x.sub.k+1] - [x.sub.k], [r.sub.k+1] = [r.sub.k] - [AY.sub.k], [z.sub.k] = [r.sub.k] - [r.sub.k+1].

3. Define

[[beta].sub.1] := ([AH.sub.k][r.sub.k],[r.sub.k]) = ([Ap.sub.k],[r.sub.k]), [beta]* = ([AH.sub.k][r.sub.k]+l,[r.sub.k+1]).

4. Set [[gamma].sub.k] := 1 if (3.8) does not hold.

Otherwise compute any [[gamma].sub.k] > 0 satisfying (3.7), say by using (3.9),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where eps is the relative machine precision.

Compute [u.sub.k] := [y.sub.k] - [[gamma].subk][H.sub.k][z.sub.k], [v.sub.k] := [Au.sub.k] and

[H.sub.k+1] := [[gamma].sub.k][H.sub.k] + [u.sub.k][v.sup.T.sub.k]/([v.sub.k], [z.sub.k]).

Set k := k + 1 and goto 1.

We have already seen that step 2 is well-defined if [p.sub.k] [not equal to] 0, because then ([Ar.sub.k], [r.sub.k]) > 0 and [Ap.sub.k] [not equal to] 0, if [H.sub.k] is A-related. Moreover, [p.sub.k] = 0 implies [A.sup.T] [r.sub.k] = [A.sup.T] (b - Axk) = 0, i.e., [x.sub.k] is a least-squares solution of (2.1), and the choice of [[gamma].sub.k] in step 2 secures also that [H.sub.k+1] is A-related. Hence, the algorithm is well-defined and generates only A-related matrices [H.sub.k]. This already proves part of the main theoretical properties of the algorithm stated in the following theorem:

THEOREM 4.1. Let A be real m x n-matrix of maximal rank u := min (m, n), [H.sub.0] an A-related real n x m-matrix and [x.sub.0] [member of] [R.sup.n]. Then the algorithm is well-defined and stops after at most p steps: There is a smallest index l < p with pi = 0, and then the following holds:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and in particular [Ax.sub.1] = b if m [less than or equal to] n. Moreover

1. [H.sub.j] is A-related for 0 [less than or equal to] j [less than or equal to] l.

2. [H.sub.k][z.sub.i] = [[gamma].sub.i,k] [y.sub.i] for 0 [less than or equal to] i < k [less than or equal to] l, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

3. ([r.sub.k],[z.sub.i])= 0 for 0 [less than or equal ot] i < k [less than or equal to]l.

4. ([z.sub.k],[z.sub.i])= 0 for i [not equal to] k [less than or equal to] l.

5. [z.sub.k] [not equal to] 0 for 0 [less than or equal to] k < l.

Proof. Denote by (Pk) the following properties:

1) [H.sub.j] is A-related for 0 [less than or equal to] j [less than or equal to] k.

2) [H.sub.j] [z.sub.i] = -[[gamma].sub.i,j][y.sub.i] for 0 [less than or equal to] i < j [less than or equal to] k.

3) ([r.sub.j], [z.sub.i]) = 0 for 0 [less than or equal to] i < j [less than or equal to] k.

4) ([z.sub.j], [z.sub.i]) = 0 for 0 [less than or equal to] i < j [less than or equal to] k.

5) [z.sub.j] [not equal to] 0 for 0 [less than or equal to] j < k.

By the remarks preceding the theorem, the algorithm is well-defined as long as [p.sub.k] [not equal to] 0 and the matrices [H.sub.k] are A-related. (Pk), 4) and 5) imply that the k vectors [z.sub.i], i = 0, 1, ..., k - 1, are linearly independent and, since [z.sub.i] = [Ay.sub.i] [member of] R(A) and dim R(A) = [mu], property ([P.sub.k]) cannot be true for k > [mu]. Therefore, the theorem is proved, if (PO) holds and the following implication is shown:

([P.sub.k]), [p.sub.k] [not equal to] 0 [??] ([P.sub.k+1]).

1) ([P.sub.k+1]), 1) follows from the choice of [[gamma].sub.k] that [H.sub.k+1] is well-defined and A-related if is [H.sub.k].

2) To prove ([P.sub.k+1]), 2) we first note that

[H.sub.k+1] [z.sub.k] = [y.sub.k]

by the definition of [H.sub.k]. In addition for i < k, ([P.sub.k]) and the definition of -[[gamma].sub.i,k] imply

[H.sub.k+1][z.sub.i] = [[gamma].sub.k][H.sub.k][z.sub.i] = [[gamma].sub.k][[gamma].sub.i,k][y.sub.i] = [[gamma].sub.i,k+1][y.sub.i],

since ([z.sub.k], [z.sub.i]) = 0 and

([AH.sub.k]zk&& [z.sub.i]) = ([z.sub.k], [AH.sub.k][z.sub.i]) = ([z.sub.k], A [[gamma].sub.i,k][y.sub.i]) = ([z.sub.k], [y.sub.i,k][z.sub.i]) = 0.

This proves ([P.sub.k+1]), 2).

3) Since ([r.sub.k+1], [z.sub.k]) = 0, and for i < k,

([r.sub.k+1], [z.sub.i]) = ([r.sub.i+1] + [k.summation over (j=i+1)], [z.sub.j][z.sub.i]) = 0,

because of ([P.sub.k]), 3), 4). This proves ([P.sub.k+1]), 3).

4) By ([P.sub.k+1]), 3) we have for i < k

([z.sub.i], [z.sub.k]) = ([z.sub.i], [r.sub.k] - [r.sub.k+1]) = 0.

Hence ([P.sub.k+1]), 4) holds.

It was already noted that [P.sub.k] [not equal to] 0 implies [z.sub.k] [not equal to] 0. Hence ([P.sub.k+1], 5) holds. This completes the proof of the theorem.

REMARK 4.2. In the case m [greater than or equal to] n = [mu], the algorithm finds the solution [bar.x] := arg [min.sub.x] [parallel]Ax - b[parallel], which is unique. In the case m = [mu] < n, the algorithm finds only some solution x of the many solutions of Ax = b, usually not the interesting particular solution of smallest Euclidean norm. This is seen from simple examples, when for instance the algorithm is started with a solution [x.sub.0] of Ax = b that is different from the least norm solution: the algorithm then stops immediately with x := [x.sub.0], irrespective of the choice of [H.sub.0].

The least squares, minimal norm solution x* of Ax = b can be computed as follows: let [bar.b] be any solution of Ax = b (computed by the algorithm) and compute y* as the least squares solution of [A.sup.T] y = b again by the algorithm (but with A replaced by [A.sup.T] and b by b). Then x* = [A.sup.T] y* is the least squares solution of Ax = b with minimal norm.

We also note that the algorithm of this paper belongs to the large ABS-class of algorithms (see Abaffy, Broyden and Spedicato [5]). As is shown in [25], also the minimal norm solution of a Ax = b, A [member of] [R.sup.mxn], m = [mu] < n, can be computed by invoking suitable algorithms of the ABS-class twice.

COROLLARY 4.3. Under the hypotheses of Theorem 4.1, the following holds. If the algorithm does not stop prematurely, that is if p = min (m, n) is the first index with [p.sub.l] = 0, l= [mu], then

A[H.sub.[mu]] = [ZDZ.sup.-1], if p=m[greater than or equal to]n, [H.sub.[mu]]A = YD[Y.sup.-1], if p = n [greater than or equal to] m,

with the matrices

Z := [[z.sub.0] [z.sub.1] ... [z.sub.[mu]-1]],

Y:= [[y.sub.0] [y.sub.1] ... [y.sub.[mu]-1]],

D := diag ([[gamma].sub.0,[mu]], [[gamma].sub.1,[mu]], [[gamma].sub.[mu]-1.[mu]]

That is, if [[gamma].sub.k] = 1 for all k, then D = I so that [H.sub.[mu]], is a right-inverse of A if p = m [less than or equal to] n, and a left-inverse if p = n [less than or equal to] m.

Proof. Part 2. of Theorem 4.1 and the definitions of D, Z, Y imply

(4.1) [H.sub.[mu]] Z = YD, AY = Z.

The [mu] columns [z.sub.i] [member of] [R.sup.[mu]] of Z are linearly independent by parts 4. and 5. of Theorem 4.1 (they are even orthogonal to each other), so that by AY = Z also the columns [y.sub.i] [member of] [R.sup.n] of the matrix Y are linearly independent.

Hence, for [mu] = m [less than or equal to] n, [Z.sup.-1] exists and by (4.1)

A[H.sub.[mu]],Z = ZD [??] A[H.sub.[mu]], = ZD[Z.sup.-1],

and for p = n [less than or equal to] m, [Y.sup.-1] exists, which implies by (4.1)

[H.sub.[mu]]Z = [H.sub.[mu]]AY = YD [??][H.sub.[mu]]A = Y[DY.sup.-1].

As a byproduct of the corollary we note

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If D = I (i.e., [[gamma].sub.k]k = 1 for all k) these formulae reduce to

A[H.sub.[mu]]A = A, [H.sub.[mu]]A[H.sub.[mu]] = [H.sub.[mu]].

Since then, for [mu] = n [less than or equal to] m, in addition, both [H.sub.[mu]]A = [I.sub.m] and A[H.sub.[mu]] are symmetric, [H.sub.[mu]] must be the Moore-Penrose pseudoinverse [A.sup.+] of A.

REMARK 4.4. Consider the special case [mu] = m = n of a nonsingular matrix A. Then, A[H.sub.[mu]] is s.p.d. and, by the corollary, A[H.sub.[mu]] = [ZDZ.sup.-1], so that the eigenvalues of A[H.sub.[mu]] are just the diagonal elements of D. Hence its condition number is r.(A[H.sub.[mu]]) = r.(D). Since the algorithm chooses the scaling parameter [[gamma].sub.k] = 1 as often as possible, A[H.sub.n] will presumably have a relatively small condition number even if D [not equal to] I.

Theorem 4.1, part 3. implies a minimum property of the residuals:

COROLLARY 4.5. For k < l, the residuals satisfy

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof. By the algorithm

[[alpha].sub.i][Ap.sub.i] = [[alpha].sub.i][AH.sub.i][r.sub.i] = [z.sub.i],

so that

[r.sub.k+1] = [r.sub.0] - [k.summation over (i=0)] [[alpha].sub.i][Ap.sub.i] = [r.sub.0] [k.summation over (i=0)] [z.sub.i].

Hence

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is a convex quadratic function with minimum [[lambda].sub.i] = [[alpha].sub.i], i = 0, 1, ..., k, since

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

by Theorem 4.1, part 3.

We note that the space [[[z.sub.0], [z.sub.i], ..., [z.sub.k]]] is closely related to the Krylov space

[K.sub.k+1] ([AH.sub.0], [r.sub.0]) := [[[r.sub.0], [AH.sub.0][r.sub.0], ..., [([AH.sub.0]).sup.k][r.sub.0]]] [subset] [R.sup.m]

generated by the matrix [AH.sub.0] and the initial residual [r.sub.0]:

THEOREM 4.6. Using the notation of Theorem 4.1, the following holds for 0 [less than or equal to] k < l

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

that is [z.sub.0], [z.sub.1], ..., [z.sub.k] is an orthogonal basis of [AH.sub.0][K.sub.k+1] ([AH.sub.0], [r.sub.0]) and

dim [AH.sub.0] [K.sub.k+1] ([AH.sub.0], [r.sub.0]) = k + 1.

Proof. The assertions are true for k = 0, since

[z.sub.0] = [Ay.sub.0] = [[alpha].sub.0][AH.sub.0][r.sub.0] [member of] [[[AH.sub.0][r.sub.0]].

Now

[AH.sub.k][z.sub.k] [member of] [[[AH.sub.0][r.sub.0], ..., [([AH.sub.0]).sup.k+2][r.sub.0]]], 0 [less than or equal to] k < l,

[r.sub.k+1] = [r.sub.k] - [z.sub.k], [z.sub.k][[alpha].sub.k][AH.sub.k][r.sub.k] [member of] [[[AH.sub.k][r.sub.k]]], [[alpha].sub.k] [not equal to] 0.

For any vector u [member of] [R.sup.m]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

5. Comparison with other methods and numerical results. The numerical tests are concentrated to the cases m = n + 1 and m = n for the following two reasons: The first is that it is harder to solve least squares problems with n [less than or equal to] m with n close to m than those with n [much less than] m. The second is that in the case m = n, which is equivalent to solving a linear equation Ax = b with a nonsingular and perhaps nonsymmetric matrix A, there are many more competing iterative methods, such as CGN, CGS, GMRES than the rank-one ("RK1") method of this paper and the rank-2 methods ("RK2") of [17] that solve Ax = b only indirectly by solving [A.sup.T]Ax = [A.sup.T]b. In all examples we take [log.sub.10] [parallel][r.sub.k][parallel] as measure of the accuracy.

EXAMPLE 5.1. Here we use that the algorithm of this paper is defined also for complex matrices A. As in [17] we consider rectangular complex tridiagonal matrices A = ([a.sub.jk]) [member of] [C.sup.mxn] that depend on two real parameters q and f and are defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In particular, we applied RK1 (starting with [x.sub.0] := 0 and [H.sub.0] := [A.sup.H]) to solve the least squares problem with m = 31, n = 30, q = 0.1 and f = 1. The iteration was stopped as soon as

[re.sub.k] := [parallel][r.sub.k][parallel] [parallel][r.sub.0][parallel] [less than or equal to] [epsilon], [epsilon] := [10.sup.-3].

RKl stopped after 24 iterations. Figure 5.1 (plotting [log.sub.10] [re.sub.k] versus k) shows the typical behavior observed with cg-type methods (cf. Theorem 4.1).

At termination, RK1 produced only an approximation H* of the pseudoinverse of A, since RKl stopped already after 24 < 30 = n iterations. When [H.sub.0] := H* was then used to solve by RK1 the system Ax = b' with a new right-hand side b' [not equal to] b (but the same starting vector [x.sub.0] = 0 and precision e = [10.sup.-3]), RK1 already stopped after 9 iterations (see the plot of Figure 5.2).

Other choices of the parameters q and f lead to least squares systems Ax = b, where RKl (started with [H.sub.0] := [A.sup.T]) did not stop prematurely: it needed n iterations and thus terminated with a final H* very close to [A.sup.+]. Not surprisingly, RKl (started with [H.sub.0] := H*) then needed only one iteration to solve the system Ax = b' with a different right-hand side b' [not equal to] b..

[FIGURE 5.1 OMITTED]

[FIGURE 5.2 OMITTED]

The test results of [17] for RK2 were similar. Since RK1 needs only half the storage and number of operations as RK2, RK1 is preferable.

EXAMPLE 5.2. This major example refers to solving square systems Ax = [b.sub.i], i = 1, 2, . .., with many right hand sides bi. It illustrates the use of the final matrices HW produced by RK1 when solving Ax = bi as starting matrix [H.sub.0] := [H.sub.(i)] of RK1 for solving Ax = [b.sub.i+1].

We consider the same evolution equation as in [17],

[u.sub.t] + a * [u.sub.x] + b * [u.sub.y] = [u.sub.xx] + [u.sub.yy] + f (x, y, t), 0 [less than or equal to] t [less than or equal to]T, 0 [less than or equal to] x, y [less than or equal to] 1,

Where

f (x, y, t) := [e.sup.-[lambda]t] [(-[lambda] + 2[[pi].sup.2]) sin [pi] x sin [pi]y + [pi] (a cos [pi]x sin [pi]y + b sin [pi]x cos [pi]y)],

with the initial and boundary conditions

u (x, y, 0) = sin [pi]x sin [pi]y, 0 [less than or equal to] X, Y [less than or equal to] 1,

u(x, y, t)= 0, fort > 0, (x, y) [member of] [partial derivative][0,1] x [0,1].

Its exact solution is

u(x, y, t) = [e.sup.-[lambda]t] sin [pi]x sin [pi]y.

The space-time domain is discretized by a grid with space step [DELTA]x = [DELTA]y = h = 1/N and time step [DELTA]t = [tau] > 0. Let us denote the discretized solution at time level [n.sub.[tau]] by U, and at level (n + 1)[tau] by V. Application of the Crank-Nicolson technique using central differences to approximate the first order derivatives and the standard second order differences to approximate the second derivatives yields the following scheme (we use the abbreviations [beta] := [tau]([2h.sup.2]), [gamma] := [tau]/(4h)) for the transition U [right arrow] V:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where 2 [less than or equal to] i, j [less than or equal to] N - 1. This scheme is known to be stable and of order O([[tau].sup.2], [h.sup.2]). V is obtained form U by solving a linear system of equations AV = C(U) with a nonsymmetric penta-diagonal matrix A and a right hand side C(U) depending linearly on U. This linear system may be solved directly by using an appropriate modification of Gauss elimination. However, to demonstrate the use of our updates, we solve these equations for given U iteratively starting for the first time step with [H.sub.0] := [A.sup.T] . For the next time steps, we compare the results for the rank-two method in [17] with the rank-one method of this paper.

For N = 35 (corresponding to more than 1000 unknowns at each time level) and [DELTA]t = [tau] = 0.01, a = 10, b = 20, [lambda] = 1 the solution of the problem up to the time t = 5 x [DELTA]t requires the following numbers of iterations in order to achieve a relative residual norm < 0.0001:

RK1: 158 123 98 91 62 RK2: 158 123 109 87 74

These results show a comparable number of iterations for both methods. But again, the storage and the number of operations for RKl is half that required for RK2.

The next examples also refer to the solution of systems Ax = b with a square nonsingular n x n-matrix A. Here we follow the ideas of Nachtigal et al. in [19], who compared three main iterative methods for the solution of such systems. These are CGN, GMRES and CGS. Examples of matrices were given for which each method outperforms the others by a factor of size O([square root of n]) or even O(n). These examples are used to compare our algorithm ("RK1") with these three methods. In all examples n = 40, except Examples 5.3 and 5.6, where n = 400. We take [log.sub.10] [parallel][r.sub.k][parallel] as a measure of the accuracy. Note that RK1 solves Ax = b only indirectly via the equivalent system [A.sup.T] Ax = [A.sup.T] b..

EXAMPLE 5.3. In this example all methods make only negligible progress until step n. Here

A := diag (1, 4, 9, ..., [n.sup.2]),

is diagonal but represents a normal matrix with troublesome eigenvalues and singular values. The results show that both GMRES and RKl are leading.

EXAMPLE 5.4. This is an example where both CGN and RK1 are leading. Here A is taken as the unitary n x n-matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

EXAMPLE 5.5. Here

A:= diag ([d.sub.1], [d.sub.2],..., [d.sub.n])

where {[d.sub.j]} denote the set of Chebyshev points scaled to the interval [1,[kappa]], where

[e.sub.j]:= cos(j-1)[pi]/n-1,

[d.sub.j] := 1 + ([e.sub.j] + 1) ([kappa] - 1) /2,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

A has condition number O(n) and smoothly distributed entries. In this example CGS wins and both CGN and RK1 are bad.

EXAMPLE 5.6. Here A is the block diagonal matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which ensures a bad distribution of singular values. Here both CGS and GMRES are leading, but RK1 outperforms CGN.

EXAMPLE 5.7. A is chosen as in Example 5.6, but with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This choice of [M.sub.j] causes failure of CGS in the very first step. GMRES is leading and RK1 outperforms CGN.

EXAMPLE 5.8. A is taken as in Example 5.6, but with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [[gamma].sub.j] := ([[kappa].sup.2] + 1- [e.sup.2.sub.j] - [[kappa].sup.2]/[e.sup.2.sub.j]).sup.1/2] and [e.sub.j] and r. are defined as in Example 5.3. This leads to fixed singular values and varying eigenvalues.

Here RK1 and CGN lead and both are much better than CGS and GMRES.

EXAMPLE 5.9. A is taken as in Example 5.6, but with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This matrix is normal and has eigenvalues [+ or -]i and singular value 1. Here both RK1 and CGN require only one iteration, GMRES requires two while CGS fails.

The numerical results for Examples 5.1-5.7 are summarized in the next table listing the number of iterations needed for convergence (tested by [log.sub.10] [parallel][r.sub.k][parallel] [less than or equal to] -10). The number of iterations was limited to 50, and if a method failed to converge within this limit then the final accuracy is given by the number following \, e.g., \e-2 indicates that the final accuracy was [parallel][r.sub.50][parallel] [approximately equal to] [10.sup.-2].

We note that the residuals [parallel][r.sub.k][parallel] behave erratically for CGN. Its performance may be improved if one uses [[bar.r].sub.0] := [A.sup.T] [r.sub.0] rather than [[bar.r].sub.0] := [r.sub.0] as taken in [19]. RK1 has common features with CGN: both perform excellently in Examples 5.2, 5.6 and 5.7, but RK1 avoids the bad performance of CGN in Examples 5.4, 5.5. RK1 never exceeded the maximum number of iterations.

No preconditioning was tried. It is obvious that if Examples 5.1 and 5.3 are scaled (to achieve a unit diagonal), one reaches the solution immediately within one iteration.

The comparison between these methods may include the number of matrix vector multiplications, of arithmetical operations, storage requirements and general rates of convergence. However, we chose only the number of iterations and the reduction of the norm of the residual as in [19] as indicators of performance. After comparing different iterative algorithms, Nachtigal et al. [19] concluded that it is always possible to find particular examples where one method performs better than others: There is not yet a universal iterative solver which is optimal for all problems.

6. Implementational issues and further discussion. There are several ways to realize the Algorithm. We assume that [H.sub.0] = [A.sup.T] and that A is sparse, but [H.sub.k] (k > 0) is not. Then a direct realization of the Algorithm requires storage of a matrix H [member of] [R.sup.nxm], and in each iteration, computation of 2 new matrix-vector products of the type [H.sub.k]w (neglecting products with the sparse matrices A and [H.sub.0] = [A.sup.T]) and updating of [H.sub.k] to [H.sub.k+1] by means of the dyadic product [u.sub.k][v.sup.T.sub.k] [member of] [R.sup.nxm] in Step 4 of the Algorithm. This requires essentially 3nm arithmetic operations (1 operation = 1 multiplication + 1 addition).

The arithmetic expense can be reduced by using the fact that any [H.sub.k] has the form [H.sub.k] [U.sub.k] [A.sup.T] (see Propositon 2.3), where [U.sub.k] [member of] [R.sup.nxn] is updated by

[U.sub.k+1] = [[gamma].sub.k][U.sub.k] + [u.sub.k][u.sup.T.sub.k] / ([Au.sub.k], [z.sub.k])

and each product [H.sub.k]w is computed as [U.sub.k]([A.sup.T]W). This scheme requires storage space for a matrix U [member of] [R.sup.nxn] and only 3n2 operations/iteration, which is much less than 3nm if n [much less than] m.

Finally, if the Algorithm needs only few iterations, say at most k [much less than] n, one may store only one set of vectors [u.sub.i], i [less than or equal to] k, explicitly (not two as with the rival method RK2 of [17]) and use the formula (supposing that [H.sub.0] = [A.sup.T])

(6.1) [H.sub.k+1] = [[gamma].sub.0][[gamma].sub.1] ... [[gamma].sub.k] (I + [k.summation over (i=0)] [u.sub.i][U.sup.T.sub.i]/ ([Au.sub.i], [z.sub.i])[[gamma].sub.i])[A.sup.T].

when computing products like [H.sub.k][z.sub.k] and [H.sub.k+1]rk+1 in the Algorithm. Using (6.1) to compute Hkw requires essentially the computation of 2 matrix-vector products of the form

[[bar.U].sup.T.sub.k]p, [[bar.U].sub.k]q

with the matrix

[[bar.U].sub.k] := [[u.sub.0], ..., [u.sub.k-1]] [member of] [R.sup.nxk],

which requires 2kn operations to compute a product [H.sub.k]w. This again is half the number of operations as for the rank-2 methods in [17].

Our new update is connected to the previous one in the same way the SRK1 update is related to the DFP-update as shown by Fletcher [9] and Brodlie et al. [2]. We may define general updating expressions by

[H.sub.k+1] = [gamma][H.sub.k] + [u.sub.k] [([Su.sub.k]).sup.T] / ([Su.sub.k], [z.sub.k]).

This encompasses the case when A is s.p.d., which requires S = I and [H.sub.0] = I, as well as the more general case in which S = A: this would require [H.sub.0] = [A.sup.T] or a better starting matrix, which secures that [AH.sub.0] is s.p.d..

An alternative procedure is to update both [H.sub.k] and the matrices [D.sub.k] := [AH.sub.k] using

[D.sub.k] = [D.sub.k] + [v.sub.k][v.sup.T.sub.k]/([v.sub.k], [z.sub.k])

This allows us to monitor the accuracy of the approximate inverse [H.sub.k] by checking the size of [D.sub.k] - I. For reasons of economy, one could monitor only the diagonal or some row of [D.sub.k].

The algorithm may be appropriate for solving initial value problems for partial differential equations as shown for rank-two updates in [17] and, for rank-one updates, in Example 5.2. In such problems, we start with the given initial value and iterate to find the solution for the next time level. The solution is considered to be acceptable if its accuracy is comparable to the discretization error of the governing differential equations. The final updated estimate of the inverse obtained for the present time level is then taken as the initial estimate of the inverse for the next time step.

* Received October 9, 2006. Accepted for publication September 3, 2007. Published online May 19, 2008. Recommended by M. Hochbruck.

REFERENCES

[1] C. BREZINSKI, M. REDIVO-ZAGLIA, AND H. SADOK, New look-ahead Lanczos-type algorithms for linear systems, Numer. Math, 83 (1999), pp. 53-85.

[2] K. W. BRODLIE, A. R. GOURLAY, AND J. GREENSTADT, Rank-one and rank-two corrections to positive definite matrices expressed in product form, J. Inst. Math. Appl., 11 (1973), pp. 73-82.

[3] C. G BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comput.,19 (1965), pp. 577-593.

[4] C, G. BROYDEN, The convergence of single-rank quasi-Newton methods, Math. Comput., 24 (1970), pp. 365382.

[5] J. ABAFFY, C. G BROYDEN, AND E. SPEDic[A.sup.T]O,A class ofdirectmethods for linear systems, Numer. Math. 45 (1984), pp. 361-376.

[6] J. Y. CHEN, D. R. KINCAID, AND D. M. YOUNG, Generalizations and modifications of the GMRES iterative method Numer. Algorithms, 21 (1999), pp. 119-146.

[7] P. DEUFLHARD, R. FREUND, AND A. WALTER, Fast secant methods for the iterative solution of large nonsymmetric linear systems, Impact Comput. Sci. Eng., 2 (1990), pp. 244-276.

[8] T. EIROLA AND O. NEVANLINNA, Accelerating with rank-one updates, Linear Algebra Appl., 121 (1989), pp. 511-520.

[9] R. FLETCHER, A new approach to variable metric algorithms, Computer J., 13 (1970), pp. 317-322.

[10] D. M. GAY, Some convergence properties of Broyden's method, SIAM J. Numer. Anal., 16 (1979), pp. 623630.

[11] D. M. GAY AND R. B. SCHNABEL, Solving systems of nonlinear equations by Broyden's method with projected updates, in Nonlinear Programming, O. L. Mangasarian, R. R. Mayer, and S. M. Robinson, eds., Academic Press, New York, 1973, pp. 245-281.

[12] R. R. GERBER AND F. J. LUK, A generalized Broyden's method for solving simultaneous linear equations, SIAM J. Numer. Anal., 18 (1981), pp. 882-890.

[13] M. R. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409-436.

[14] I. ICHIM, Anew solution for the least squares problem, Int. J. Computer Math., 72 (1999), pp 207-222.

[15] C. M. IP AND M. J. TODD, Optimal conditioning and convergence in rank one quasi-Newton updates, SIAM J. Numer. Anal., 25 (1988), pp. 206-221.

[16] H. KLEINMICHEL, Quasi-Newton Verfahren vom Rang-Eins-Typ zur Losung unrestringierter Minimierungsprobleme, Teil 1. Verfahren and grundlegende Eigenschaften, Numer. Math., 38 (1981), pp.219-228.

[17] A. MOHSEN AND J. STOER, A variable metric method for approximating inverses of matrices, ZAMM, 81 (2001), pp. 435-446.

[18] J. J. MORE AND J. A. TRANGENSTEIN, On the global convergence of Broyden's method, Math. Comput., 30 (1976), pp. 523-540.

[19] N. M. NACHTIGAL, S. C. REDDY, AND L. N. TREFETHEN, HOW fast are nonsymmetric matrix iterations?, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 778-795.

[20] S. S. OREN AND E. SPEDICATO, Optimal conditioning of self-scaling variable metric algorithms, Math. Program., 10 (1976), pp. 70-90.

[21] J. D. PEARSON, Variable metric methods of minimization, Comput. J., 12 (1969), pp. 171-178.

[22] M. J. D. POWELL, Rank one method for unconstrained optimization, in Integer and Nonlinear Programming, J. Abadie, ed., North Holland, Amsterdam, 1970, pp. 139-156.

[23] Y. SAAR AND H. A VAN DER VORST, Iterative solution of linear systems in the 20th century, J. Comput. Appl. Math., 123 (2000), pp. 1-33.

[24] E. SPEDICATO A class of rank-one positive definite quasi-Newton updates for unconstrained minimization, Optimization, 14 (1983), pp. 61-70.

[25] E. SPEDICATO, M. BONOMI, AND A. DEL POPOLO, ABS solution of normal equations of second type and application to the primal-dual interior point method for linear programming, Technical report, Department of Mathematics, University of Bergamo, Bergamo, Italy, 2007.

A. MOHSEN ([dagger]) AND J. STOER ([double dagger])

([dagger]) Department of Engineering Mathematics and Physics, Cairo University, Giza 12211, Egypt (amohsen@enq. cu. edu. eq). The work of this author was supported by the Alexander von Humboldt Stiftung, Germany.

([double dagger]) Institut fur Mathematik, Universitat Wurzburg, Am Hubland, D-97074 Wurzburg, Germany (jstoer@mathematik.uni-wuerzburg.de).

[A.sup.T] Ax = [A.sup.T] b.

Here we suppose that the matrix A [member of] [R.sup.mxn] has maximal rank [mu] := min (m, n). In the case m = n this amounts to solving a nonsingular system Ax = b with a perhaps nonsymmetric matrix A by means of solving [A.sup.T] Ax = [A.sup.T]b. When A is symmetric positive definite (s.p.d.), the classical conjugate gradient method (cg) of Hestenes and Stiefel [13] belongs to the most powerful iterative methods. For solving general rectangular systems, a number of cg-type methods have been developed, see for instance [23]. In general solving such systems is more difficult than in the case of s.p.d. A. In this paper, we use rank-one updates to find approximations of the pseudoinverse of A, which, for instance, may be used as preconditioners for solving such systems with multiple right hand sides.

The use of a rank-one update to solve square systems with A [member of] [R.sup.nxn] was studied by Eirola and Nevanlinna [8]. They invoked a conjugate transposed secant condition without line search. They proved that their algorithm terminates after at most n iteration steps, and if n steps are needed, then [A.sup.-1] is obtained.

The problem was recently considered by Ichim [14] and by Mohsen and Stoer [17]. In [17], starting from an approximation Ho for the pseudoinverse [A.sup.+], or from [A.sup.T] when such an approximation is not available, their method uses rank-2 updates to generate a sequence of n x m-matrices [H.sub.k] approximating [A.sup.+]. Two rank-2 updates were proposed. One is related to the DFP update and the other to the BFGS update. Numerical results showed that both methods of [17], when used for the case m = n, give a better accuracy than the other rank-one update methods.

On the other hand, rank-one updating has attractive features in terms of computational requirements (lower number of operations, less storage). In this paper, rank-one updates (of Broyden's type) for matrices [H.sub.k] are computed so that, in rather general situations, the sequence [H.sub.k] terminates with the pseudoinverse in no more than [mu] steps. The properties of the algorithm are established. For square systems, m = n, also the new method is compared with the methods CGN, CGS and GMRES, which try to solve Ax = b directly.

An alternative algorithm utilizing the s.p.d. matrices Dk := [AH.sub.k] is proposed. It has the advantage that it can monitor the accuracy as [D.sub.k] [right arrow] [I.sub.m]. However, we then have to update two matrices (but note that the [D.sub.k] are symmetric).

2. Rank-one updating. We consider the solution of the linear system

(2.1) Ax =b

in the least squares sense: [bar.x] "solves" (2.1) if [parallel]b - A[bar.x][parallel]b = [min.sub.x] [parallel]b - Ax[parallel] that is if [A.sup.T] (b - A[bar.x]) = 0. We assume that A [member of] [R.sup.mxn] has maximal rank, rk A = [mu] := min (m, n). Here and in what follows, [parallel]x[parallel] := [(x, x).sup.1/2] is the Euclidean norm and (x, y) := [X.sup.T]y the standard scalar product in [R.sup.m]. The vector space spanned by the vectors [u.sub.i] [member of] [R.sup.m], i = 1,2, ..., k, is denoted by

[[[u.sub.1], [u.sub.2], ..., [u.sub.k]]]

We note already at this point that all results of this paper can be extended to complex linear systems with a complex matrix A [member of] [C.sup.mxn]. One only has to define the inner product in [C.sup.m] by

(x, y) := [x.sup.H]y.

Also the operator [sup.T] then has to be replaced by [sup.H] and the word "symmetric" by "Hermitian".

We call an n x m-matrix H A-related if AH is symmetric positive semidefinite (s.p.s.d.) and [x.sup.T] AH x = 0 implies that [x.sup.T] A = 0 and Hx = 0. Clearly, H := [A.sup.T] is A-related, the pseudoinverse [A.sup.+] (in the Moore Penrose sense) is A-related (this can be shown using the singular value decomposition of A and [A.sup.+]); if A is s.p.d. then H = I is A-related. It is easily verified that any matrix H of the form H = U[A.sup.T], where U [member of] [R.sup.nxn] is s.p.d., is A-related.

This concept will be central since our algorithms aim to generate A-related matrices [H.sub.k] which approximate [A.sup.+]. It also derives its interest from the special case when A is a nonsingular, perhaps nonsymmetric, square matrix. It is easily verified that a matrix H is A-related if and only if AH is symmetric positive definite. This means that H can be used as a right preconditioner to solve Ax = b by means of solving the equivalent positive definite system AH y = b, x = Hy, using the cg-algorithm. Then the preconditioner H will be the better the smaller the condition number of AH is. So the algorithms of this paper to find A-related matrices [H.sub.k] with good upper and lower bounds on the nonzero eigenvalues of [AH.sub.k] can be viewed as algorithms for computing good preconditioners [H.sub.k] in the case of a full-rank A.

The following proposition gives another useful characterization of A-related matrices: PROPOSITION 2.1. Let P be any unitary m x m-matrix satisfying

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [??] is a [mu] x n-matrix of rank [mu] = min (m, n) = rk A. Define for an n x m- matrix H the n x [mu]-matrix [??] and [??] by

[[??][??]] := [HP.sup.T].

Then H is A-related if and only if the following holds

[??] = 0 and [??][??] is s.p.d.,

and then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof. 1. Suppose that H is A-related. Then the matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is s.p.s.d. . Hence [??][??] = 0 and [??][??] is s.p.s.d.

Suppose [??] [not equal to] 0. Then there is a vector [y.sub.2] with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [not equal to] 0. Define x := [P.sup.T] y, where [Y.sup.T] = [[y.sup.T.sub.1] [y.sup.T.sub.2]], [y.sub.1] := 0. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

but

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

contradicting the A-relatedness of H. Hence [??] = 0. Now suppose that [??][??] is only positive semidefinite but not positive definite. Then there is a vector [y.sub.1] [not equal to] 0 such that [??][??][y.sub.1] = 0. But then [y.sup.T.sub.1] [??] [not equal to] 0, because A has full row rank, and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence the corresponding

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

satisfies

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

but

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

again contradicting the A-relatedness of H.

2. Conversely, suppose that P, A and H satisfy the conditions of the proposition and assume [x.sup.T] AH x = 0. Then with [y.sup.T] = [[y.sup.T.sub.1], [y.sup.T.sub.2]] := [x.sup.T][P.sup.T],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

so that [y.sub.1] = 0. But then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and likewise

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence H is A-related.

In the case [mu] = m [less than or equal to] n, we can choose P := I and the proposition reduces to the following corollary, which has a simple direct proof.

COROLLARY 2.2. If the matrix A has full row rank, then H is A-related iff AH is a symmetric positive definite matrix.

Proof. If AH is s.p.d., then [x.sup.T] AHx = 0 gives x = 0. Conversely, if H is A-related, then [x.sup.T] AHx = 0 implies [A.sup.T]x = 0 and, therefore, x = 0 since [A.sup.T] has full column rank. Hence the s.p.s.d. matrix AH is s.p.d..

We noted above that any matrix H of the form H = U[A.sup.T], where U [member of] [R.sup.nxn] is s.p.d., is A-related. The converse is also true:

PROPOSITION 2.3. A matrix H [member of] [R.sup.nxm] is A-related iff it has the form H = U[A.sup.T], where U [member of] R111 is s.p.d..

Proof. We prove the result only for the most important case of a full column rank matrix A,m[greater than or equal to] n = [mu].

Let H be A-related. Since [x.sup.T] AHx = 0 implies [A.sup.T] x = 0 and Hx = 0, [A.sup.T] x = 0 is equivalent to H x = 0, that is H and [A.sup.T] have the form H = U[A.sup.T], [A.sup.T] = VH for some matrices U, V [member of] [R.sup.nxn]. As [A.sup.T] has full column rank n = [mu], so has H. This implies that U is nonsingular and V = [U.sup.-1]. Since AH = AU[A.sup.T] is s.p.s.d. and AU[A.sup.T] = [(AU[A.sup.T]).sup.T] = [AU.sup.T] [A.sup.T], and therefore by the nonsingularity of [AA.sup.T]

[A.sup.T] AU [A.sup.T] A= [A.sup.T] [AU.sup.T] [A.sup.T]A [??] U = [U.sup.T],

showing that U is symmetric, nonsingular and positive semidefinite, and therefore a s.p.d. matrix.

The methods for solving (2.1) will be iterative. The new iterate [x.sub.k+1] is related to the previous iterate by

[x.sub.k+1] = [x.sub.k] + [[alpha].sub.k][p.sub.k] = [x.sub.k] + [y.sub.k],

where the search direction [p.sub.k] is determined by the residual [r.sub.k] = b - [Ax.sub.k] via an A-related n x m matrix [H.sub.k] of rank [mu] by

[p.sub.k] = [H.sub.k][r.sub.k].

We will assume [p.sub.k] [not equal to] 0, because otherwise

[Ap.sub.k] = A[H.sub.k][r.sub.k] = 0 [??] (A[H.sub.k][r.sub.k], [r.sub.k]) = 0,

so that, by the A-relatedness of [H.sub.k], [A.sup.T] [r.sub.k] = 0, that is, [x.sub.k] is a (least-squares) solution of (2.1).

We also note that [p.sub.k] [not equal to] 0 implies [Ap.sub.k] [not equal to] 0, because otherwise, again by A-relatedness,

0 = ([Ap.sub.k], [r.sub.k]) = ([AH.sub.k][r.sub.k], [r.sub.k]) [??] [H.sub.k][r.sub.k] = [p.sub.k] = 0.

Hence the scalar products ([Ap.sub.k], [r.sub.k]), ([Ap.sub.k], [Ap.sub.k]) will be positive.

Now, the stepsize [[alpha].sub.k] is chosen to minimize the Euclidean norm [parallel][r.sub.k+1][parallel]. Thus,

[r.sub.k]+1 = [r.sub.k] - akAPk = [r.sub.k] - [z.sub.k],

where

[[alpha].sub.k] = ([Ap.sub.k], [r.sub.k]), ([Ap.sub.k] [Ap.sub.k]) > 0.

Hence [[alpha].sub.k] is well defined and

([r.sub.k]+l,[z.sub.k]) = 0 = ([r.sub.k]+1, [AH.sub.k][r.sub.k]), ([r.sub.k]+l,[r.sub.k+1]) = ([r.sub.k],[r.sub.k]) - [[absolute value of ([r.sub.k],[Ap.sub.k])].sup.2]/([Ap.sub.k],[Ap.sub.k]) < ([r.sub.k],[r.sub.k])

We update [H.sub.k] using a rank-one correction

[H.sub.k+1] = [H.sub.k] + [u.sub.k][v.sup.T.sub.k], [u.sub.k] [member of] [R.sup.n], [v.sub.k] [member of] [R.sup.m].

Upon invoking the secant condition

[H.sub.k+1] [x.sub.k] = [y.sub.k],

we get the update formula of Broyden type [3],

(2.2) [H.sub.k+1] = [H.sub.k] + ([y.sub.k] - [H.sub.k][z.sub.k])[v.sup.T.sub.k]/([v.sub.k], [z.sub.k]) = [H.sub.k] + [u.sub.k][v.sup.T.sub.k]([v.sub.k], [z.sub.k]), uk := [y.sub.k] - [H.sub.k][z.sub.k],

provided that ([v.sub.k], [z.sub.k]) [not equal to] 0. The updates (2.2) are invariant with respect to a scaling of [v.sub.k]. It can be shown (see [10]) that for m = n the matrix [H.sub.k+1] is nonsingular iff [H.sub.k] is nonsingular and ([v.sub.k], [H.sup.-1.sub.k] [y.sub.k]) [not equal to] 0. If m = n and A is s.p.d., then it is usually recommended to start the method with [H.sub.0] := I and to use [v.sub.k] := [u.sub.k] = [y.sub.k] - [H.sub.k][z.sub.k], which results in the symmetric rank-one method (SRK1) of Broyden [21] and leads to symmetric matrices [H.sub.k]. However, it is known that SRK1 may not preserve the positive definitness of the updates [H.sub.k]. This stimulated the work to overcome this difficulty [22], [16], [24], [15].

When A is nonsymmetric, the good Broyden (GB) updates result from the choice [v.sub.k] := [H.sub.k] yk, while the bad Broyden (BB) updates take [v.sub.k] := [z.sub.k]. For [[alpha].sub.k] := 1 and solving linear systems, Broyden [4] proved the local R-superlinear convergence of GB (that is, the errors [[epsilon].sub.k] := [parallel][x.sub.k] - x* [parallel] [less than or equal to] [[theta].sub.k] are bounded by a superlinearly convergent sequence [[theta].sub.k] [down arrow] 0). In More and Trangenstein [18], global Q-superlinear convergence (i.e. [lim.sub.k[right arrow][infinity]], [[epsilon].sub.k+1]/[[epsilon].sub.k] 0) for linear systems is achieved using a modified form of Broyden's method. For A [member of] [R.sup.nxn], Gay [10] proved that the solution using update (2.2) is found in at most 2n steps. The generation of {[v.sub.k]} as a set of orthogonal vectors was considered by Gay and Schnabel [11]. They proved termination of the method after at most n iterations.

The application of Broyden updates to rectangular systems was first considered by Gerber and Luk [12]. The use of the Broyden updates (2.2) for solving non-Hermitian square systems was considered by Deuflhard et al. [7]. For both GB and BB, they introduced an error matrix and showed the reduction of its Euclidean norm. For GB they provided a condition on the choice of Ho ensuring that all [H.sub.k] remain nonsingular. They also discussed the choice [v.sub.k] := [H.sup.T.sub.k] [H.sub.k][z.sub.k], But in this case, no natural measure for the quality of [H.sub.k] was available, which makes the method not competitive with GB and BB. For GB and BB, they showed the importance of using an appropriate line search to speed up the convergence.

In this paper, we start with an A-related matrix Ho and wish to make sure that all [H.sub.k] remain A-related. Hence the recursion (2.2) should at least preserve the symmetry of the [AH.sub.k]. This is ensured iff

[v.sub.k] = [Au.sub.k]

or equivalently

(2.3) [u.sub.k] = [y.sub.k] - [H.sub.k][z.sub.k] = (I - [H.sub.k]A)[y.sub.k] = [H.sub.k]([[alpha].sub.k][r.sub.k] - [z.sub.k]) = [H.sub.k][t.sub,k], [v.sub.k] - A([y.sub.k] - [H.sub.k][z.sub.k]) = (I - [AH.sub.k])[z.sub.k] = [AH.sub.k][t.sub.k],

where [t.sub.k] is the vector [t.sub.k] := [[alpha].sub.k][r.sub.k] - [z.sub.k].

However, the scheme (2.2), (2.3) may break down when ([v.sub.k], [z.sub.k]) = 0 and, in addition, it may not ensure that [AH.sub.k]+l is again A-related. We will see that both difficulties can be overcome.

3. Ensuring positive definiteness and A-relatedness. For simplicity, we drop the iteration index k and replace k + 1 by a star *. We assume that H is A-related and p = Hr [not equal to] 0, so that (AHr, r) > 0 and (Ap, Ap) > 0. Following Kleinmichel [16] and Spedicato [24], we scale the matrix H by a scaling factor 7 > 0 before updating. This leads to

H* = [gamma]H + [uv.sup.T] /(v, z),

where u = y - [gamma]Hz = H([alpha]r - [gamma]z) = Ht, v = Au = AM and t = [alpha]r - [gamma]z. Therefore,

(3.1) H* = [gamma]H + Ht[(AHt).sup.T]/(AHt, z),

(3.2) AH* = [gamma]AH + AHt[(AHt).sup.T] /(AHt, z).

Then, introducing the abbreviations

[[beta].sub.1] := (AHr, r), [[beta].sub.2] := (AHz, z), [beta]* := (AHr*, r*),

we find

[[beta].sub.1] > 0, [beta]* > 0, [[beta].sub.2] [greater than or equal to] [[beta].sub.1] > 0,

[alpha] (Ap, r)/(Ap, Ap) = (AHr, r)/(AHr, AHr) > 0,

and an easy calculation, using (r*, z) = 0, z = r - r*, shows

[[beta].sub.2] = [[beta].sub.1] + [beta]*

and the following formula for the denominator in (3.2),

(3.3) (v, z) = (AM, z) = ([alpha]AHr - [gamma]AHz, z) = [alpha][[beta].sub.1] - [gamma][[beta].sub.2] = (a - [gamma]) [[beta].sub.1] - [gamma][beta]*.

Since H is A-related, Proposition 2.1 can be applied. Hence, there exist a unitary m x m-matrix P and [mu] x n-matrices [??], [[??].sup.T], such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[??][??] is symmetric positive definite.

Then (3.1), (3.2) preserve the block structure of H[P.sup.T] and PAH[P.sup.T]. Replacing t by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

we find

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence, by Proposition 2.1, AH* is A-related if and only if

(3.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is positive definite.

Since [??][??] is s.p.d. and [gamma] > 0, [([gamma][??][??]).sup.1/2] is well defined and [??][??] satisfies

(3.5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The matrix U has the eigenvalues 1 (with multiplicity [mu] - 1) and, with multiplicity 1, the Eigenvalue

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

But as is easily seen, ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) = (AHt, t), so that by (3.3),

(3.6) [phi](gamma]) = 1/[gamma] [alpha]([alpha] - [gamma])[[beta].sub.1]/ [alpha] [[beta].sub.1] - [gamma][[beta].sub.2].

Hence, H* will be A-related, iff [gamma] > 0 satisfies [alpha][[beta].sub.1] - [gamma][[beta].sub.2] [not equal to] 0 and

[alpha]([alpha] - [gamma])[[beta].sub.1]/[alpha][[beta].sub.1] - [gamma][[beta].sub.2] > 0,

that is iff

(3.7) 0 < [gamma] < [alpha][[beta].sub.1]/[[beta].sub.2] = [alpha] [[beta].sub.1]/[[beta].sub.1] + [beta]* or [gamma]>a.

In view of the remark before Corollary 4.5 (see below), the choice 7 = 1 seems to be favorable. By (3.7), this choice secures that H* is A-related unless a, [[beta].sub.1], [beta]* satisfy

(3.8) 1[less than or equal to] [alpha] [less than or equal to] 1 + [beta]*/[[beta].sub.1].

Next we derive bounds for the condition number of AH* and follow the reasoning of Oren and Spedicato [20]. Let [??] := [??][??] and [??]* := [??][??]. Then by (3.4),

[??]* = [gamma][??] + [??][[??].sub.1][([??][[??].sub.1]).sup.T]/(AHt, z),

where we assume that [??] is s.p.d. and 7 satisfies (3.7), so that also [??]* is s.p.d. Then the condition number [kappa]([??]) with respect to the Euclidean norm is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then by (3.5) for any x [not equal to] 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and, similarly,

[x.sup.T][??]*x/[x.sup.T]x [greater than or equal to] [[lambda].sub.min] (U) [gamma] [[lambda].sub.min] ([??])

Now, the eigenvalues of U are 1 and [phi]([gamma]) is given by (3.6). Hence,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

so that, finally,

[kappa]([??]*) [less than or equal to] [PHI])([gamma])[kappa]([??]),

Where

[PHI]([gamma]): = max(1, [phi]([gamma]))/min (1, [phi]([gamma])).

Note that [??]* depends on [gamma], [??]* = [??]*([gamma]). Unfortunately, the upper bound for [kappa]([??]*) just proved is presumably too crude; in particular it does not allow us to conclude that [kappa]([??]*) < [kappa]([??]) for certain values of 7. Nevertheless, one can try to minimize the upper bound by minimizing [PHI]([gamma]) with respect to 7 on the set r of all 7 satisfying (3.6). The calculation is tedious though elementary. With the help of MATHEMATICA one finds for the case [beta]* > 0, that is [[beta].sub.2] > [[beta].sub.1], that 4) has exactly two local minima on [GAMMA] namely

(3.9) [[gamma].sub.+] := [alpha] (1 + [square root of [beta]*/[beta].sub.2]) > [alpha], [[gamma].sub.-] := [alpha] (1 + [square root of [beta]*/[beta].sub.2]) < [alpha], [[beta].sub.1]/ [[beta].sub.2].

Both minima are also global minima, and they satisfy

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

4. The algorithm and its main properties. Now, let [H.sub.0] be any A-related starting matrix, and [x.sub.0] a starting vector. Then the algorithm for solving Ax = b in the least squares sense and computing a sequence of A-related matrices [H.sub.k] is given by:

Algorithm: Let A be a m x n-matrix of maximal rank, b [member of] [R.sup.m], [H.sub.0] be A-related, and [x.sub.0] [member of] [R.sup.n] a given vector.

For k = 0, 1, ...

1. Compute the vectors [r.sub.k] = b - [Ax.sub.k], [p.sub.k] := [H.sub.k][r.sub.k]. If [p.sub.k] = 0 then stop.

2. Otherwise, compute

[[alpha].sub.k] := ([Ap.sub.k], [r.sub.k])/([Ap.sub.k], [Ap.sub.k]), [x.sub.k+1] := [x.sub.k] + [[alpha].sub.k][p.sub.k], [y.sub.k] := [x.sub.k+1] - [x.sub.k], [r.sub.k+1] = [r.sub.k] - [AY.sub.k], [z.sub.k] = [r.sub.k] - [r.sub.k+1].

3. Define

[[beta].sub.1] := ([AH.sub.k][r.sub.k],[r.sub.k]) = ([Ap.sub.k],[r.sub.k]), [beta]* = ([AH.sub.k][r.sub.k]+l,[r.sub.k+1]).

4. Set [[gamma].sub.k] := 1 if (3.8) does not hold.

Otherwise compute any [[gamma].sub.k] > 0 satisfying (3.7), say by using (3.9),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where eps is the relative machine precision.

Compute [u.sub.k] := [y.sub.k] - [[gamma].subk][H.sub.k][z.sub.k], [v.sub.k] := [Au.sub.k] and

[H.sub.k+1] := [[gamma].sub.k][H.sub.k] + [u.sub.k][v.sup.T.sub.k]/([v.sub.k], [z.sub.k]).

Set k := k + 1 and goto 1.

We have already seen that step 2 is well-defined if [p.sub.k] [not equal to] 0, because then ([Ar.sub.k], [r.sub.k]) > 0 and [Ap.sub.k] [not equal to] 0, if [H.sub.k] is A-related. Moreover, [p.sub.k] = 0 implies [A.sup.T] [r.sub.k] = [A.sup.T] (b - Axk) = 0, i.e., [x.sub.k] is a least-squares solution of (2.1), and the choice of [[gamma].sub.k] in step 2 secures also that [H.sub.k+1] is A-related. Hence, the algorithm is well-defined and generates only A-related matrices [H.sub.k]. This already proves part of the main theoretical properties of the algorithm stated in the following theorem:

THEOREM 4.1. Let A be real m x n-matrix of maximal rank u := min (m, n), [H.sub.0] an A-related real n x m-matrix and [x.sub.0] [member of] [R.sup.n]. Then the algorithm is well-defined and stops after at most p steps: There is a smallest index l < p with pi = 0, and then the following holds:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and in particular [Ax.sub.1] = b if m [less than or equal to] n. Moreover

1. [H.sub.j] is A-related for 0 [less than or equal to] j [less than or equal to] l.

2. [H.sub.k][z.sub.i] = [[gamma].sub.i,k] [y.sub.i] for 0 [less than or equal to] i < k [less than or equal to] l, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

3. ([r.sub.k],[z.sub.i])= 0 for 0 [less than or equal ot] i < k [less than or equal to]l.

4. ([z.sub.k],[z.sub.i])= 0 for i [not equal to] k [less than or equal to] l.

5. [z.sub.k] [not equal to] 0 for 0 [less than or equal to] k < l.

Proof. Denote by (Pk) the following properties:

1) [H.sub.j] is A-related for 0 [less than or equal to] j [less than or equal to] k.

2) [H.sub.j] [z.sub.i] = -[[gamma].sub.i,j][y.sub.i] for 0 [less than or equal to] i < j [less than or equal to] k.

3) ([r.sub.j], [z.sub.i]) = 0 for 0 [less than or equal to] i < j [less than or equal to] k.

4) ([z.sub.j], [z.sub.i]) = 0 for 0 [less than or equal to] i < j [less than or equal to] k.

5) [z.sub.j] [not equal to] 0 for 0 [less than or equal to] j < k.

By the remarks preceding the theorem, the algorithm is well-defined as long as [p.sub.k] [not equal to] 0 and the matrices [H.sub.k] are A-related. (Pk), 4) and 5) imply that the k vectors [z.sub.i], i = 0, 1, ..., k - 1, are linearly independent and, since [z.sub.i] = [Ay.sub.i] [member of] R(A) and dim R(A) = [mu], property ([P.sub.k]) cannot be true for k > [mu]. Therefore, the theorem is proved, if (PO) holds and the following implication is shown:

([P.sub.k]), [p.sub.k] [not equal to] 0 [??] ([P.sub.k+1]).

1) ([P.sub.k+1]), 1) follows from the choice of [[gamma].sub.k] that [H.sub.k+1] is well-defined and A-related if is [H.sub.k].

2) To prove ([P.sub.k+1]), 2) we first note that

[H.sub.k+1] [z.sub.k] = [y.sub.k]

by the definition of [H.sub.k]. In addition for i < k, ([P.sub.k]) and the definition of -[[gamma].sub.i,k] imply

[H.sub.k+1][z.sub.i] = [[gamma].sub.k][H.sub.k][z.sub.i] = [[gamma].sub.k][[gamma].sub.i,k][y.sub.i] = [[gamma].sub.i,k+1][y.sub.i],

since ([z.sub.k], [z.sub.i]) = 0 and

([AH.sub.k]zk&& [z.sub.i]) = ([z.sub.k], [AH.sub.k][z.sub.i]) = ([z.sub.k], A [[gamma].sub.i,k][y.sub.i]) = ([z.sub.k], [y.sub.i,k][z.sub.i]) = 0.

This proves ([P.sub.k+1]), 2).

3) Since ([r.sub.k+1], [z.sub.k]) = 0, and for i < k,

([r.sub.k+1], [z.sub.i]) = ([r.sub.i+1] + [k.summation over (j=i+1)], [z.sub.j][z.sub.i]) = 0,

because of ([P.sub.k]), 3), 4). This proves ([P.sub.k+1]), 3).

4) By ([P.sub.k+1]), 3) we have for i < k

([z.sub.i], [z.sub.k]) = ([z.sub.i], [r.sub.k] - [r.sub.k+1]) = 0.

Hence ([P.sub.k+1]), 4) holds.

It was already noted that [P.sub.k] [not equal to] 0 implies [z.sub.k] [not equal to] 0. Hence ([P.sub.k+1], 5) holds. This completes the proof of the theorem.

REMARK 4.2. In the case m [greater than or equal to] n = [mu], the algorithm finds the solution [bar.x] := arg [min.sub.x] [parallel]Ax - b[parallel], which is unique. In the case m = [mu] < n, the algorithm finds only some solution x of the many solutions of Ax = b, usually not the interesting particular solution of smallest Euclidean norm. This is seen from simple examples, when for instance the algorithm is started with a solution [x.sub.0] of Ax = b that is different from the least norm solution: the algorithm then stops immediately with x := [x.sub.0], irrespective of the choice of [H.sub.0].

The least squares, minimal norm solution x* of Ax = b can be computed as follows: let [bar.b] be any solution of Ax = b (computed by the algorithm) and compute y* as the least squares solution of [A.sup.T] y = b again by the algorithm (but with A replaced by [A.sup.T] and b by b). Then x* = [A.sup.T] y* is the least squares solution of Ax = b with minimal norm.

We also note that the algorithm of this paper belongs to the large ABS-class of algorithms (see Abaffy, Broyden and Spedicato [5]). As is shown in [25], also the minimal norm solution of a Ax = b, A [member of] [R.sup.mxn], m = [mu] < n, can be computed by invoking suitable algorithms of the ABS-class twice.

COROLLARY 4.3. Under the hypotheses of Theorem 4.1, the following holds. If the algorithm does not stop prematurely, that is if p = min (m, n) is the first index with [p.sub.l] = 0, l= [mu], then

A[H.sub.[mu]] = [ZDZ.sup.-1], if p=m[greater than or equal to]n, [H.sub.[mu]]A = YD[Y.sup.-1], if p = n [greater than or equal to] m,

with the matrices

Z := [[z.sub.0] [z.sub.1] ... [z.sub.[mu]-1]],

Y:= [[y.sub.0] [y.sub.1] ... [y.sub.[mu]-1]],

D := diag ([[gamma].sub.0,[mu]], [[gamma].sub.1,[mu]], [[gamma].sub.[mu]-1.[mu]]

That is, if [[gamma].sub.k] = 1 for all k, then D = I so that [H.sub.[mu]], is a right-inverse of A if p = m [less than or equal to] n, and a left-inverse if p = n [less than or equal to] m.

Proof. Part 2. of Theorem 4.1 and the definitions of D, Z, Y imply

(4.1) [H.sub.[mu]] Z = YD, AY = Z.

The [mu] columns [z.sub.i] [member of] [R.sup.[mu]] of Z are linearly independent by parts 4. and 5. of Theorem 4.1 (they are even orthogonal to each other), so that by AY = Z also the columns [y.sub.i] [member of] [R.sup.n] of the matrix Y are linearly independent.

Hence, for [mu] = m [less than or equal to] n, [Z.sup.-1] exists and by (4.1)

A[H.sub.[mu]],Z = ZD [??] A[H.sub.[mu]], = ZD[Z.sup.-1],

and for p = n [less than or equal to] m, [Y.sup.-1] exists, which implies by (4.1)

[H.sub.[mu]]Z = [H.sub.[mu]]AY = YD [??][H.sub.[mu]]A = Y[DY.sup.-1].

As a byproduct of the corollary we note

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If D = I (i.e., [[gamma].sub.k]k = 1 for all k) these formulae reduce to

A[H.sub.[mu]]A = A, [H.sub.[mu]]A[H.sub.[mu]] = [H.sub.[mu]].

Since then, for [mu] = n [less than or equal to] m, in addition, both [H.sub.[mu]]A = [I.sub.m] and A[H.sub.[mu]] are symmetric, [H.sub.[mu]] must be the Moore-Penrose pseudoinverse [A.sup.+] of A.

REMARK 4.4. Consider the special case [mu] = m = n of a nonsingular matrix A. Then, A[H.sub.[mu]] is s.p.d. and, by the corollary, A[H.sub.[mu]] = [ZDZ.sup.-1], so that the eigenvalues of A[H.sub.[mu]] are just the diagonal elements of D. Hence its condition number is r.(A[H.sub.[mu]]) = r.(D). Since the algorithm chooses the scaling parameter [[gamma].sub.k] = 1 as often as possible, A[H.sub.n] will presumably have a relatively small condition number even if D [not equal to] I.

Theorem 4.1, part 3. implies a minimum property of the residuals:

COROLLARY 4.5. For k < l, the residuals satisfy

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof. By the algorithm

[[alpha].sub.i][Ap.sub.i] = [[alpha].sub.i][AH.sub.i][r.sub.i] = [z.sub.i],

so that

[r.sub.k+1] = [r.sub.0] - [k.summation over (i=0)] [[alpha].sub.i][Ap.sub.i] = [r.sub.0] [k.summation over (i=0)] [z.sub.i].

Hence

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is a convex quadratic function with minimum [[lambda].sub.i] = [[alpha].sub.i], i = 0, 1, ..., k, since

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

by Theorem 4.1, part 3.

We note that the space [[[z.sub.0], [z.sub.i], ..., [z.sub.k]]] is closely related to the Krylov space

[K.sub.k+1] ([AH.sub.0], [r.sub.0]) := [[[r.sub.0], [AH.sub.0][r.sub.0], ..., [([AH.sub.0]).sup.k][r.sub.0]]] [subset] [R.sup.m]

generated by the matrix [AH.sub.0] and the initial residual [r.sub.0]:

THEOREM 4.6. Using the notation of Theorem 4.1, the following holds for 0 [less than or equal to] k < l

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

that is [z.sub.0], [z.sub.1], ..., [z.sub.k] is an orthogonal basis of [AH.sub.0][K.sub.k+1] ([AH.sub.0], [r.sub.0]) and

dim [AH.sub.0] [K.sub.k+1] ([AH.sub.0], [r.sub.0]) = k + 1.

Proof. The assertions are true for k = 0, since

[z.sub.0] = [Ay.sub.0] = [[alpha].sub.0][AH.sub.0][r.sub.0] [member of] [[[AH.sub.0][r.sub.0]].

Now

[AH.sub.k][z.sub.k] [member of] [[[AH.sub.0][r.sub.0], ..., [([AH.sub.0]).sup.k+2][r.sub.0]]], 0 [less than or equal to] k < l,

[r.sub.k+1] = [r.sub.k] - [z.sub.k], [z.sub.k][[alpha].sub.k][AH.sub.k][r.sub.k] [member of] [[[AH.sub.k][r.sub.k]]], [[alpha].sub.k] [not equal to] 0.

For any vector u [member of] [R.sup.m]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

5. Comparison with other methods and numerical results. The numerical tests are concentrated to the cases m = n + 1 and m = n for the following two reasons: The first is that it is harder to solve least squares problems with n [less than or equal to] m with n close to m than those with n [much less than] m. The second is that in the case m = n, which is equivalent to solving a linear equation Ax = b with a nonsingular and perhaps nonsymmetric matrix A, there are many more competing iterative methods, such as CGN, CGS, GMRES than the rank-one ("RK1") method of this paper and the rank-2 methods ("RK2") of [17] that solve Ax = b only indirectly by solving [A.sup.T]Ax = [A.sup.T]b. In all examples we take [log.sub.10] [parallel][r.sub.k][parallel] as measure of the accuracy.

EXAMPLE 5.1. Here we use that the algorithm of this paper is defined also for complex matrices A. As in [17] we consider rectangular complex tridiagonal matrices A = ([a.sub.jk]) [member of] [C.sup.mxn] that depend on two real parameters q and f and are defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In particular, we applied RK1 (starting with [x.sub.0] := 0 and [H.sub.0] := [A.sup.H]) to solve the least squares problem with m = 31, n = 30, q = 0.1 and f = 1. The iteration was stopped as soon as

[re.sub.k] := [parallel][r.sub.k][parallel] [parallel][r.sub.0][parallel] [less than or equal to] [epsilon], [epsilon] := [10.sup.-3].

RKl stopped after 24 iterations. Figure 5.1 (plotting [log.sub.10] [re.sub.k] versus k) shows the typical behavior observed with cg-type methods (cf. Theorem 4.1).

At termination, RK1 produced only an approximation H* of the pseudoinverse of A, since RKl stopped already after 24 < 30 = n iterations. When [H.sub.0] := H* was then used to solve by RK1 the system Ax = b' with a new right-hand side b' [not equal to] b (but the same starting vector [x.sub.0] = 0 and precision e = [10.sup.-3]), RK1 already stopped after 9 iterations (see the plot of Figure 5.2).

Other choices of the parameters q and f lead to least squares systems Ax = b, where RKl (started with [H.sub.0] := [A.sup.T]) did not stop prematurely: it needed n iterations and thus terminated with a final H* very close to [A.sup.+]. Not surprisingly, RKl (started with [H.sub.0] := H*) then needed only one iteration to solve the system Ax = b' with a different right-hand side b' [not equal to] b..

[FIGURE 5.1 OMITTED]

[FIGURE 5.2 OMITTED]

The test results of [17] for RK2 were similar. Since RK1 needs only half the storage and number of operations as RK2, RK1 is preferable.

EXAMPLE 5.2. This major example refers to solving square systems Ax = [b.sub.i], i = 1, 2, . .., with many right hand sides bi. It illustrates the use of the final matrices HW produced by RK1 when solving Ax = bi as starting matrix [H.sub.0] := [H.sub.(i)] of RK1 for solving Ax = [b.sub.i+1].

We consider the same evolution equation as in [17],

[u.sub.t] + a * [u.sub.x] + b * [u.sub.y] = [u.sub.xx] + [u.sub.yy] + f (x, y, t), 0 [less than or equal to] t [less than or equal to]T, 0 [less than or equal to] x, y [less than or equal to] 1,

Where

f (x, y, t) := [e.sup.-[lambda]t] [(-[lambda] + 2[[pi].sup.2]) sin [pi] x sin [pi]y + [pi] (a cos [pi]x sin [pi]y + b sin [pi]x cos [pi]y)],

with the initial and boundary conditions

u (x, y, 0) = sin [pi]x sin [pi]y, 0 [less than or equal to] X, Y [less than or equal to] 1,

u(x, y, t)= 0, fort > 0, (x, y) [member of] [partial derivative][0,1] x [0,1].

Its exact solution is

u(x, y, t) = [e.sup.-[lambda]t] sin [pi]x sin [pi]y.

The space-time domain is discretized by a grid with space step [DELTA]x = [DELTA]y = h = 1/N and time step [DELTA]t = [tau] > 0. Let us denote the discretized solution at time level [n.sub.[tau]] by U, and at level (n + 1)[tau] by V. Application of the Crank-Nicolson technique using central differences to approximate the first order derivatives and the standard second order differences to approximate the second derivatives yields the following scheme (we use the abbreviations [beta] := [tau]([2h.sup.2]), [gamma] := [tau]/(4h)) for the transition U [right arrow] V:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where 2 [less than or equal to] i, j [less than or equal to] N - 1. This scheme is known to be stable and of order O([[tau].sup.2], [h.sup.2]). V is obtained form U by solving a linear system of equations AV = C(U) with a nonsymmetric penta-diagonal matrix A and a right hand side C(U) depending linearly on U. This linear system may be solved directly by using an appropriate modification of Gauss elimination. However, to demonstrate the use of our updates, we solve these equations for given U iteratively starting for the first time step with [H.sub.0] := [A.sup.T] . For the next time steps, we compare the results for the rank-two method in [17] with the rank-one method of this paper.

For N = 35 (corresponding to more than 1000 unknowns at each time level) and [DELTA]t = [tau] = 0.01, a = 10, b = 20, [lambda] = 1 the solution of the problem up to the time t = 5 x [DELTA]t requires the following numbers of iterations in order to achieve a relative residual norm < 0.0001:

RK1: 158 123 98 91 62 RK2: 158 123 109 87 74

These results show a comparable number of iterations for both methods. But again, the storage and the number of operations for RKl is half that required for RK2.

The next examples also refer to the solution of systems Ax = b with a square nonsingular n x n-matrix A. Here we follow the ideas of Nachtigal et al. in [19], who compared three main iterative methods for the solution of such systems. These are CGN, GMRES and CGS. Examples of matrices were given for which each method outperforms the others by a factor of size O([square root of n]) or even O(n). These examples are used to compare our algorithm ("RK1") with these three methods. In all examples n = 40, except Examples 5.3 and 5.6, where n = 400. We take [log.sub.10] [parallel][r.sub.k][parallel] as a measure of the accuracy. Note that RK1 solves Ax = b only indirectly via the equivalent system [A.sup.T] Ax = [A.sup.T] b..

EXAMPLE 5.3. In this example all methods make only negligible progress until step n. Here

A := diag (1, 4, 9, ..., [n.sup.2]),

is diagonal but represents a normal matrix with troublesome eigenvalues and singular values. The results show that both GMRES and RKl are leading.

EXAMPLE 5.4. This is an example where both CGN and RK1 are leading. Here A is taken as the unitary n x n-matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

EXAMPLE 5.5. Here

A:= diag ([d.sub.1], [d.sub.2],..., [d.sub.n])

where {[d.sub.j]} denote the set of Chebyshev points scaled to the interval [1,[kappa]], where

[e.sub.j]:= cos(j-1)[pi]/n-1,

[d.sub.j] := 1 + ([e.sub.j] + 1) ([kappa] - 1) /2,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

A has condition number O(n) and smoothly distributed entries. In this example CGS wins and both CGN and RK1 are bad.

EXAMPLE 5.6. Here A is the block diagonal matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which ensures a bad distribution of singular values. Here both CGS and GMRES are leading, but RK1 outperforms CGN.

EXAMPLE 5.7. A is chosen as in Example 5.6, but with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This choice of [M.sub.j] causes failure of CGS in the very first step. GMRES is leading and RK1 outperforms CGN.

EXAMPLE 5.8. A is taken as in Example 5.6, but with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [[gamma].sub.j] := ([[kappa].sup.2] + 1- [e.sup.2.sub.j] - [[kappa].sup.2]/[e.sup.2.sub.j]).sup.1/2] and [e.sub.j] and r. are defined as in Example 5.3. This leads to fixed singular values and varying eigenvalues.

Here RK1 and CGN lead and both are much better than CGS and GMRES.

EXAMPLE 5.9. A is taken as in Example 5.6, but with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This matrix is normal and has eigenvalues [+ or -]i and singular value 1. Here both RK1 and CGN require only one iteration, GMRES requires two while CGS fails.

The numerical results for Examples 5.1-5.7 are summarized in the next table listing the number of iterations needed for convergence (tested by [log.sub.10] [parallel][r.sub.k][parallel] [less than or equal to] -10). The number of iterations was limited to 50, and if a method failed to converge within this limit then the final accuracy is given by the number following \, e.g., \e-2 indicates that the final accuracy was [parallel][r.sub.50][parallel] [approximately equal to] [10.sup.-2].

Example CGN CGS GMRES RK1 5.1 \e-6 \e-0 40 40 5.2 1 40 40 1 5.3 \e-4 20 38 50 5.4 \e-2 2 2 40 5.5 \e-2 Fails 3 40 5.6 2 20 38 2 5.7 1 Fails 2 1

We note that the residuals [parallel][r.sub.k][parallel] behave erratically for CGN. Its performance may be improved if one uses [[bar.r].sub.0] := [A.sup.T] [r.sub.0] rather than [[bar.r].sub.0] := [r.sub.0] as taken in [19]. RK1 has common features with CGN: both perform excellently in Examples 5.2, 5.6 and 5.7, but RK1 avoids the bad performance of CGN in Examples 5.4, 5.5. RK1 never exceeded the maximum number of iterations.

No preconditioning was tried. It is obvious that if Examples 5.1 and 5.3 are scaled (to achieve a unit diagonal), one reaches the solution immediately within one iteration.

The comparison between these methods may include the number of matrix vector multiplications, of arithmetical operations, storage requirements and general rates of convergence. However, we chose only the number of iterations and the reduction of the norm of the residual as in [19] as indicators of performance. After comparing different iterative algorithms, Nachtigal et al. [19] concluded that it is always possible to find particular examples where one method performs better than others: There is not yet a universal iterative solver which is optimal for all problems.

6. Implementational issues and further discussion. There are several ways to realize the Algorithm. We assume that [H.sub.0] = [A.sup.T] and that A is sparse, but [H.sub.k] (k > 0) is not. Then a direct realization of the Algorithm requires storage of a matrix H [member of] [R.sup.nxm], and in each iteration, computation of 2 new matrix-vector products of the type [H.sub.k]w (neglecting products with the sparse matrices A and [H.sub.0] = [A.sup.T]) and updating of [H.sub.k] to [H.sub.k+1] by means of the dyadic product [u.sub.k][v.sup.T.sub.k] [member of] [R.sup.nxm] in Step 4 of the Algorithm. This requires essentially 3nm arithmetic operations (1 operation = 1 multiplication + 1 addition).

The arithmetic expense can be reduced by using the fact that any [H.sub.k] has the form [H.sub.k] [U.sub.k] [A.sup.T] (see Propositon 2.3), where [U.sub.k] [member of] [R.sup.nxn] is updated by

[U.sub.k+1] = [[gamma].sub.k][U.sub.k] + [u.sub.k][u.sup.T.sub.k] / ([Au.sub.k], [z.sub.k])

and each product [H.sub.k]w is computed as [U.sub.k]([A.sup.T]W). This scheme requires storage space for a matrix U [member of] [R.sup.nxn] and only 3n2 operations/iteration, which is much less than 3nm if n [much less than] m.

Finally, if the Algorithm needs only few iterations, say at most k [much less than] n, one may store only one set of vectors [u.sub.i], i [less than or equal to] k, explicitly (not two as with the rival method RK2 of [17]) and use the formula (supposing that [H.sub.0] = [A.sup.T])

(6.1) [H.sub.k+1] = [[gamma].sub.0][[gamma].sub.1] ... [[gamma].sub.k] (I + [k.summation over (i=0)] [u.sub.i][U.sup.T.sub.i]/ ([Au.sub.i], [z.sub.i])[[gamma].sub.i])[A.sup.T].

when computing products like [H.sub.k][z.sub.k] and [H.sub.k+1]rk+1 in the Algorithm. Using (6.1) to compute Hkw requires essentially the computation of 2 matrix-vector products of the form

[[bar.U].sup.T.sub.k]p, [[bar.U].sub.k]q

with the matrix

[[bar.U].sub.k] := [[u.sub.0], ..., [u.sub.k-1]] [member of] [R.sup.nxk],

which requires 2kn operations to compute a product [H.sub.k]w. This again is half the number of operations as for the rank-2 methods in [17].

Our new update is connected to the previous one in the same way the SRK1 update is related to the DFP-update as shown by Fletcher [9] and Brodlie et al. [2]. We may define general updating expressions by

[H.sub.k+1] = [gamma][H.sub.k] + [u.sub.k] [([Su.sub.k]).sup.T] / ([Su.sub.k], [z.sub.k]).

This encompasses the case when A is s.p.d., which requires S = I and [H.sub.0] = I, as well as the more general case in which S = A: this would require [H.sub.0] = [A.sup.T] or a better starting matrix, which secures that [AH.sub.0] is s.p.d..

An alternative procedure is to update both [H.sub.k] and the matrices [D.sub.k] := [AH.sub.k] using

[D.sub.k] = [D.sub.k] + [v.sub.k][v.sup.T.sub.k]/([v.sub.k], [z.sub.k])

This allows us to monitor the accuracy of the approximate inverse [H.sub.k] by checking the size of [D.sub.k] - I. For reasons of economy, one could monitor only the diagonal or some row of [D.sub.k].

The algorithm may be appropriate for solving initial value problems for partial differential equations as shown for rank-two updates in [17] and, for rank-one updates, in Example 5.2. In such problems, we start with the given initial value and iterate to find the solution for the next time level. The solution is considered to be acceptable if its accuracy is comparable to the discretization error of the governing differential equations. The final updated estimate of the inverse obtained for the present time level is then taken as the initial estimate of the inverse for the next time step.

* Received October 9, 2006. Accepted for publication September 3, 2007. Published online May 19, 2008. Recommended by M. Hochbruck.

REFERENCES

[1] C. BREZINSKI, M. REDIVO-ZAGLIA, AND H. SADOK, New look-ahead Lanczos-type algorithms for linear systems, Numer. Math, 83 (1999), pp. 53-85.

[2] K. W. BRODLIE, A. R. GOURLAY, AND J. GREENSTADT, Rank-one and rank-two corrections to positive definite matrices expressed in product form, J. Inst. Math. Appl., 11 (1973), pp. 73-82.

[3] C. G BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comput.,19 (1965), pp. 577-593.

[4] C, G. BROYDEN, The convergence of single-rank quasi-Newton methods, Math. Comput., 24 (1970), pp. 365382.

[5] J. ABAFFY, C. G BROYDEN, AND E. SPEDic[A.sup.T]O,A class ofdirectmethods for linear systems, Numer. Math. 45 (1984), pp. 361-376.

[6] J. Y. CHEN, D. R. KINCAID, AND D. M. YOUNG, Generalizations and modifications of the GMRES iterative method Numer. Algorithms, 21 (1999), pp. 119-146.

[7] P. DEUFLHARD, R. FREUND, AND A. WALTER, Fast secant methods for the iterative solution of large nonsymmetric linear systems, Impact Comput. Sci. Eng., 2 (1990), pp. 244-276.

[8] T. EIROLA AND O. NEVANLINNA, Accelerating with rank-one updates, Linear Algebra Appl., 121 (1989), pp. 511-520.

[9] R. FLETCHER, A new approach to variable metric algorithms, Computer J., 13 (1970), pp. 317-322.

[10] D. M. GAY, Some convergence properties of Broyden's method, SIAM J. Numer. Anal., 16 (1979), pp. 623630.

[11] D. M. GAY AND R. B. SCHNABEL, Solving systems of nonlinear equations by Broyden's method with projected updates, in Nonlinear Programming, O. L. Mangasarian, R. R. Mayer, and S. M. Robinson, eds., Academic Press, New York, 1973, pp. 245-281.

[12] R. R. GERBER AND F. J. LUK, A generalized Broyden's method for solving simultaneous linear equations, SIAM J. Numer. Anal., 18 (1981), pp. 882-890.

[13] M. R. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409-436.

[14] I. ICHIM, Anew solution for the least squares problem, Int. J. Computer Math., 72 (1999), pp 207-222.

[15] C. M. IP AND M. J. TODD, Optimal conditioning and convergence in rank one quasi-Newton updates, SIAM J. Numer. Anal., 25 (1988), pp. 206-221.

[16] H. KLEINMICHEL, Quasi-Newton Verfahren vom Rang-Eins-Typ zur Losung unrestringierter Minimierungsprobleme, Teil 1. Verfahren and grundlegende Eigenschaften, Numer. Math., 38 (1981), pp.219-228.

[17] A. MOHSEN AND J. STOER, A variable metric method for approximating inverses of matrices, ZAMM, 81 (2001), pp. 435-446.

[18] J. J. MORE AND J. A. TRANGENSTEIN, On the global convergence of Broyden's method, Math. Comput., 30 (1976), pp. 523-540.

[19] N. M. NACHTIGAL, S. C. REDDY, AND L. N. TREFETHEN, HOW fast are nonsymmetric matrix iterations?, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 778-795.

[20] S. S. OREN AND E. SPEDICATO, Optimal conditioning of self-scaling variable metric algorithms, Math. Program., 10 (1976), pp. 70-90.

[21] J. D. PEARSON, Variable metric methods of minimization, Comput. J., 12 (1969), pp. 171-178.

[22] M. J. D. POWELL, Rank one method for unconstrained optimization, in Integer and Nonlinear Programming, J. Abadie, ed., North Holland, Amsterdam, 1970, pp. 139-156.

[23] Y. SAAR AND H. A VAN DER VORST, Iterative solution of linear systems in the 20th century, J. Comput. Appl. Math., 123 (2000), pp. 1-33.

[24] E. SPEDICATO A class of rank-one positive definite quasi-Newton updates for unconstrained minimization, Optimization, 14 (1983), pp. 61-70.

[25] E. SPEDICATO, M. BONOMI, AND A. DEL POPOLO, ABS solution of normal equations of second type and application to the primal-dual interior point method for linear programming, Technical report, Department of Mathematics, University of Bergamo, Bergamo, Italy, 2007.

A. MOHSEN ([dagger]) AND J. STOER ([double dagger])

([dagger]) Department of Engineering Mathematics and Physics, Cairo University, Giza 12211, Egypt (amohsen@enq. cu. edu. eq). The work of this author was supported by the Alexander von Humboldt Stiftung, Germany.

([double dagger]) Institut fur Mathematik, Universitat Wurzburg, Am Hubland, D-97074 Wurzburg, Germany (jstoer@mathematik.uni-wuerzburg.de).

Printer friendly Cite/link Email Feedback | |

Author: | Mohsen, A.; Stoer, J. |
---|---|

Publication: | Electronic Transactions on Numerical Analysis |

Article Type: | Report |

Geographic Code: | 7EGYP |

Date: | Dec 1, 2007 |

Words: | 9029 |

Previous Article: | Harmonic Rayleigh-Ritz extraction for the multiparameter eigenvalue problem. |

Next Article: | Special volume on Applied Linear Algebra. |

Topics: |