Quasi-Newton preconditioners for the inexact Newton method.
Abstract. In this paper, preconditioners for solving the linear systems arising at each nonlinear iteration of Newton's method are studied. In particular, we define a sequence of preconditioners built by means of Broyden-type rank-one updates. Optimality conditions are derived which guarantee that the preconditioned matrices are not far from the identity in a matrix norm. Some notes on the implementation of the corresponding inexact Newton method are given, and some numerical results on two model problems illustrate the application of the proposed preconditioners.
Key words. Quasi-Newton method, Krylov iterations, updating preconditioners, inexact Newton method

AMS subject classifications. 65F10, 65H10, 15A12

1. Introduction. Newton's method for the solution of a system of nonlinear equations

    $F(x) = 0$,    (1.1)

with $F : \mathbb{R}^n \to \mathbb{R}^n$, can be described as follows. If each component $f_i$ of $F$ is differentiable, the Jacobian matrix $J(x)$ is defined by $J(x)_{ij} = \partial f_i(x) / \partial x_j$. Assuming that $J$ is available, nonsingular, and "easy" to compute, Newton's method is defined by the iteration

    $J(x_k)\, s_k = -F(x_k)$,  $x_{k+1} = x_k + s_k$,  $k = 0, 1, \ldots$    (1.2)

When $J$ is large and sparse, e.g., for problems arising from the discretization of a nonlinear PDE, preconditioned Krylov-based iterative schemes can be employed for the solution of the linear system, so that two nested iterative procedures need to be implemented. To avoid oversolving, i.e., excessive and essentially useless iterations of the inner scheme, it is crucial to employ an "inexact" technique [5]. This approach controls the number of linear iterations by allowing the accuracy of the linear solver to vary across nonlinear iterations [7]. Another crucial issue for reducing the total number of linear iterations is the use of efficient preconditioning techniques. In general, ILU-type preconditioners [14], [17] can be employed and recalculated at every nonlinear iteration. Techniques for selectively evaluating a preconditioner $P$ may be developed to save on the cost of its computation. Note that the two phases where efficiency can mostly be improved are the solution of the linear system (thus including the number of iterations) and the evaluation of the preconditioner. In this paper we are mainly concerned with the efficient preconditioning of the linear system.
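As a minimal concrete illustration of the iteration (1.2), the sketch below applies Newton's method to a small system; the $2 \times 2$ function, Jacobian, and starting point are hypothetical examples chosen only for this sketch, not taken from the test problems of Section 5.

```python
import numpy as np

# Newton iteration (1.2): solve J(x_k) s_k = -F(x_k), then x_{k+1} = x_k + s_k.
# Hypothetical 2x2 system: x0^2 + x1^2 = 1 and x0 = x1, root (1/sqrt(2), 1/sqrt(2)).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])

def J(x):
    # Jacobian J(x)_{ij} = d f_i / d x_j
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x = np.array([1.0, 0.5])
for _ in range(20):
    s = np.linalg.solve(J(x), -F(x))  # direct solve here; for large sparse J
    x = x + s                         # this is a preconditioned Krylov solve
    if np.linalg.norm(F(x)) < 1e-12:
        break

assert np.allclose(x, np.array([1.0, 1.0]) / np.sqrt(2.0))
```

In the large sparse setting studied in this paper, the direct solve in the loop is exactly the step replaced by a preconditioned Krylov iteration.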
The "optimal" preconditioner $P$ would minimize the constant $C$ in

    $\| I - P J(x_k) \| \le C$.    (1.3)

Ideally, one would like $C$ to be as small as possible, or even to tend to zero as $k \to \infty$. This requires that information from the nonlinear iterative scheme be taken into account in the evaluation of $P$. Standard preconditioners, such as ILU, are completely unaware of the underlying linearization process. Moreover, the Jacobian matrix often becomes more ill-conditioned as $k$ grows, giving rise to increased numerical inaccuracies in the calculation of $P$. The approach proposed in this paper is to solve system (1.2) with an iterative Krylov subspace method, starting with either the ILU(0) [14] or the AINV [2] preconditioner, computed from the initial Jacobian, and to update this preconditioner by a rank-one correction. A sequence of preconditioners $P_k$ can thus be defined by imposing the secant condition, as in the implementation of quasi-Newton methods [6]. We choose to work with the Broyden update, as described for instance in [9], and analyze the theoretical properties of the preconditioner and the numerical behavior of the resulting scheme. Our approach is in the framework of the work of J. M. Martinez in [12], [13], where it is suggested to compute the preconditioner of choice (possibly even the null matrix) at every Newton step and then to enrich it with a low-rank update. In those papers the author is mainly concerned with the convergence of the inexact Newton method, while we focus on the optimality of the preconditioner at each nonlinear iteration. In addition, we solve the Newton linear systems using an arbitrary iterative method with the stopping criterion of [5], while in Martinez's procedure this criterion is not used and the first iterate of the iterative linear solver must be constructed using the preconditioner.
Morales and Nocedal [15] proposed a rank-two update of the previously computed preconditioner to accelerate the Conjugate Gradient (CG) method. The vectors involved in this update do not come from the Newton iteration but are generated using information from the CG iteration. Another study on updating factorized preconditioners can be found in [19].

The paper is organized as follows. Section 2 defines the preconditioner $P_k$. In Section 3, we prove that the quantity $\| I - P_k J(x_k) \|$ can be made as small as desired. Section 4 gives the main lines of the implementation of the preconditioner application. Section 5 reports some numerical results on two model problems. Finally, Section 6 gives some concluding remarks.

2. Preconditioning by Broyden update. The idea is to start with a preconditioner $P = B_0^{-1}$ for $J_0$. If the preconditioner is in the form of a sparse approximate inverse, then $B_0^{-1}$ is known explicitly. Otherwise, if $B_0 = \tilde{L}\tilde{U}$ is an incomplete LU factorization, we can only compute $B_0^{-1}$ times a vector, by solving two sparse triangular linear systems. The proposed approach evaluates $P_{k+1} = B_{k+1}^{-1}$ at every Newton iteration by adding a rank-one update to the previous preconditioner. Let us write

    $B_{k+1} = B_k + u v^T$.    (2.1)

As in the Broyden scheme, $B_{k+1}$ must satisfy the secant condition

    $B_{k+1} s_k = y_k$,    (2.2)

where $s_k = x_{k+1} - x_k$ and $y_k = F_{k+1} - F_k$. To uniquely determine $u$ and $v$, we also impose that $B_{k+1}$ be the closest matrix to $B_k$ in the Frobenius norm among all the matrices satisfying (2.2):

    $\| B_{k+1} - B_k \| = \min_{B s_k = y_k} \| B - B_k \|$.    (2.3)

Combining the secant condition (2.2) with (2.1), since $B_k s_k + u v^T s_k = y_k$, we readily obtain

    $u = \dfrac{y_k - B_k s_k}{v^T s_k}$.
Condition (2.3) is satisfied by choosing $v = s_k / \| s_k \|$. Using $u$ and $v$ thus defined, for every matrix $B$ satisfying the secant equation we can write

    $B_{k+1} = B_k + \dfrac{(B s_k - B_k s_k)\, s_k^T}{s_k^T s_k}$,

from which, taking Frobenius norms, we have that

    $\| B_{k+1} - B_k \| = \left\| \dfrac{(B s_k - B_k s_k)\, s_k^T}{s_k^T s_k} \right\| \le \| B - B_k \| \cdot \left\| \dfrac{s_k s_k^T}{s_k^T s_k} \right\| = \| B - B_k \|$,

and thus (2.3) holds. The expression for $B_{k+1}$ is then written as

    $B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$.    (2.4)

The final preconditioner $P_{k+1} = B_{k+1}^{-1}$ can be obtained by inversion using the Sherman-Morrison formula, to yield

    $B_{k+1}^{-1} = B_k^{-1} - \dfrac{(B_k^{-1} y_k - s_k)\, s_k^T B_k^{-1}}{s_k^T B_k^{-1} y_k}$.    (2.5)

In the next section we will prove that $\| I - B_k^{-1} J(x_k) \|$ can be made as small as desired by suitable choices of the initial guess $x_0$ and the initial preconditioner $B_0$. Note that this makes our preconditioner nearly ideal in the sense of (1.3).

REMARK 2.1. The definition of our preconditioner and the convergence analysis that follows are similar to the corresponding developments for Broyden's method, as reported in [9]. The main difference is that in our case the new iterate is computed by the Newton iteration using the true, and not the approximate, Jacobian matrix. Thus in all the following steps we need to take into account that $y_k - B_k s_k \ne F_{k+1}$.

3. Convergence analysis. We will make the standard assumptions on $F$:
1. Equation (1.1) has a solution $x^*$.
2.
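The update (2.4) and its Sherman-Morrison inverse (2.5) can be checked numerically. The sketch below uses a random matrix and vectors as placeholders (not the paper's test problems); choosing $y_k$ close to $B_k s_k$ is only an illustrative device to keep the denominator $s_k^T B_k^{-1} y_k$ safely away from zero.

```python
import numpy as np

def broyden_update_inverse(B_inv, s, y):
    # Eq. (2.5): B_{k+1}^{-1} = B_k^{-1} - (B_k^{-1} y - s) s^T B_k^{-1} / (s^T B_k^{-1} y)
    Binv_y = B_inv @ y
    sT_Binv = s @ B_inv                   # row vector s^T B_k^{-1}
    return B_inv - np.outer(Binv_y - s, sT_Binv) / (s @ Binv_y)

rng = np.random.default_rng(0)
n = 6
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stand-in for B_k
s = rng.standard_normal(n)                          # step s_k
y = B @ s + 0.1 * rng.standard_normal(n)            # y_k near B_k s_k (see lead-in)

B_new = B + np.outer(y - B @ s, s) / (s @ s)        # rank-one update (2.4)
B_new_inv = broyden_update_inverse(np.linalg.inv(B), s, y)

assert np.allclose(B_new @ s, y)                    # secant condition (2.2)
assert np.allclose(B_new_inv, np.linalg.inv(B_new)) # (2.5) inverts (2.4)
```

In practice $B_k^{-1}$ is of course never formed as a dense matrix; Section 4 applies it through the product form derived from (2.5).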
$J(x) : \Omega \to \mathbb{R}^{n \times n}$ is Lipschitz continuous with Lipschitz constant $\gamma$.
3. $J(x^*)$ is nonsingular.

We will now bound the norm of the difference between $B_{k+1}$ and $J(x_{k+1})$ in terms of the errors in the current and new iterates. It is convenient to denote with the subscript $c$ every vector or matrix referring to the current iterate and with $+$ every quantity referring to the new iterate $k+1$. With this notation, Newton's method can be stated as

    $J_c s = -F_c$,  $x_+ = x_c + s$.    (3.1)

We define the error vectors $e_+ = x^* - x_+$, $e_c = x^* - x_c$ and the error matrices $E_+ = B_+ - J(x^*)$, $E_c = B_c - J(x^*)$. The next lemma is the analog of Theorem 7.2.2 in [9]. It states that the sequence $\{B_k\}$ remains close to $J(x^*)$ as $x_k$ approaches $x^*$.

LEMMA 3.1. Let the standard assumptions hold. Then

    $\| E_+ \| \le \| E_c \| + \gamma\, (\| e_c \| + \| e_+ \|)/2$.

Proof. Since $(B_c - J_c)s - F_c = B_c s = J(x^*)s + E_c s$, we have

    $F_+ - F_c - B_c s = (\Delta_c - E_c)\, s$,    (3.2)

where $\Delta_c = \int_0^1 \big( J(x_c + ts) - J(x^*) \big)\, dt$. From (2.4) we can write

    $E_+ = B_+ - J(x^*) = B_c - J(x^*) + \dfrac{(F_+ - F_c - B_c s)\, s^T}{s^T s}$.    (3.3)

Then, setting $P_s = s s^T / s^T s$ and substituting (3.2) into (3.3), we obtain

    $E_+ = E_c (I - P_s) + \Delta_c P_s$.    (3.4)

Using the standard assumptions and taking norms, since $\| P_s \| = 1$, we have

    $\| E_+ \| \le \| E_c \| + \| \Delta_c \| \le \| E_c \| + \gamma\, (\| e_c \| + \| e_+ \|)/2$.    (3.5)

The next Lemma 3.2 and Theorem 3.3 prove that if the initial Newton point $x_0$ is sufficiently near the solution, and $B_0$ sufficiently near $J(x^*)$, then the sequence $\{B_k^{-1}\}$ is well defined. The proof of Lemma 3.2 is not given, since it is analogous to that of Theorem 7.2.3 in [9].

LEMMA 3.2. Let the standard assumptions hold. Fix $\delta_1 > 0$. Then there are $\delta, \delta_B$ such that if $\| e_0 \| < \delta$ and $\| E_0 \| < \delta_B$, then $\| E_k \| < \delta_1$ for all $k > 0$.

THEOREM 3.3. Let the standard assumptions hold. Define $\alpha = \| J(x^*)^{-1} \|$. Fix $0 < \delta_1 < 1/\alpha$. Then there exist $\delta$ and $\delta_B$ such that if $\| e_0 \| < \delta$ and $\| E_0 \| < \delta_B$, then

    $\| B_+^{-1} \| \le \dfrac{\alpha}{1 - \delta_1 \alpha}$.    (3.6)

Proof. From Lemma 3.2,

    $\| B_+^{-1} \| \le \alpha + \alpha\, \| E_+ \|\, \| B_+^{-1} \| < \alpha + \alpha\, \delta_1\, \| B_+^{-1} \|$.    (3.7)

Rewriting (3.7) we have $\| B_+^{-1} \| (1 - \delta_1 \alpha) < \alpha$, from which the claim follows, by the hypothesis $\delta_1 \alpha < 1$.

We now prove that $\| I - B_k^{-1} J(x_k) \|$ can be made as small as desired. To this end we need the following two lemmas, which bound the difference between $B_k$ and $J(x_k)$.

LEMMA 3.4.
Let the standard assumptions hold. Then, setting $E'_+ = B_+ - J_+$ and $E'_c = B_c - J_c$, we have

    $\| E'_+ \| \le \| E'_c \| + \dfrac{3\gamma}{2} (\| e_c \| + \| e_+ \|)$.

Proof. Since $E'_+ = E_+ + J(x^*) - J_+ = E_+ + \Delta J_+$ and $E'_c = E_c + J(x^*) - J_c = E_c + \Delta J_c$, equation (3.4) becomes

    $E'_+ = \Delta J_+ + (E'_c - \Delta J_c)(I - P_s) + \Delta_c P_s = E'_c (I - P_s) - \Delta J_c (I - P_s) + \Delta J_+ + \Delta_c P_s$.    (3.8)

Taking norms,

    $\| E'_+ \| \le \| E'_c \| + \gamma \| e_c \| + \gamma \| e_+ \| + \gamma\, (\| e_c \| + \| e_+ \|)/2$,

and the claim follows.

LEMMA 3.5. Let the standard assumptions hold. Fix $\delta'_1 > 0$. Then there are $\delta, \delta'_B$ such that if $\| e_0 \| < \delta$ and $\| E'_0 \| < \delta'_B$, then $\| E'_k \| < \delta'_1$ for all $k > 0$.

Proof. Analogous to that of Lemma 3.2.

The next theorem establishes a bound on $\| I - B_+^{-1} J_+ \|$ in terms of $\| J(x^*)^{-1} \|$.

THEOREM 3.6. Let the standard assumptions hold. Fix $0 < \delta_1 < 1/\alpha$ and $\delta'_1 > 0$. Then there are $\delta, \delta_B, \delta'_B$ such that if $\| e_0 \| < \delta$, $\| E_0 \| < \delta_B$ and $\| E'_0 \| < \delta'_B$, then

    $\| I - B_+^{-1} J_+ \| < \dfrac{\delta'_1 \alpha}{1 - \delta_1 \alpha}$.

Proof. Let $\delta, \delta_B, \delta'_B$ be as in Lemmas 3.2 and 3.5.
From Lemma 3.5 and Theorem 3.3 we have

    $\| I - B_+^{-1} J_+ \| = \| B_+^{-1} (B_+ - J_+) \| \le \| B_+^{-1} \|\, \| E'_+ \| < \dfrac{\delta'_1 \alpha}{1 - \delta_1 \alpha}$.    (3.9)

4. Notes on implementation. In this section we give the main lines of the implementation of the product of our preconditioner times a vector, which is needed when using a preconditioned Krylov method. At a given nonlinear iteration level $k$, and given a vector $z_k^{(l)}$, we want to compute

    $c = B_k^{-1} z_k^{(l)}$,    (4.1)

where the superscript $l$ denotes the linear iteration index. For $k \ge 0$, $B_{k+1}^{-1}$ is given inductively by (2.5):

    $B_{k+1}^{-1} = B_k^{-1} - \dfrac{(B_k^{-1} y_k - s_k)\, s_k^T B_k^{-1}}{s_k^T B_k^{-1} y_k}$.    (4.2)

If we set $v_k = s_k / \| s_k \|$, $u_k = (y_k - B_k s_k)/\| s_k \|$ and $w_k = B_k^{-1} u_k / (1 + v_k^T B_k^{-1} u_k)$, then

    $B_{k+1}^{-1} = (I - w_k v_k^T)\, B_k^{-1}$.    (4.3)

At the initial nonlinear iteration $k = 0$, under the hypothesis that $B_0^{-1}$ (or $B_0$) is explicitly known, we simply have $c = B_0^{-1} z_0^{(l)}$. If $k = 1$,

    $c = B_1^{-1} z_1^{(l)} = (I - w_0 v_0^T)\, B_0^{-1} z_1^{(l)}$.

Before doing this we have to compute

    $w_0 = \dfrac{B_0^{-1} u_0}{1 + v_0^T B_0^{-1} u_0}$.

If $k > 1$, let us suppose that $u_i, v_i, w_i$ are known for $i = 0, \ldots, k-2$.
Then, to be able to compute $c = B_k^{-1} z_k^{(l)}$, before starting the linear system solution we have to evaluate $v_{k-1}$, $u_{k-1}$ and

    $w_{k-1} = \dfrac{u'_{k-1}}{1 + v_{k-1}^T u'_{k-1}}$,

where $u'_{k-1} = B_{k-1}^{-1} u_{k-1}$ can be evaluated using (4.3) at the price of one application of $B_0^{-1}$, $k-1$ dot products and $k-1$ daxpy operations. Now we are ready to compute

    $c = (I - w_{k-1} v_{k-1}^T)(I - w_{k-2} v_{k-2}^T) \cdots (I - w_0 v_0^T)\, B_0^{-1} z_k^{(l)}$

at the price of one application of $B_0^{-1}$, $k$ dot products and $k$ daxpy operations. Note that the updating of the preconditioner just described, being based on scalar products and daxpy operations, is well suited to parallelization.

4.1. The Newton-Broyden algorithm. The implementation of the inexact Newton method requires the definition of a stopping criterion for the linear solver (1.2) based on the nonlinear residual. Following [5], we stop the linear iteration as soon as the following test is satisfied:

    $\| J(x_k) s_k + F(x_k) \| \le \eta_k \| F(x_k) \|$.    (4.4)

Superlinear or even quadratic convergence of the inexact Newton method can be achieved by properly setting the sequence $\{\eta_k\}$. We can now write the Newton-Broyden algorithm as follows.

NEWTON-BROYDEN (NB) ALGORITHM
Input: $x_0$, $F$, nlmax, tol
* Compute $B_0$ ($B_0^{-1}$) approximating $J_0$ ($J_0^{-1}$); $k = 0$
* WHILE $\| F(x_k) \| >$ tol AND $k <$ nlmax DO
  1. IF $k > 0$ THEN update $B_k^{-1}$ from $B_{k-1}^{-1}$
  2. Solve $J(x_k) s_k = -F(x_k)$ by a Krylov method with preconditioner $B_k^{-1}$ and tolerance $\eta_k$
  3. $x_{k+1} = x_k + s_k$
  4.
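The product form above can be sketched in a few lines. The snippet below builds $k$ rank-one factors $(w_i, v_i)$ from random placeholder data and verifies the recursion against a dense inverse; the dense matrices appear only for this check, and choosing $y_i$ close to $B_i s_i$ is an illustrative device to keep the denominators away from zero.

```python
import numpy as np

def apply_preconditioner(B0_inv_apply, W, V, z):
    # c = (I - w_{k-1} v_{k-1}^T) ... (I - w_0 v_0^T) B_0^{-1} z, i.e. eq. (4.3)
    # unrolled: one application of B_0^{-1}, k dot products and k daxpy operations.
    c = B0_inv_apply(z)
    for w, v in zip(W, V):
        c = c - (v @ c) * w    # one dot product (v @ c) and one daxpy per factor
    return c

rng = np.random.default_rng(1)
n, k = 5, 3
B0 = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = B0.copy()
W, V = [], []
for _ in range(k):
    s = rng.standard_normal(n)
    y = B @ s + 0.1 * rng.standard_normal(n)   # y_i near B_i s_i (see lead-in)
    v = s / np.linalg.norm(s)
    u = (y - B @ s) / np.linalg.norm(s)
    Binv_u = np.linalg.solve(B, u)
    W.append(Binv_u / (1.0 + v @ Binv_u))      # w_i = B_i^{-1} u_i / (1 + v_i^T B_i^{-1} u_i)
    V.append(v)
    B = B + np.outer(u, v)                     # B_{i+1} = B_i + u_i v_i^T, eq. (2.1)

z = rng.standard_normal(n)
c = apply_preconditioner(lambda t: np.linalg.solve(B0, t), W, V, z)
assert np.allclose(c, np.linalg.solve(B, z))   # equals B_k^{-1} z
```

In an actual implementation, `B0_inv_apply` would be the pair of sparse triangular solves (ILU(0)) or the sparse factored matrix-vector products (AINV).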
$k = k + 1$
* END WHILE

Since the preconditioner is never actually formed explicitly, the stated algorithm suffers from two main drawbacks, namely the increasing memory cost for storing $w_k$ and $v_k$ and the increasing cost of the preconditioner application. These drawbacks are common to many iterative schemes, such as, for example, sparse (limited memory) Broyden implementations [16], GMRES [18], and the Arnoldi method for eigenvalue problems [11]. Several options can be devised to alleviate these difficulties, all based on variations of a restart procedure, by which the scheme is reset after a chosen number of iterations. Our implementation is described in the following section.

4.2. Restart. If the number of nonlinear iterations is high (e.g., more than ten), the application of the Broyden preconditioner may become too expensive to be counterbalanced by the reduction in the iteration count. To this end we define $k_{max}$ as the maximum number of rank-one corrections we allow. After $k_{max}$ nonlinear iterations the previously computed $w_i, v_i$, $i = 1, \ldots, k_{max}$, are discarded, and a new preconditioner $B_0^{-1}$ is computed.

RESTARTED NEWTON-BROYDEN (RNB) ALGORITHM
Input: $x_0$, $F$, $k_{max}$, nlmax, tol
* Compute $B_0$ ($B_0^{-1}$) approximating $J_0$ ($J_0^{-1}$); $k = 0$, restart = TRUE
* WHILE $\| F(x_k) \| >$ tol AND $k <$ nlmax DO
  1. IF (NOT restart) THEN update $B_k^{-1}$ from $B_{k-1}^{-1}$; restart = FALSE
  2. Solve $J(x_k) s_k = -F(x_k)$ by a Krylov method with preconditioner $B_k^{-1}$ and tolerance $\eta_k$
  3. $x_{k+1} = x_k + s_k$
  4. $k = k + 1$
  5. IF $k$ MOD $k_{max} = 0$ THEN restart = TRUE; compute $B_k$ ($B_k^{-1}$) approximating $J_k$ ($J_k^{-1}$)
* END WHILE

REMARK 4.1.
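A minimal sketch of the restart bookkeeping of Section 4.2 follows. The inner preconditioned Krylov solver is replaced here by a dense direct solve, so only the handling of the rank-one factors $(w_i, v_i)$ and of the restart is illustrated; the 1D Bratu-type test system, its size, and all parameter values are hypothetical illustrations, not the experiments of Section 5.

```python
import numpy as np

def rnb_solve(F, J, x0, k_max=4, nlmax=50, tol=1e-8):
    # Sketch of the Restarted Newton-Broyden (RNB) loop.  The inner solve is a
    # dense np.linalg.solve standing in for a preconditioned Krylov method.
    x = np.asarray(x0, dtype=float).copy()
    B0_inv = np.linalg.inv(J(x))        # stand-in for ILU(0)/AINV of J_0
    W, V = [], []

    def apply_P(z):                     # B_k^{-1} z via the product form (4.3)
        c = B0_inv @ z
        for w, v in zip(W, V):
            c = c - (v @ c) * w
        return c

    Fx = F(x)
    for _ in range(nlmax):
        if np.linalg.norm(Fx) <= tol:
            break
        s = np.linalg.solve(J(x), -Fx)  # apply_P would precondition a Krylov
        x = x + s                       # method at this step
        F_new = F(x)
        y = F_new - Fx
        if len(W) + 1 >= k_max:         # restart: discard w_i, v_i and rebuild
            W, V = [], []               # the "initial" preconditioner at x_k
            B0_inv = np.linalg.inv(J(x))
        else:                           # append the factors defining B_{k+1}^{-1}
            v = s / np.linalg.norm(s)
            u_prime = (apply_P(y) - s) / np.linalg.norm(s)  # B_k^{-1} u_k
            W.append(u_prime / (1.0 + v @ u_prime))
            V.append(v)
        Fx = F_new
    return x

# Toy 1D Bratu-type system F(u) = A u - lam * h^2 * exp(u) (hypothetical sizes).
n, lam = 20, 1.0
h2 = 1.0 / (n + 1) ** 2
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = lambda u: A @ u - lam * h2 * np.exp(u)
Jac = lambda u: A - lam * h2 * np.diag(np.exp(u))
x = rnb_solve(F, Jac, 0.1 * np.ones(n))
assert np.linalg.norm(F(x)) <= 1e-8
```

Note how $u'_k = B_k^{-1} u_k$ is obtained from $(B_k^{-1} y_k - s_k)/\|s_k\|$ with one application of the product form, exactly as described after (4.3).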
Our preconditioner can be viewed as a particular case of the one defined in [12] as $B_k = C(x_k) + A_k$, where we set $C(x_k) = B_0$ and $A_k = \sum_{i=0}^{k-1} u_i v_i^T$.

5. Numerical results. In this section we report the numerical performance of our preconditioners in solving two nonlinear test problems of large size. As initial preconditioner $B_0$, we consider either the incomplete Cholesky factorization ILU(0) [14] or the approximate inverse preconditioner AINV [2], [3]. Note that, in the first case, $B_0$ is known and the application of $B_0^{-1}$ amounts to two sparse triangular solves. On the other hand, $B_0^{-1}$ is provided explicitly by AINV, as the product of two sparse triangular factors, and hence its application requires two matrix-vector products. Our aim is to give experimental evidence that the Broyden correction, implemented as explained in the previous sections, accelerates the convergence of the iterative solver, independently of the initial preconditioner of choice.

All the numerical experiments were performed on a Compaq ES40 equipped with an alpha processor "ev6.8" at 833 MHz, 4.5 GB of core memory, and 16 MB of secondary cache. The CPU times are measured in seconds. The solution of the linear systems (1.2) is performed by the Bi-CGSTAB iterative method [20], because in our examples the Jacobian is symmetric but not guaranteed to be positive definite. We stop the iteration whenever the exit test (4.4) with constant $\eta_k = 10^{-4}$ is fulfilled.

5.1. Example 1: the Bratu problem. We consider a generalization of the classical discrete Bratu problem [8]:

    $A u = \lambda D(u)$,  $D = \mathrm{diag}(\exp(u_1), \ldots, \exp(u_n))$,

where $A$ is a symmetric positive definite matrix arising from a 2D or 3D discretization of the diffusion equation on the unit domain,

    $\nabla \cdot (K \nabla u) = f$,    (5.1)

with variable diffusion coefficient $K$, and $\lambda$ is a real parameter.
We used $\lambda = 1$ and $x_0 = (0.1, \ldots, 0.1)^T$. We consider matrices arising from discretizations employing Finite Elements (FE) and Mixed Finite Elements (MFE).

5.1.1. 2D MFE discretization. Matrix $A$ has 28600 rows and 142204 nonzero elements. It arises from a Mixed Finite Element discretization of the diffusion operator (with constant $K \equiv 1$) in two spatial dimensions on triangles of uniform size. Tables 5.1 and 5.2 report the results of the nonlinear convergence, giving the number of nonlinear iterations, the cumulative number of linear iterations, the total CPU time, and the CPU time needed to evaluate the preconditioner (computation of $B_0^{-1}$ when needed, plus all the updates). When the Broyden acceleration is used, we also provide the values of $k_{max}$. By ILU(0)($J_0$) we mean that the preconditioner is computed once and for all at the beginning of the nonlinear iteration, while ILU(0)($J_k$) indicates that the incomplete factorization is carried out at each nonlinear iteration. In both cases ($B_0$ = ILU(0)/AINV) the RNB algorithm produces an acceleration in terms of both number of iterations and CPU time. The simple NB algorithm (without restart) appears to be less efficient. This is not surprising, since the results of Section 3 only hold when $x_0$ is near the solution. In the NB algorithm, $B_0$ is computed only once, when the iterate is still far from $x^*$, while in the RNB approach the "initial" preconditioner is recomputed every $k_{max}$ iterations, thus taking advantage of the nonlinear convergence.

5.2. Example 2. We consider here a slight modification of the Bratu problem which leads to an increasingly ill-conditioned Jacobian matrix. This is easily accomplished by taking a negative value of the parameter $\lambda$. We choose to work with $\lambda = -1$ and the same initial vector as in the previous example.

5.2.1. 2D MFE discretization.
We report in Table 5.3 the results of the various algorithms described in Section 4, compared with the inexact Newton method preconditioned with the simple ILU(0) computed at the initial iteration. Table 5.4 provides the same outcome as Table 5.3, but employing the AINV preconditioner. In the latter case we also provide the value of the drop tolerance used in the test. Analyzing Table 5.3, we notice that the Broyden acceleration produces a mild reduction of the CPU time as compared with the computation of ILU(0) at every iteration. The optimal $k_{max}$ value is 1. The simple Newton-Broyden algorithm (i.e., with no restart) does not give any improvement in terms of CPU time and iteration number. The results in Table 5.4, relative to the AINV preconditioner with a drop tolerance of 0.05, show the efficiency of the proposed preconditioner more clearly. Here the optimal $k_{max}$ value is larger than 1. Using $k_{max} > 1$ produces a saving in the CPU time needed for computing the costly AINV preconditioner.

5.2.2. 3D FE discretization. Matrix $A$ is the result of a 3D Finite Element discretization of (5.1), with coefficient $K$ varying in space and using tetrahedral elements. It has 268515 rows and 3926823 nonzero elements. For this test case we also compare the Broyden acceleration with the results obtained by selectively computing the preconditioner (both ILU(0) and AINV) every $k_{max}$ iterations; see Tables 5.5 and 5.6. In Figure 5.1 we report the linear iterations employed at each nonlinear iteration for the ILU(0) and AINV(0.03) preconditioners computed at each nonlinear iteration and using the Broyden acceleration for a couple of $k_{max}$ values. As is clear from Figure 5.1, in this problem the Jacobian matrix becomes more ill-conditioned during the nonlinear iteration.
For this reason, using the AINV preconditioner with a fixed drop tolerance (= 0.03) yields a sequence of preconditioners with an increasing number of nonzero elements and a consequently increasing evaluation cost. This explains the CPU times for preconditioner evaluation in Table 5.6. The RNB preconditioner provides a substantial reduction of the linear iterations, especially near the solution of the nonlinear problem, as expected from the theory developed in Section 3.

[FIGURE 5.1 OMITTED]

The initial preconditioner does not remain a good preconditioner for the subsequent iterations, as confirmed by the first row in Tables 5.5 and 5.6. The Broyden correction produces a reduction in the number of linear iterations and in CPU time with respect to the ILU (AINV) preconditioner for small values of $k_{max}$. Moreover, the proposed preconditioner is more efficient than simply recomputing ILU or AINV selectively. Comparing, for example, the results in the third and fifth rows of Table 5.6, we see that the Broyden acceleration reduces the linear iterations from 705 to 448 (a 35% reduction) and the CPU time from 355.6 to 262.1 seconds (a 26% reduction). Comparable reductions hold for the ILU(0) preconditioner.

6. Conclusions. We have proposed a family of preconditioners based on the Broyden secant update formula, with the aim of improving, during the nonlinear process, the convergence properties of a given preconditioner. During the Newton iteration, starting from a preconditioner $B_0^{-1}$ approximating the inverse of the initial Jacobian matrix, a sequence of preconditioners $B_k^{-1}$ is defined by taking into account information from the previous nonlinear iterations. The theoretical analysis developed here proves that the sequence satisfies $\| I - B_k^{-1} J(x_k) \| \le C$, with $C$ that can be made arbitrarily small, depending on $x_0$ and $B_0$.
The application of the proposed preconditioner may be memory- and time-consuming, especially when $k$ is large. However, compared with the standard procedure of computing the preconditioner of choice at every Newton iteration, a limited-memory variant of the proposed approach provides an improvement in performance. Experimental results on a number of large test problems show substantial savings in terms of both iteration number and computing time. The algorithm described in this paper seems to be particularly appropriate for parallel implementation. First, the improvement brought by the proposed preconditioner is important especially when the computation of the initial preconditioner is costly, which is a common situation when solving systems in a parallel framework. Second, the implementation of the quasi-Newton corrections as described in Section 4, being made up of a number of daxpys and scalar products, can be parallelized in a straightforward manner. If the computation and application of the initial preconditioner can be efficiently parallelized (which is the case for approximate inverse preconditioners; see [1] for AINV or [4], [10] for other approaches), the overall algorithm is completely parallel.

Acknowledgements. We would like to thank the referees and J. M. Martinez for their suggestions, which have improved the presentation of this paper.

REFERENCES

[1] M. BENZI, J. MARIN, AND M. TUMA, A two-level parallel preconditioner based on sparse approximate inverses, in D. R. Kincaid and A. C. Elster, eds., Iterative Methods in Scientific Computation IV, Vol. 5 of IMACS Series in Computational and Applied Mathematics, New Brunswick, New Jersey, USA, 1999, International Assoc. for Mathematics and Computers in Simulation, pp. 167-178.
[2] M. BENZI AND M. TUMA, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput., 19 (1998), pp. 968-994.
[3] M. BENZI AND M. TUMA, A comparative study of sparse approximate inverse preconditioners, Appl. Numer.
Math., 30 (1999), pp. 305-340.
[4] L. BERGAMASCHI AND A. MARTINEZ, Parallel acceleration of Krylov solvers by factorized approximate inverse preconditioners, in M. Dayde et al., eds., VECPAR 2004, Lecture Notes in Comput. Sci., No. 3402, Springer, Heidelberg, 2005, pp. 623-636.
[5] R. S. DEMBO, S. C. EISENSTAT, AND T. STEIHAUG, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400-408.
[6] J. E. DENNIS, JR. AND J. J. MORÉ, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46-89.
[7] S. C. EISENSTAT AND H. F. WALKER, Choosing the forcing terms in an inexact Newton method, SIAM J. Sci. Comput., 17 (1996), pp. 16-32.
[8] D. R. FOKKEMA, G. L. G. SLEIJPEN, AND H. A. VAN DER VORST, Accelerated inexact Newton schemes for large systems of nonlinear equations, SIAM J. Sci. Comput., 19 (1997), pp. 657-674.
[9] C. T. KELLEY, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, 1995.
[10] L. YU. KOLOTILINA AND A. YU. YEREMIN, Factorized sparse approximate inverse preconditionings I. Theory, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 45-58.
[11] R. B. LEHOUCQ AND D. C. SORENSEN, Deflation techniques for an implicitly restarted Arnoldi iteration, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 789-821.
[12] J. M. MARTINEZ, A theory of secant preconditioners, Math. Comp., 60 (1993), pp. 681-698.
[13] J. M. MARTINEZ, An extension of the theory of secant preconditioners, J. Comput. Appl. Math., 60 (1995), pp. 115-125.
[14] J. A. MEIJERINK AND H. A. VAN DER VORST, An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Math. Comp., 31 (1977), pp. 148-162.
[15] J. L. MORALES AND J. NOCEDAL, Automatic preconditioning by limited memory quasi-Newton updating, SIAM J. Optim., 10 (2000), pp. 1079-1096.
[16] S. G. NASH AND J. NOCEDAL, A numerical study of the limited memory BFGS method and the truncated-Newton method for large scale optimization, SIAM J. Optim., 1 (1991), pp. 358-372.
[17] Y.
SAAD, ILUT: A dual threshold incomplete LU factorization, Numer. Linear Algebra Appl., 1 (1994), pp. 387-402.
[18] Y. SAAD AND M. H. SCHULTZ, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856-869.
[19] J. D. TEBBENS AND M. TUMA, Preconditioner updates for solving sequences of large and sparse nonsymmetric linear systems, Technical Report V-940, Institute of Computer Science, Academy of Sciences of the Czech Republic, 2005.
[20] H. A. VAN DER VORST, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), pp. 631-644.

* Received November 2, 2005. Accepted for publication January 24, 2006. Recommended by C. Brezinski.

L. BERGAMASCHI†, R. BRU‡, A. MARTINEZ§, AND M. PUTTI†

† Department of Mathematical Methods and Models for Scientific Applications, University of Padova, Italy ({berga, putti}@dmsa.unipd.it). Work partially supported by the Italian MIUR project "Numerical Models for Multiphase Flow and Deformation in Porous Media".
‡ Instituto de Matematica Multidisciplinar, Departamento de Matematica Aplicada, Universidad Politecnica de Valencia, Spain (rbru@mat.upv.es). Work supported by the Spanish DGI (FEDER) grant MTM2004-02998, Generalitat Valenciana project GRUPOS03/062, and the research programme of the Univ. Politecnica of Valencia. Part of the work of this author was carried out during his visit to the Department of Mathematical Methods and Models for Scientific Applications of the Univ. of Padova.
§ Department of Pure and Applied Mathematics, University of Padova, Italy (acalomar@math.unipd.it). Work supported by the research fellowship "Parallel implementations of exponential integrators for ODEs/PDEs".

TABLE 5.1. Results on the MFE matrix with $B_0$ = ILU(0).
preconditioner    k_max   nlit   iter   CPU tot   precond
ILU(0)(J_0)         -       7    851     11.59      0.01
ILU(0)(J_k)         -       7    754      8.65      0.04
RNBILU(0)           1       7    442      6.51      0.08
RNBILU(0)           2       7    470      6.56      0.06
RNBILU(0)           3       7    501      6.93      0.06
RNBILU(0)           5       7    529      7.09      0.06
NBILU(0)           inf      6    515      9.67      0.06

TABLE 5.2
Results on the MFE matrix with B_0 = AINV(0.1).

preconditioner    k_max   nlit   iter   CPU tot   precond
AINV(0.1)(J_0)      -       7    882     18.35      0.17
AINV(0.1)(J_k)      -       7    908     19.70      1.27
RNBAINV(0.1)        1       8    574     14.62      1.57
RNBAINV(0.1)        2       7    517     13.07      0.81
RNBAINV(0.1)        3       7    647     16.10      0.63
RNBAINV(0.1)        4       7    502     12.61      0.44
RNBAINV(0.1)        5       7    561     14.08      0.44
NBAINV(0.1)        inf      7    655     17.15      0.26

TABLE 5.3
Results on the MFE matrix with B_0 = ILU(0).

preconditioner    k_max   nlit   iter   CPU tot   precond
ILU(0)(J_0)         -      14    849     15.61      0.01
ILU(0)(J_k)         -      14    597      7.04      0.08
RNBILU(0)           1      14    436      6.51      0.15
RNBILU(0)           2      14    415      6.56      0.12
RNBILU(0)           3      14    419      6.93      0.11
RNBILU(0)           4      14    407      7.09      0.11
NBILU(0)           inf     15    595     14.86      0.17

TABLE 5.4
Results on the MFE matrix with B_0 = AINV(0.05).

preconditioner     k_max   nlit   iter   CPU tot   precond
AINV(0.05)(J_0)      -      14    852     18.99      0.19
AINV(0.05)(J_k)      -      14    545     29.96      5.42
RNBAINV(0.05)        1      14    332     18.34      5.06
RNBAINV(0.05)        2      14    347     17.72      2.91
RNBAINV(0.05)        3      14    348     16.13      2.06
RNBAINV(0.05)        4      14    333     15.57      1.67
RNBAINV(0.05)        5      14    353     16.27      1.26
NBAINV(0.05)        inf     14    630     18.73      0.34

TABLE 5.5
Results on the 3D-FE matrix with B_0 = ILU(0).

preconditioner    k_max   nlit   iter   CPU tot   precond
ILU(0)(J_0)         -      17    1880    1125.4      0.4
ILU(0)(J_k)         -      16     354     193.4      6.4
ILU(0)(J_k)         2      16     353     194.1      6.2
RNBILU(0)           1      16     230     137.7      9.4
RNBILU(0)           2      16     232     137.8      6.3
RNBILU(0)           3      16     250     149.7      5.5
RNBILU(0)           4      16     251     150.8      4.8
NBILU(0)           inf     17    1132     826.4      4.7

TABLE 5.6
Results on the 3D-FE matrix with B_0 = AINV(0.03).

preconditioner     k_max   nlit   iter   CPU tot   precond
AINV(0.03)(J_0)      -      17    1991     755.3      3.5
AINV(0.03)(J_k)      -      16     682     470.4    151.9
AINV(0.03)(J_k)      4      16     705     355.6     34.0
RNBAINV(0.03)        1      16     418     356.7    154.5
RNBAINV(0.03)        4      16     448     262.1     37.4
RNBAINV(0.03)        7      16     508     281.9     29.1
RNBAINV(0.03)       10      16     556     289.9     17.7
NBAINV(0.03)        inf     17    1186     591.0      5.2
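The RNB and NB rows in the tables compare rank-one-updated preconditioners against a frozen initial factorization B_0 and a fully recomputed one, with k_max bounding how many Broyden-type updates are accumulated before a restart. As a minimal sketch only, not the paper's exact formula or its sparse ILU/AINV setting, the idea can be illustrated with the "bad Broyden" inverse update B_new = B + (s - By) y^T / (y^T y), which enforces the secant condition B_new y = s, combined with a hypothetical k_max restart rule (dense matrices and NumPy stand in for the paper's factorized preconditioners):

```python
import numpy as np

def broyden_inverse_update(B, s, y):
    """One 'bad Broyden' rank-one update of an approximate inverse B,
    enforcing the secant condition B_new @ y = s."""
    By = B @ y
    return B + np.outer(s - By, y) / (y @ y)

# Toy demo: maintain an approximate Jacobian inverse across Newton-like steps.
rng = np.random.default_rng(0)
n = 6
J = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = np.linalg.inv(J)        # plays the role of B_0 (ILU/AINV in the paper)
k_max = 3                   # restart bound, as in the RNB rows of the tables
updates = 0
for _ in range(10):
    J = J + 0.01 * rng.standard_normal((n, n))  # Jacobian drifts between steps
    s = rng.standard_normal(n)                  # step s_k
    y = J @ s                                   # secant pair (s_k, y_k)
    if updates == k_max:
        B = np.linalg.inv(J)                    # restart: rebuild from scratch
        updates = 0
    else:
        B = broyden_inverse_update(B, s, y)     # cheap rank-one correction
        updates += 1
# After either branch, B satisfies the secant condition B @ y = s exactly.
```

The restart trades the cost of rebuilding the preconditioner (the "precond" column) against the degradation of long update sequences visible in the NB rows (k_max = inf) of the tables.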
