# Evaluating matrix functions for exponential integrators via Caratheodory-Fejer approximation and contour integrals.

AMS subject classification. 65L05, 41A20, 30E20

1. Introduction. According to Minchev and Wright , the main computational challenge in the implementation of any exponential integrator is the need for fast and computationally stable evaluations of the exponential and related functions. We are interested in solving problems

(1.1) [??] = Au + g{u,t)

where the matrix represents the spatial discretization of a linear elliptic differential operator such as the Laplacian and is a nonlinear function in and . In many problems is negative semidefinite. Exponential integrators are time-stepping formulas for (1.1) that separate the linear term involving , which is solved exactly by a matrix exponential, from the nonlinear term. The simplest example is the exponential forward Euler method, given by

[u.sub.n+1] = [e.sup.[DELTA]tA]un + [DELTA]t [[psi].sub.1]([DELTA]tA)g([u.sub.n],[t.sub.n]),

where (z) = (ez - 1) /z. There are many other exponential schemes and some of these ideas have been reinvented several times . In recent years the interest in exponential integrators has heightened. Krylov methods to compute functions were introduced by Saad  and Hochbruck and Lubich , Cox and Matthews  and Krogstad  introduced one-step methods with 4th order accuracy in many circumstances, and Hochbruck and Ostermann  showed how 4th order could be achieved for all problems. Exponential multi-step formulas require fewer matrix function evaluations per step than one-step methods and are computationally very promising. Recently they have been rediscovered by Beylkin  and Cox and Matthews after being introduced by Norsett  in 1969. Kassam and Trefethen applied the one-step method of Cox and Matthews to stiff PDEs such as the Korteweg-de Vries, Kuramoto-Sivashinsky, Allen-Cahn and Grey-Scott equations [23, 24] and compared them with more standard schemes.

We follow a recent convention [3, 21, 22] and introduce

[[psi].sub.1](z) = 1/(l - 1)! [[integral].sup.1.sub.0] [e.sup.(1 - [theta])z][[theta].sup.l-1]d[theta], l = 1,2,....

In addition we define [[psi].sub.0](z) = [e.sup.z], which enables us to utilize the recurrence relation

(1.2) [[psi].sub.1](z) = [[psi].sub.l-1](z) - [[psi].sub.l-1](0)/z, 1 [greater than or equal to] 1.

For the first few values of l we have

[[psi].sub.1](z) = [e.sup.z] - 1/z, [[psi].sub.2](z) = [e.sup.z] - z - 1/[z.sup.2], [[psi].sub.3](z) = [e.sup.z] - [z.sup.2]/2 - z - 1/[z.sup.3].

The Taylor series representation of these functions is given by

(1.3) [[psi].sub.1](z) = [[infinity].summation over (k=l)] 1/k! [z.sup.k-l].

The functions [[psi].sub.1] are entire. Nevertheless, a numerical challenge one encounters in utilizing them is that a direct computation based on these identities suffers from cancellation errors for z close to the origin [17, 24]. To address this problem Cox and Matthews  made use of the Taylor series. This technique works for scalars and diagonal matrices. An alternative stable evaluation is based on (1.2) and a Cauchy integral representation on a circle of radius centered at z, for |z| < 1/2. Following Kassam and Trefethen  this is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [t.sub.k] = 2[pi]/M. To avoid cancellation, the circles should not come close to the origin. This idea can be generalized to non-diagonal matrices A, where the contour [GAMMA] has to enclose the spectrum of A:

(1.4) [[psi].sub.1](A) = 1/2[pi]i [[integral].sub.[GAMMA]] [[psi].sub.1](s)[(sI - A).sup.-1] ds.

In practice we are interested in f(A)b rather than f(A) and herein lies a further strength of this idea. We can evaluate

[[psi].sub.1](A)b = 1/2[pi]i [[integral].sub.[GAMMA]] [[psi].sub.1](s)[(sI - A).sup.- 1] b ds

by a numerical quadrature scheme that solves a linear system at each node [s.sub.k] on the contour [GAMMA]. In particular this idea is successful for the matrix exponential and it is closely related to rational approximations as shown in . Instead of evaluating (1.4) for l > 0 we give in [section]5 an alternative integral representation of [[psi].sub.i] not involving the function [[psi].sub.i] in the integrand.

Another method for evaluating the [psi] functions that is widely used nowadays is Pade approximation, as suggested by Beylkin et al. , Hochbruck et al. , and Minchev and Wright . The idea is based on scaling and squaring, which is a popular method for computing the matrix exponential [18, 34]. This approach is restricted to matrices of moderate dimension as the evaluation of f(A)b requires the explicit computation of f(A).

We suggest instead the use of uniform rational Chebyshev approximations or the application of the trapezoid rule on Talbot-type contours. The first idea was put forward previously by Lu . Lu claims that the coefficients are hard to compute and gives the rational approximation of type (14,14) for [[psi].sub.i] in a partial fraction decomposition. He used the Remes algorithm and a multiple precision environment. Here we shall show that in fact, the required approximations can be computed readily in standard precision by the Caratheodory-Fejer method as in  and .

We also discuss in [section]4 the approximation of these functions in a common set of poles, which is advantageous in some applications.

Throughout this work we use direct methods to solve linear systems although our ideas are by no means restricted to them.

2. The asymptotic behavior. Methods based on rational approximations get their power from the fast exponential decay of the error introduced by the approximant. In the case of [psi] functions we can give precise statements about the convergence. Let [P.sub.n] denote the set of all polynomials of degree at most n [member of] N with real coefficients. Let [R.sub.mn] denote the set {p/q | p [member of] [P.sub.m], q [member of] [P.sub.n], q [??] 0}, n [member of] N of rational functions. The best rational approximant [r.sup.*.sub.mn] = [r.sup.*.sub.mn] (f, [R.sup.-];x) [member of] [R.sub.mn] to the function f on [R.sup.-] = (-[infinity], 0] and the minimal approximation error [E.sub.mn] = [E.sub.mn](f,[R.sup.-]) are defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denotes the sup-norm on [R.sup.-]. The best approximant [r.sup.*.sub.mn] ([[psi].sub.1] [R.sup.-]) exists and is unique .

Cody, Meinardus and Varga  showed that [E.sub.nn] (ex(x)[R.sup.-]), M decreases geometrically as n [right arrow] [infinity]. In 1986 Gonchar and Rakhmanov  proved that the rate of convergence is given by

(2.1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where v, Halphen's constant, is the unique positive root of [[SIGMA].sup.[infinity].sub.n=1] n[v.sup.n]/1 - [(-v).sup.n] = 1/8. This constant was studied by Halphen as early as 1886 . The proof by Gonchar and Rakhmanov confirmed a conjecture by Magnus  and previous numerical computations by Trefethen and Gutknecht  and Carpenter, Ruttan and Varga .

A sharper result than (2.1) was conjectured by Magnus  and subsequently proved by Aptekarev :

[E.sub.nn] (ex(x),[R.sup.-]) = 2[v.sup.n+1/2] (1 + o(l)) as n [right arrow] [infinity].

We define [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], if this limit exists, and conjecture that the limit does indeed exist with [E.sub.l] = [E.sub.o] = v for all l.

CONJECTURE 2.1. For all l [member of] N the asymptotic decay of the error is [E.sub.l] = v.

Results about asymptotic convergence for best rational approximations are notoriously difficult to prove. A successful proof might follow the footsteps of Gonchar and Rakhmanov , or might be based on induction utilizing recurrence relations for non-normal matrices which we are going to introduce in [section]4. Numerical experiments give compelling indications that the conjecture is valid.

For practical purposes we are interested in the number of poles necessary to achieve a desired accuracy. The asymptotic convergence rate is of limited use here, although it would serve as a first indicator.

3. Caratheodory-Fejer approximation on the negative real line. An efficient method for constructing near-best rational approximations is the Caratheodory-Fejer (CF) method, which was introduced first for the problem of constructing approximations on the unit disc [42, 43]. By utilizing a conformal map from the unit circle to a real interval or the negative real line it was shown in  that the method is very efficient for real approximation, too. The idea can also be generalized for other domains in the complex plane .

These approximations are so close to optimal that the method can often be regarded as exact in practice. Magnus  has argued that the approximations of exp(x) produced by the CF method differ from the true best approximations by only about O([56.sup.-n]. The absolute difference between the best CF and best approximations is below standard machine precision for [greater than or equal to] 9.

These ideas have not been exploited much over the last two decades. Today we can compute CF approximations on the fly in fractions of a second. In  a MATLAB code is presented that computes rational approximations of ex(x) on the negative real line with an error as small as 2 x [10.sup.-14]. We have made some minor modifications to adapt the code for [psi] functions.

CF approximation enables us to estimate the error for all at once as they appear as singular values of a certain matrix within the construction process. For the applications we have in mind, a rational uniform approximation with an error of [10.sup.-6] is often appropriate. The asymptotic behavior discussed in the last section suggests that n = 6 poles may be sufficient to achieve this accuracy. Computations confirm that for n = 6, the error we commit by replacing [[psi].sub.l] by its CF approximation is indeed smaller than [10.sup.-6] for l = 0,1,... ; see Fig. 3.1. For all exponential integrators we have used it is sufficient to compute [[psi].sub.l] up to 1 = 4.

We use a partial fraction expansion of the rational approximations,

[r.sub.n](z) = [p.sub.n](z)/[q.sub.n](z) = [r.sub.[nfinity]] + [n.summation over (j=1)] [c.sub.j]/z - [z.sub.j],

where [c.sub.j] is the residue of the pole [z.sub.j] and [r.sub.[infinity]] = r([infinity]). As the denominator [q.sub.n](z) is a polynomial with real coefficients, the poles come in conjugate pairs.

4. Approximation in common poles. When implementing exponential integrators it is an attractive option to use a set of common poles for all functions. Using this strategy one can evaluate [[psi].sub.l](hA)v for several different by linear combination of the solutions [x.sub.j] of a fixed set of systems (A - [z.sub.j]I)[x.sub.j] = v with only the coefficients of the linearcombination depending on . This situation is very frequent when dealing with exponential integrators. Typically the use of common poles makes exponential integrators faster by a factor of to 3.5. The precise cost savings depend ultimately on the type of integrator being used and whether it is possible to store LU decompositions, etc. Rather than devising an optimization problem that imposes a set of common poles as a constraint and choosing them so that associated error functions are as small as possible, we generalize an identity by Saad .

Proposition 4.1. Let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[FIGURE 3.1 OMITTED]

where z [member of] C. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Saad established this identity for l = 0. He was not concerned with [psi] functions of higher order. The proof for l > 0 carries over.

Proof. We observe that [B.sup.n.sub.z] = I and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

As [[psi].sub.l](z) = [[SIGMA].sup.[infinity].sub.k=l] 1/k! [z.sup.k-l] we get

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If z is replaced by a matrix this idea has been suggested amongst others by Hochbruck et al.  and Saad  for computing [[psi].sub.1](A). However, this direct approach has two disadvantages: the resulting matrix [B.sub.A] has twice the size of A, and it does not inherit properties such as symmetry from A.

Given a rational approximation

[r.sub.(l)](z) = [r.sub.[infinity]] + [n.summation over (j=1)] [c.sub.j]/z - [z.sub.j]

of [[psi].sub.l], and noting that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

the entry of [r.sub.(l)]([B.sub.z]) approximating [[psi].sub.l+1] is

[r.sub.(l)][([B.sub.z]).sub.1,2] = [n.summation over (j=1)] [c.sub.j][z.sup.- 1.sub.j]/z - [z.sub.j].

This identity implies the recurrence relation

(4.1) [r.sub.(l+k)](z) = [n.summation over (j=1)] [c.sub.j][z.sup.-k.sub.j]/z - [z.sub.j], k [member of] Z

for rational approximations using common poles of [psi] functions. This generalizes a result by Minchev . These approximations are far from optimal and yet they provide reasonable quality on the negative real axis. For k > 0 we can give error bounds that provide some insight.

The matrices [B.sub.z] are diagonalizable for z [not equal to] 0 with eigenvalues 0 and z, hence [B.sub.z] = [V.sub.z][[LAMBDA].sub.z][V.sup.-1.sub.z]. And yet it is not appropriate to estimate the error by

(4.2) [||f([B.sub.z]) - [[psi].sub.l]([B.sub.z])||.sub.2] [less than or equal to] [kappa].sub.2]([V.sub.z])[||f([[LAMBDA].sub.z]) - [[psi].sub.l]([[LAMBDA].sub.z])||.sub.2]

where [[kappa].sub.2]([V.sub.z]) = [||[V.sub.z]||.sub.2][||[V.sup.-1.sub.z]||.sub.2] is the 2-norm condition number of [V.sub.z], as the condition number of [V.sub.z] tends to infinity for z approaching the origin, see Fig. 4.1. If this were not the case we would be able to give a simple proof based on induction for Conjecture 2.1. A more useful error bound for small z is given by the following result.

Theorem 4.2. If f and g are analytic on the negative real axis [R.sup.-] then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof. This is a special case of Theorem 11.2.2 in .

If [r.sub.(l+i)] is the rational approximation induced by [r.sub.(l)] using the recurrence relation above we can bound the error by

(4.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Using approximations in common poles the accuracy of best rational approximations can not be achieved when using the same number of poles for both ideas. Nevertheless, this approach is rather robust as the derivatives of the error term in (4.3) are small, too. In the context of exponential integrators it is important to work with approximations providing an accuracy beyond the level of the truncation error of the integrator.

We give in Table 4.1 some numerical results for small degrees of the rational approximations.

The entries in the first column of Table 4.1 can be further improved by introducing a positive shift s. As [e.sup.z] = [e.sup.s][e.sup.z-s], we can approximate the second factor by a CF approximation,

(4.4) [e.sup.z] [approximately equal to] [e.sup.s] ([r.sub.[infinity]] + [n.summation over (j=1)] c.sub.j]/z - (s + [z.sub.j])).

Lu  introduced this shift to deal with matrices that have negative and in addition small positive eigenvalues. We have observed that a shift of O(1) gives only slightly weaker results for [e.sup.z] but significantly better rational approximations induced by (4.1) and (4.4) to functions of higher order. We summarize our results in Table 4.2.

[FIGURE 4.1. OMITTED]

5. Talbot contours and integrals of Cauchy type. The results of the last section give us a new perspective on contour integrals, too. If [GAMMA] is a contour enclosing z with winding number 1, then

(5.1) [e.sup.z] = 1/2[pi]i [[integral].sub.[GAMMA]] [e.sup.s]/s - z ds.

When z becomes a matrix instead of a scalar, the same approach works, with the term 1/(s - z) becoming the resolvent matrix [(sI - A).sup.-1]. In the context of this work is a Hankel contour, that is, a deformed Bromwich contour that winds around the negative real axis in the counter-clockwise sense, see Fig. 5.1. In particular it encloses all eigenvalues of the negative semidefinite matrix A.

[FIGURE 5.1. OMITTED]

In  various choices for such contours are discussed. Although the integral does not depend on this choice, the convergence of the results gained by an evaluation with the trapezoid rule on the contour can be optimized a great deal.

The contour [GAMMA] is represented as the image of the real line R under an analytic function [phi]. Then (5.1) can be written as

(5.2) [e.sup.z] = 1/2[pi]i [[integral].sup.[infinity].sub.-[infinity]] [e.sup.[phi]([theta])]/[phi]([theta]) - z [phi]'([theta])d[theta].

The integrand is an exponentially decaying function. By truncating R to a finite interval one therefore commits only an exponentially small error. Following  we shall arbitrarily fix this interval as [-[pi], [pi]]. In [-[pi], [pi]] we take n points [[theta].sub.k] spaced regularly at a distance 2[pi]/n, and our trapezoid approximation to (5.2) becomes

(5.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [s.sub.k] = [phi]([[theta].sub.k]) and [w.sub.k] = [phi]'([[theta].sub.k]).

Using an optimized version of Talbot's original contours it is possible to achieve a convergence Rate of O([3.89.sup.-n]) [45,46]. In particular it is possible to get almost down to machine precision with as few as 24 poles, which come in 12 conjugate pairs.

The exponential decay of the integrand in (5.2) is missing once we try to generalize the approach for [psi] functions of higher order. The term [[psi].sub.l] is only algebraically decaying, which is too slow for most applications in practice . An alternative approach to enforce the exponential decay might be to introduce an additional reparametrization of the real line R by transformations as discussed in .

But given the rational approximation (5.3) of [e.sup.z] the approximation for [[psi].sub.1](z) induced by (4.1) is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This is the trapezoidal rule applied to the integral of Cauchy type

[I.sub.1] = 1/2[pi]i [[integral].sub.[GAMMA]] [e.sup.s]/s 1/s - z ds,

which is indeed an alternative integral representation of [[psi].sub.1].

Theorem 5.1. Let C be a closed contour encircling the points O and z [member of] C with winding number 1. Then

[[psi].sub.l](z) = 1/2[pi]i [[integral].sub.C] [e.sup.s]/[s.sup.l] 1/s - z ds.

Proof. Multiplying the Taylor series representation (1.3) of [[psi].sub.l](z) by [z.sup.l] reveals that [[psi].sub.l](z) is the regular part of the Laurent series for [e.sup.z]/[z.sup.l]:

[e.sup.z]/[z.sup.l] = [[psi].sub.l](z) + [l-1.summation over (k=0)] 1/k! [z.sup.k-l].

Given the linearity of the path integral it is sufficient to show that

I = 1/2[pi]i [[integral].sub.C] [l-1.summation over (k=0)] 1/k! [s.sup.k-l] 1/s - z ds

vanishes. Changing the order of summation and integration yields

I = [l-1.summation over (k=0)] 1/k! 1/2[pi]i [[integral].sub.C] [s.sup.k-l] 1/s - z ds.

If z = 0 all integrals vanish. Let z [not equal to] 0 and define [I.sub.k] = 1/2[pi]i [[integral].sub.C] [s.sup.k-l] 1/s - z ds. The integrals [I.sub.k] are solved by residue calculus

[I.sub.k] = Res ([s.sup.k-l]/s - z, 0) + Res ([s.sup.k-l]/ s - z, z).

The latter residue is [z.sup.k-l]. The residue of [s.sup.-m]/s - z at the origin, where m is a positive integer, is given as

1/(m - 1)! [d.sup.m-1]/d[s.sup.m-1] 1/s - z[|.sub.s=0] = 1/(m - 1)! [d.sup.m-1]/d[a.sup.m-1] 1/a[|.sub.a=-z] = -[z.sup.-m],

which implies [I.sub.k] = 0 for all k = 0,..., l - 1.

In this integral representation the integrand is exponentially decaying along a Hankel contour and is therefore of greater practical use than (1.4). Although contour integrals need more points than optimal rational approximations they are somewhat more flexible when adaptive integrators are implemented as

(5.4) [[psi].sub.l](hA) = 1/2[pi]I [[integral].sub.[GAMMA] [e.sup.hs]/[(hs).sup.l][(sI - A).sup.-1] ds,

where h is a time step. This identity may give hope that [[psi].sub.l](hA) can be evaluated with the same resolvent matrices independently of h. Unfortunately, a balance of the truncation and discretization error cannot be maintained. The truncation error grows as h shrinks, whereas the discretization error grows with h. However, accuracy can be maintained throughout certain ranges of h. Experiments to identify them for the exponential are reported in .

Exponential integrators, such as the method by Krogstad, often need [[psi].sub.l](hA)v and [[psi].sub.l](1/2hA)v. Using best rational approximation we have to solve n linear systems for both products. Achieving similar accuracy with Talbot contours it is typically enough to solve 2n linear systems once and to apply (5.4). Hence, we do not expect a trade-off with respect to best rational approximations in this situation.

6. Exponential integrators. It would go far beyond the scope of this article to introduce exponential integrators in detail. Instead we have decided to pick two typical but rather distinctive members of the huge family of exponential integrators. Various others are classified in .

Quite often exponential integrators have close relatives amongst the more established methods. There are exponential versions of Runge-Kutta methods and multistep methods which try to overcome the problems of their relatives for stiff problems by treating the linear term exactly.

A typical exponential Runge-Kutta method is the one-step method of Krogstad . Like the method of Cox and Matthews  it has 4th order accuracy in many circumstances. In the worst case the order reduces to 2 for Cox and Matthews or 3 for Krogstad. Order reduction has been studied in detail by Ostermann and Hochbruck . They have introduced a scheme that gives 4th order in all cases. We follow their formulation and introduce exponential Runge-Kutta methods for (1.1) as

[u.sub.n+1] = [u.sub.n] + h [s.summation over (i=1)] [b.sub.i](hA)[G.sub.ni],

[u.sub.ni] = [u.sub.n] + h [i-1.summation over (j=1)] [a.sub.ij](hA)[G.sub.nj],

[G.sub.nj] = g([t.sub.n] + [c.sub.j]h,[U.sub.nj]) + A[u.sub.n].

[FIGURE 5.2. OMITTED]

In order to simplify the notation we use the abbreviations

[[psi].sub.i,j] = [[psi].sub.i,j](hA) = [[psi].sub.i]([c.sub.j]hA), 2 [less than or equal to] j [less than or equal to] s

and

[[psi].sub.i] = [[psi].sub.i](hA).

We only give the method of Krogstad:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The implementation can be done "columnwise". Starting with [G.sub.n1] it is possible to evaluate all matrix-vector products of the first column. We calculate the savings introduced by approximating in common poles in the first column by counting the matrix-vector products. The terms [a.sub.21] and [a.sub.31] can be solved by solving one set of shifted linear systems rather than two sets. The terms [a.sub.41] and [b.sub.1] need an alternative set of poles, as a different scaling parameter c is used. In such situations contours offer some flexibility through equation (5.4). Using optimal poles we would have to solve the systems for [psi].sub.1], [psi].sub.2] and [psi].sub.3]. Thus we save the factor 5/2 for the first column. Repeating the argument for all columns we save a factor of 12/6 in total. This factor does not take into account the additional initial costs of computing the LU decompositions for 5 instead of 2 sets of shifted systems.

Exponential relatives of classic multistep methods are more attractive in terms of the underlying linear algebra than exponential Runge-Kutta methods. An early reference is an article by Norsett , but they have been rediscovered recently by Beylkin  and also by Cox and Matthews . Livermore has applied them to problems from magnetohydrodynamics . Ostermann et al.  have analyzed their stability and Calvo and Palencia  gave details about the starting process for abstract Cauchy problems. A complete derivation is given in the thesis of Minchev .

The underlying formula for (1.1) is here

(6.1) [u.sub.n] = [e.sup.hA][u.sub.n-1] + h [k.summation over (l=0)] [[beta].sub.l][g.sub.n-l].

The coefficients [[beta].sub.l] are linear combinations of matrix functions. The explicit ([[beta].sub.0] [equivalent to] 0) exponential Adams-Bashforth method of order 4 is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This method serves as the predictor. An implicit exponential Adams-Moulton method may serve as corrector,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

It is standard for classic multistep methods to perform only the first step of a fixed-point iteration to solve (6.1). This idea is often referred as "PECE" form [10, Chapter 7.4]. Comparing the predicted and corrected result serves as an error control in implementations of the classic multistep methods. In order to predict it is necessary to evaluate [e.sup.hA][u.sub.n-1] and [[psi].sub.l](hA)[g.sub.n-1], l [greater than or equal to] 1. It is sufficient to solve two sets of shifted linear systems when working with a set of common poles. For a k-step method we therefore save the factor (k + l)/2 when using the same set of poles for [[psi].sub.1],..., [[psi].sub.k]. This factor does not take into account the possibility to reuse the poles for [[psi].sub.1],..., [[psi].sub.k] to compute [e.sup.hA][u.sub.n-1]. Using that approach we reduce the number of LU decompositions by a factor 2 of and may achieve an additional speedup by solving linear systems with 2 right-hand sides rather than solving 2 linear systems with one right-hand side each. Next one has to evaluate g{[u.sub.n], [t.sub.n]) in order to compute [[beta].sub.0] by solving again a set of shifted systems. All other coefficients are linear combinations of vectors already available from previous steps.

7. Numerical experiments. In this section we illustrate the potential of exponential integrators relying on rational approximations by two examples. Both examples are nonlinear reaction-diffusion equations with rather large diffusion constants to make the linear term both stiff and dominant in the evolution. The equations are posed in 1D and 2D, which results in advantageous band structures in the finite difference discretizations of the diffusion term.

These particular equations could be solved more efficiently using a diagonalization by means of Fourier transforms, or by spectral collocation methods. It should therefore be emphasized that they serve solely as a convenient means to illustrate our technique for dealing with [psi] functions. Errors introduced by the spatial discretization are not taken into account here.

We compare exponential integrators with established numerical methods for stiff systems of ordinary differential equations. Although we have worked with a uniform mesh in time, we find that our implementation can be competitive even with state-of-the-art adaptive methods such as Radau collocation methods  or the multistep backward differentiation formulas of MATLAB  when applied to reaction-diffusion equations with a mild nonlinearity. The methods we have compared are:

* The explicit exponential Runge-Kutta methods of Krogstad. We used two CF approximations of [[psi].sub.1](s) and [[psi].sub.1]{1/2s) with 12 poles each, and varied the number of poles; see Figures 7.2 and 7.3.

* The exponential multistep methods of order 4, 6 and 8. Starting values were computed using MATLAB's ode15s integrator. The startup calculation was not taken into account in the timings. We used a CF approximation of the exponential with 12 poles and a shift of s = 1.

* MATLAB's ode15s integrator, which is an adaptive solver based on backward differentiation formulas. Rather than reducing the stepsize to achieve higher accuracy, we reduced the tolerances for the absolute and relative error. The method is implicit and the linear systems are solved using direct methods, that is UMFPACK, which has been incorporated with effective matrix reorderings in MATLAB's ode15s in recent releases.

* The method RADAU5, which is an implementation of a fifth-order implicit Runge-Kutta method of RADAU IIA type, with stages and automatic stepsize control. For this we used a MATLAB implementation adapted from the thesis of Tee . The code is available online. (1)

The results reported here depend very strongly on the underlying numerical linear algebra, and in particular, an exponential integrator may perform better or worse than a competitor simply because of a switch from a direct method to an iterative method or vice versa. Therefore, for consistency, we used direct methods for all of our computations, relying on UMFPACK as the common underlying framework for solving all linear systems. The complex shifted symmetric matrices were reordered using a sparse reverse Cuthill-McKee ordering and symmetric approximate minimum degree permutations .

Before we present the results in detail, we shall take a moment to describe common properties of the graphs describing the performance of the methods being used. In Figures 7.2 and 7.3 the vertical axes represent the relative error. For all experiments we have computed an "exact" solution by using ode15s with very tight error tolerances. The error is then calculated as the 2-norm of the difference between the approximation and the exact solution, divided by the 2-norm of the solution. Thus the error plotted in the graphs is a relative one. The scaling of the horizontal axes varies. The relative time-steps displayed are scaled by the overall simulation time scale. A relative time-step h implies that 1/h steps have been taken. The computer time displayed is the CPU time we have measured. As is well known, such timings should not be relied upon too precisely, since in MATLAB we have limited control of internal routines optimizing certain code fragments. The number of evaluations of refers to the number of evaluations of the nonlinear reaction term.

The experiments were performed with MATLAB 7.4 on a HP workstation xw4200 with a 3.2 GHz Pentium 4 processor and 1 GByte of RAM running Windows XP

[FIGURE 7.1. OMITTED]

7.1. The Fisher equation. The Fisher equation  is a one-dimensional reaction- diffusion problem with a "logistic" reaction term

[u.sub.t] = D[u.sub.xx] + ru(l - u) x [member of][0,2].

For the experiments reported here we chose D = 0.05 and r = 0.01. The initial function is chosen as

[u.sub.0](x, t = 0) = exp(-20x) - x(x - 2) [cos.sup.2](5x[pi]/2).

The boundary values are fixed as

u(x = 2,t) = 0, u(x = 0,t) = 1.

We solve the equation for 0 [less than or equal to] t [less than or equal to] 0.1. The equation is semidiscretized by standard finite differences with a 3-point stencil on a regular grid with [DELTA]x = 1/2000. This results in an extremely stiff system. The matrix A introduced by this discretization is of dimension N = 1999 and tridiagonal. The left-most eigenvalue of the scaled matrix D A is approximately -2 x [10.sup.5]. Tridiagonal systems can be solved in O(N) operations, so the matrix is very well suited for any integrator based on rational approximation.

The results in Figure 7.2 emphasize the power of exponential integrators. In the first plot one should note the large relative timesteps of O([10.sup.-1]. This implies that only 10 steps have been taken. In the second plot one sees that exponential integrators outperform radau5 and ode15s. The Krogstad, Multistep 4 and Multistep 6 curves in this plot have benefitted by factors of approx. 2, 2.5 and 3.5, respectively, through the use of common poles. In the third plot graphs labeled [Krogstad.sub.n] show the relative error of Krogstad's method when using only n poles. The tremendous advantage of exponential integrators is revealed in fourth plot. They need far fewer evaluations of the nonlinear term.

[FIGURE 7.2. OMITTED]

7.2. The Allen-Cahn equation in 2D. The Allen-Cahn equation in 2D reads as 

[u.sub.t] = [epsilon][DELTA]u + (u - [u.sup.3]).

We solve this equation for [less than or equal to] t [less than or equal to] 0.1 with [epsilon] = 0.1 on the square [[0, 1].sup.2] with homogeneous Neumann boundary conditions. The initial condition is a trigonometric polynomial:

[u.sub.0] = c [8.summation over (i=1)] [8.summation over (j=1)] [r.sub.ij] cos(i[pi]x) cos(j[pi]y).

The coefficient matrix R is constructed, so as to be arbitrary but reproducible, by taking the first 64 digits of [pi]:

[r.sub.11] = 3, [r.sub.21] = 1, [r.sub.31] = 4, ..., [r.sub.12] = 5, ..., [r.sub.88] = 2.

These numbers are then normalized by

[r.sub.ij] = [r.sub.ij]/5 - 1.

The parameter c is chosen such that max |[u.sub.0]| = 1 on the square. For the spatial discretization A of the Laplacian we use standard finite differences on [[0, 1].sup.2]. The matrix A is of dimension [10.sup.4] x [10.sup.4] and symmetric.

[FIGURE 7.3. OMITTED]

The results of the experiment are shown in Figure 7.3. Broadly speaking the conclusions for this experiment are like those for the Fisher equation. Again the use of common poles has speeded up the Krogstad and multistep timings by factors of 2 to 4.

8. Conclusions and outlook. We have shown that [psi] functions can be evaluated for matrix arguments efficiently using rational approximations constructed via Caratheodory-Fejer approximation or contour integrals. This enables us to implement competitive exponential integrators for large stiff systems of ODEs. The rational approximations are typically twice as fast as the contour integrals as they require half as many poles for the same accuracy.

Exponential integrators rely on the fast evaluation of the matrix-vector product [[psi].sub.l](A)b for several l at once. Therefore we proposed the approximation in a set of common poles as in equation (4.1) and found that this enables us to reduce the work per step dramatically.

We should also mention that similar techniques can be used for other functions of interest .

Acknowledgments. Parts of this work were done when T.S. stayed for six weeks at the University of Dusseldorf, enjoying the hospitality of Marlis Hochbruck. She and Alexander Ostermann taught us a great deal about exponential integrators. Herbert Stahl gave us advice on best rational approximations. Christoph Ortner's insight on reaction-diffusion equations has always been helpful.

* Received October 31, 2006. Accepted for publication September 13, 2007. Published online on December 13, 2007. Recommended by M. Hochbruck.

REFERENCES

 S. M. Allen and J. W. Cahn, A microscope theory for antiphase boundary motion and its application to antiphase domain coarsening, Acta Metall., 27 (1979), pp. 1085-1095.

 A. I. Aptekarev, Sharp constants for rational approximations of analytic functions, Mat. Sb., 193 (2002), pp. 1-72.

 H. Berland, B. Skaflestad, and W. Wright, EXPINT--A MATLAB package for exponential integrators, ACM Trans. Math. Softw., 33 (2007).

 G. Beylkin, J. M. Keiser, and L. Vozovoi, A new class of time discretization schemes for the solution of nonlinear PDEs, J. Comput. Phys., 147 (1998), pp. 362-387.

 M. P. Calvo and C. Palencia, A class of explicit multistep exponential integrators for semilinear problems, Numer. Math., 102 (2006), pp. 367-381.

 A. J. Carpenter, A. Ruttan, and R. S. Varga, Extended numerical computations on the "1/9" conjecture in rational approximation theory, in Rational approximation and interpolation, P. Graves-Morris, E. Saff, and R. Varga, eds., Lecture Notes in Mathematics, 1105, Springer, Berlin, 1984, pp. 383-411.

 W. J. Cody, G. Meinardus, and R. S. Varga, Chebyshev rational approximations to [e.sup.-x] in [0, +[infinity]) and applications to heat-conduction problems, J. Approx. Theory, 2 (1969), pp. 50-65.

 S. M. Cox and P. C. Matthews, Exponential time differencing for stiff systems, J. Comput. Phys., 176 (2002), pp. 430-455.

 T. A. Davis, Direct methods for sparse linear systems, SIAM, Philadelphia, PA, 2006.

 P. Deuflhard and F. Bornemann, Scientific computing with ordinary differential equations, Springer-Verlag, New York, 2002.

 S. W. Ellacott, On the Faber transform and efficient numerical rational approximation, SIAM J. Numer. Anal., 20 (1983), pp. 989-1000.

 R. Fisher, The wave of advance of advantageous genes, Annals of Eugenics, 7 (1937), pp. 355-369.

 G. H. Golub and C. F. Van Loan, Matrix computations, 3rd ed., Johns Hopkins University Press, Baltimore, 1996.

 A. A. Gonchar and E. A. Rakhmanov, Equilibrium distributions and degree of rational approximation of analytic functions, Math. USSR Sbornik, 62 (1989), pp. 305-348.

 E. Hairer and G. Wanner, Solving ordinary differential equations. II. Stiff and differential-algebraic problems, Springer-Verlag, Berlin, 1991.

 G. H. Halphen, Traite des fonctions elliptiques et de leurs applications. Premiere partie. Theorie des fonctions elliptiques et de leurs developpement en series, Gauthier-Villars, Paris, 1886. http://moa.cit.cornell.edu/

 N. J. Higham, Accuracy and stability of numerical algorithms, 2nd ed., SIAM, Philadelphia, 2002.

 --, The scaling and squaring method for the matrix exponential revisited, SIAM J. Matrix Anal. Appl., 26(2005), pp. 1179-1193.

 M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 34 (1997), pp. 1911-1925.

 M. Hochbruck, C. Lubich, and H. Selhofer, Exponential integrators for large systems of differential equations, SIAM J. Sci. Comput., 19 (1998), pp. 1552-1574.

 M. Hochbruck and A. Ostermann, Explicit exponential Runge-Kutta methods for semilinear parabolic problems, SIAM J. Numer. Anal., 43 (2005), pp. 1069-1090.

 --, Exponential Runge-Kutta methods for parabolic problems, Appl. Numer. Math., 53 (2005), pp. 323-339.

 A.-K. Kassam, Solving reaction-diffusion equations ten times faster, Tech. Report NA 03/16, Oxford University Computing Laboratory, 2003.

 A.-K. Kassam and L. N. Trefethen, Fourth-order time-stepping for stiff PDEs, SIAM J. Sci. Comput., 26 (2005), pp. 1214-1233.

 S. Krogstad, Generalized integrating factor methods for stiff PDEs, J. Comput. Phys., 203 (2005), pp. 72-88.

 P. Livermore, An implementation of the exponential time differencing scheme to the magnetohydrodynamic equations in a spherical shell, J. Comput. Phys., 220 (2007), pp. 824-838.

 Y. Y. Lu, Computing a matrix function for exponential integrators, J. Comput. Appl. Math., 161 (2003), pp. 203-216.

 A. P. Magnus, CFGT determination of Varga's constant '1/9', 1985. handout of the Pade approximation meeting organized by C. Brezinski at Marseille-Luminy, October 14-18, 1985.

 --, Asymptotics and super asymptotics of best rational approximation error norms for the exponential function (the '1/9' problem) by the Caratheodory-Fejer method, in Nonlinear Methods and Rational Approximation II, A. Cuyt et al., ed., Kluwer, Dordrecht, 1994, pp. 173-185.

 G. Meinardus, Approximation of functions: Theory and numerical methods, Springer, Berlin, 1967.

 B. V. Minchev, Computing analytic matrix functions for a class of exponential integrators, Reports in Informatics 278, University of Bergen, Bergen, Norway, 2004.

 --, Exponential integrators for semilinear problems, PhD thesis, University of Bergen, 2004.

 B. V. Minchev and W. M. Wright, A review of exponential integrators for first order semi-linear problems, Tech. Report 2/05, Norwegian University of Science and Technology, Trondheim, Norway, 2005.

 C. Moler and C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003), pp. 3-49.

 M. Mori, Discovery of the double exponential transformation and its developments, Publ. RIMS, Kyoto University, 41 (2005), pp. 897-935.

 S. P. Norsett, An A-stable modification of the Adams-Bashforth methods, in Conf. on Numerical Solution of Differential Equations (Dundee, Scotland, 1969), Springer, Berlin, 1969, pp. 214- 219.

 A. Ostermann, M. Thalhammer, and W. Wright, A class of explicit exponential general linear methods, BIT, 46 (2006), pp. 409-431.

 Y. Saad, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 29 (1992), pp. 209-228.

 T. Schmelzerand L. N. Trefethen, Computing the gamma function using contour integrals and rational approximations, SIAM J. Numer. Anal., 45 (2007), pp. 558-571.

 L. F. Shampine and M. W. Reichelt, The MATLAB ODE suite, SIAM J. Sci. Comput, 18 (1997), pp. 1-22.

 T.-W. Tee, An adaptive rational spectral method for differential equations with rapidly varying solutions, PhD thesis, Oxford University, 2006.

 L. N. Trefethen, Rational Chebyshev approximation on the unit disk, Numer. Math., 37 (1981), pp. 297-320.

 --,Chebyshev approximation on the unit disk, in Computational aspects of complex analysis, H. Werner et al., ed., D. Reidel Publishing, Dordrecht, 1983, pp. 309-323.

 L. N. Trefethen and M. H. Gutknecht, The Caratheodory-Fejer method for real rational approximation, SIAM J. Numer. Anal., 20 (1983), pp. 420-436.

 L. N. Trefethen, J. Weideman, and T. Schmelzer, Talbot quadratures and rational approximations, BIT, 46 (2006), pp. 653-670.

 J. A. C. Weideman and L. N. Trefethen, Parabolic and hyperbolic contours for computing the Bromwich integral, Math. Comp., 76 (2007), pp. 1341-1356.

(1) See the webpage of the author: www.comlab.ox.ac.uk/thomas.schmelzer

THOMAS SCHMELZER ([dagger]) AND LLOYD N. TREFETHEN ([double dagger])

([dagger]) Computing Laboratory, Oxford University, United Kingdom (thoms@comlab.ox.ac.uk). Thomas Schmelzer is supported by a Rhodes Scholarship.

([double dagger]) Computing Laboratory, Oxford University, United Kingdom (lnt@comlab.ox.ac.uk).
```TABLE 4.1

Using degree n rational approximations with common poles the maximal
error [[phi].sub.1]-r is given for l = 0, 1, 2, 3 and n = 6, 8, 10, 12.
A column represents CF approximations for fixed l. An example: Given
the CF approximation of degree 10 for [[phi].sub.1] the error committed
by approximating [[phi].sub.3] with the same poles using identity (4.1)
is 7.3e--09. To determine an approximation for the maximal error we
have computed the difference in 500 points distributed in [-10.sup.5]
to [-10.sup.-5].

[[phi].sub.0] [[phi].sub.1]

n = 6 1.0e--06 9.3e--05
5.3e--05 8.5e--08
4.6e--04 4.0e--08
1.6e--03 3.1e--05

n = 8 1.2e--08 1.7e--06
8.0e--07 7.5e--10
9.1e--06 4.7e--08
4.2e--05 4.9e--07

n = 10 1.4e--10 2.9e--08
1.1e--08 7.1e--12
1.6e--07 5.6e--10
9.1e--07 7.3e--09

n = 12 1.6e--12 4.7e--10
1.6e--10 6.8e--14
2.6e--09 6.5e--12
1.8e--08 1.0e--10

[[phi].sub.2] [[phi].sub.3]

n = 6 2.2--03 3.0--02 [[phi].sub.0]
9.7e--06 2.7e--04 [[phi].sub.1]
7.0e--09 9.5e--07 [[phi].sub.2]
2.9e--07 5.6e--10 [[phi].sub.3]

n = 8 6.2e--05 1.2e--03 [[phi].sub.0]
1.3e--07 5.5e--06 [[phi].sub.1]
4.8e--11 9.9e--09 [[phi].sub.2]
2.8e--09 3.0e--12 [[phi].sub.3]

n = 10 1.5e--06 3.8e--05 [[phi].sub.0]
1.8e--09 1.0e--07 [[phi].sub.1]
3.7e--13 1.1e--10 [[phi].sub.2]
2.7e--11 1.9e--14 [[phi].sub.3]

n = 12 3.1e--08 1.0e--06 [[phi].sub.0]
2.7e--11 1.7e--09 [[phi].sub.1]
4.3e--15 1.2e--12 [[phi].sub.2]
2.7e--13 5.6e--16 [[phi].sub.3]

TABLE 4.2

Same as the first column of Table 4.1, but with a shift s introduced as
in (4.4). The numbers are better, and s = 1 seems a good choice in
practice.

s = 1/2 s = 1 s = 2 s = 5

n = 6 1.6e--06 2.7e--06 7.5e--06 1.5e--04 [[phi].sub.0]
1.0e--05 1.1e--05 2.3e--05 2.4e--04 [[phi].sub.1]
2.2e--05 2.4e--05 1.8e--05 1.3e--04 [[phi].sub.2]
9.7e--05 4.4e--05 4.2e--05 9.4e--05 [[phi].sub.3]

n = 8 1.9e--08 3.2e--08 8.7e--08 1.7e--06 [[phi].sub.0]
1.5e--07 1.5e--07 2.5e--07 2.8e--06 [[phi].sub.1]
4.3e--07 3.8e--07 5.7e--07 3.0e--06 [[phi].sub.2]
1.3e--06 6.6e--07 5.6e--07 1.6e--06 [[phi].sub.3]

n = 10 2.4e--10 3.e7--10 1.0e--09 2.0e--08 [[phi].sub.0]
1.1e--09 1.7e--09 3.4e--09 3.9e--08 [[phi].sub.1]
9.0e--09 6.9e--09 7.5e--09 4.8e--09 [[phi].sub.2]
1.2e--08 1.0e--08 8.8e--09 3.2e--08 [[phi].sub.3]

n = 12 2.6e--12 4.3e--12 1.2e--11 2.4e--10 [[phi].sub.0]
2.1e--11 3.0e--11 4.9e--11 6.1e--10 [[phi].sub.1]
1.0e--10 5.3e--11 8.7e--11 6.0e--10 [[phi].sub.2]
3.4e--10 2.3e--10 1.8e--10 7.1e--10 [[phi].sub.3]
```
COPYRIGHT 2007 Institute of Computational Mathematics
No portion of this article can be reproduced without the express written permission from the copyright holder.