
Expokit: a software package for computing matrix exponentials.

Expokit provides a set of routines aimed at computing matrix exponentials. More precisely, it computes either a small matrix exponential in full, the action of a large sparse matrix exponential on an operand vector, or the solution of a system of linear ODEs with constant inhomogeneity. The backbone of the sparse routines consists of matrix-free Krylov subspace projection methods (Arnoldi and Lanczos processes), which is why the toolkit is capable of coping with sparse matrices of large dimension. The software handles real and complex matrices and provides specific routines for symmetric and Hermitian matrices. The computation of matrix exponentials is a numerical issue of critical importance in the area of Markov chains, where, furthermore, the computed solution is subject to probabilistic constraints. In addition to addressing general matrix exponentials, particular attention is given to the computation of transient states of Markov chains.

Categories and Subject Descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra; G.1.7 [Numerical Analysis]: Ordinary Differential Equations--initial value problems; G.4 [Mathematics of Computing]: Mathematical Software

General Terms: Algorithms

Additional Key Words and Phrases: Krylov methods, Markov chains, matrix exponential


Evaluating the action of the matrix exponential operator on an arbitrary vector is an important problem that arises in mathematics, physics, and engineering. For example, in control theory, a central problem consists in solving the system of ODEs

$$\dot{w}(t) = A\, w(t) + B\, u(t),$$

where w(t) is the state vector, A [is an element of] [R.sup.nxn] is the state companion matrix, u(t) [is an element of] [R.sup.m] is the control, and B [is an element of] [R.sup.nxm]. The state vector w(t) is given explicitly by

$$w(t) = e^{tA}\, w(0) + \int_0^t e^{(t-s)A}\, B\, u(s)\, ds.$$

As another example, a successful and widely used way of modeling the behavior of many physical systems consists in enumerating the (mutually exclusive) states in which the system may be at a given time and then describing the interaction between these states. The analysis of performance of these systems requires quantifying several attributes, including reliability, availability, and performability. Frequently, finite-state Markov chains constitute powerful mathematical tools that are used to achieve these purposes [Berman and Plemmons 1979; Ciardo et al. 1990; Neuts 1981; Seneta 1981; Stewart 1994]. Under suitable hypotheses, the evolution of several physical systems may be governed by the Chapman-Kolmogorov system of differential equations:
$$\dot{w}(t) = A\, w(t), \qquad w(0) = v.$$

Its analytic solution is w(t) = [e.sup.tA]v and represents the transient probability distribution of the Markov chain. The coefficient matrix A is an infinitesimal generator of order n, where n is the number of states in the Markov chain. Thus we have A [is an element of] [R.sup.nxn], with elements [a.sub.ij] [is greater than or equal to] 0 when i [not equal to] j and [a.sub.jj] = -[summation of] [a.sub.ij] where i [not equal to] j. Useful performance parameters can be derived after computing transient solutions. In particular, the ith component of w(t) is the probability that the physical system will be in the state numbered i at time t. An in-depth analysis of the behavior of a given system is usually done by investigating the influence of model parameters (e.g., the initial condition v) on the solution vector w(t). As the number of trials grows, the amount of computation explodes. Moreover, in real-time simulations, calculations must be done within a very short delay, and as the size of the problem grows, only efficient computers, running efficient software, are able to perform at the desired speed.
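To make the setup concrete, here is a toy illustration in Python (the two-state chain, its rates, and the helper names are invented for this sketch): the columns of A sum to zero, and the transient solution w(t) = exp(tA)v remains a probability vector.

```python
# A two-state chain: state 1 "fails" at rate 0.2, state 2 is "repaired" at
# rate 1.0.  Off-diagonal entries are nonnegative and each column sums to 0.
import numpy as np

A = np.array([[-0.2,  1.0],
              [ 0.2, -1.0]])
v = np.array([1.0, 0.0])            # start in state 1 with probability one

def expm_times_vector(tA, v, terms=60):
    """exp(tA) @ v by truncated Taylor series -- fine for this tiny generator."""
    w = v.copy()
    term = v.copy()
    for k in range(1, terms):
        term = tA @ term / k
        w = w + term
    return w

w = expm_times_vector(5.0 * A, v)   # transient distribution at t = 5
```

By t = 5 the chain is close to its stationary distribution, and the components of w still sum to one, as the probabilistic interpretation requires.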

The computation of the matrix exponential is often not an end in itself. It can also be used in a preconditioner-like manner to find the rightmost eigenvalues, e.g., Saad [1992b, p. 277] and Meerbergen and Sadkane [1996], and it can be the elemental building-block in some solution techniques of ODEs and DAEs, e.g., Lawson [1967], Hochbruck and Lubich [1997], and Hochbruck et al. [1998].

A direct way to define the matrix exponential exp(A) is undoubtedly through the exponential power series expansion

$$\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!},$$

whose convergence is guaranteed for any square matrix A. Other definitions exist, each alternative being of theoretical and/or practical interest, depending on the desired end and on the class of matrices under consideration. Although papers and textbooks dealing with matrix exponentials are abundant in the literature, Moler and Van Loan [1978] cautioned that practical implementations are "dubious" in the sense that a sole implementation might not be entirely reliable for all classes of problems. The explicit computation of the matrix exponential function is difficult when the argument matrix is of large norm or of wide departure from normality. Besides, the difficulty grows worse when the order of the matrix is large.

The case where the matrix is of moderate dimension has benefited from extensive studies such as Ward [1977], Parlett and Ng [1985], and Fassino [1993], just to name a few, which resulted in implementations that appear satisfactory in several uncontrived practical instances. Traditional approaches are based either on Taylor series, rational Pade approximations, or the Schur decomposition--the latter being used in conjunction with a recurrence relationship among the elements of the exponential of the triangular factor [Parlett 1976; Parlett and Ng 1985]. Apart from the Taylor series approach, these techniques require the full matrix and thus are applicable only when the order of the matrix does not exceed a few hundred.

Expokit deals with small matrices as well as large sparse matrices. The software package handles real and complex matrices and supplies specific routines for symmetric and Hermitian matrices. Particular attention is given to matrices arising from Markov chains. The backbone of the sparse routines consists of matrix-free Krylov subspace projection techniques (the well-known Arnoldi and Lanczos processes), which is why the toolkit is capable of coping with sparse matrices of large dimension. It seems these techniques were long established among chemical physicists without much justification other than their satisfactory behavior. But since the theoretical characterization of Gallopoulos and Saad [1992] and Saad [1992a], they have been gaining wide acceptance, and their use here and elsewhere shows remarkable success (e.g., see Gallopoulos and Saad [1992], Saad [1992a], Leyk and Roberts [1995], Sidje and Stewart [1998], and related references). Recently, Hochbruck and Lubich [1997] made a major contribution by improving the a priori error bounds considerably, thus yielding a better understanding as to why the techniques work so well. Within the field of Markov chains, relevant studies appear in Sidje [1994] and Philippe and Sidje [1995].

The knowledge gathered in recent research now warrants the production of intelligible software that will assist scientists in tackling some of the significantly large problems arising in modeling and engineering. An attempt in this direction briefly appears in Saad [1994], but Expokit is the first comprehensive package aimed specifically at this purpose right from the outset. A number of scientists have already shown interest in using the software, and their positive feedback somewhat reflects the appropriateness and usefulness of the approach. Investigations toward parallel algorithms that will enable addressing much larger problems have already been initiated in Sidje [1994; 1997]; however, we shall concentrate here on the serial framework. We start by presenting the foundation of the algorithms (Sections 2 through 5). We subsequently illustrate their utilization (Section 6) and indicate where the portable routines implementing them can be retrieved (Section 7).


Expokit provides routines for computing small matrix exponentials. They are based on rational Chebyshev or Pade approximations. The rational Chebyshev approximation comes from extending the minimax Chebyshev theory to rational fractions. Cody et al. [1969] have stated and solved the problem: find [C.sub.pp](x) [equivalent] [N.sub.pp](x)/[D.sub.pp](x) such that

$$\sup_{x \in [0,+\infty)} \left| C_{pp}(x) - e^{-x} \right| = \inf_{r \in R_{pp}} \; \sup_{x \in [0,+\infty)} \left| r(x) - e^{-x} \right|,$$

where [R.sub.pq] denotes the class of (p, q)-degree rational functions. For p = 1, 2, ..., 14, the coefficients of the best approximants [N.sub.pp](x) and [D.sub.pp](x) were computed and listed. Subsequently, Carpenter et al. [1984] extended the list up to p = 30. The positive point here is that it is possible to compute (and keep) the poles {[[Theta].sub.i]} and to consider the partial fraction expansion

$$C_{pp}(x) = \alpha_0 + \sum_{i=1}^{p} \frac{\alpha_i}{x - \theta_i}$$

to obtain directly

$$\exp(\tau H)\, y \approx C_{pp}(-\tau H)\, y = \alpha_0\, y + \sum_{i=1}^{p} \alpha_i \left( -\tau H - \theta_i I \right)^{-1} y.$$

The poles {[[Theta].sub.i]} and the coefficients {[[Alpha].sub.i]} were computed and listed in Gallopoulos and Saad [1992] for p = 10 and p = 14. Expokit only implements the case p = 14. When [Tau]H is real, an LU decomposition is saved for each complex conjugate pair of poles. Thus the whole cost can be reduced by half. When [||[Tau]H||.sub.2] becomes large, we have found this approach less stable than the diagonal Pade approximations, which are theoretically A-acceptable (e.g., see Iserles and Norsett [1991]). Given the domain of applicability specified in (4), the approximations above are better suited for symmetric negative definite matrices, in which case

$$\| \exp(\tau H) - C_{pp}(-\tau H) \|_2 \le \lambda_{pp},$$

where [[Lambda].sub.pp] is an explicitly known constant referred to as the uniform rational Chebyshev constant [Varga 1990]. It is now known that [[Lambda].sub.pp] [approximately equal] [10.sup.-p], which means that a type (p, p)-approximation yields p-digit accuracy. However, this bound does not hold for a general matrix H having a complex spectrum and/or a poorly conditioned system of eigenvectors. As shown in Figure 1, if employed where not intended, these Chebyshev approximants become invalid, and large errors occur. Arguably, one can always apply a shift to the exponential as [e.sup.z] = [e.sup.s][e.sup.(z-s)] [approximately equal] [e.sup.s][C.sub.pp](-(z - s)), but [e.sup.s] quickly becomes large and magnifies the error.


The type (p, q) Pade approximation to [e.sup.x] is the (p, q)-degree rational function [P.sub.pq](x) [equivalent] [N.sub.pq](x)/[D.sub.pq](x) obtained by solving the algebraic equation

$$N_{pq}(x) - e^{x}\, D_{pq}(x) = O\!\left(x^{p+q+1}\right),$$

i.e., [P.sub.pq](x) must match the Taylor series expansion up to order p + q. For stability and economy of computation, it is advantageous to set p = q. Indeed, for the exponential function, we fortuitously have

(2) [P.sub.pp](x) = [N.sub.pp](x)/[N.sub.pp](-x), where

$$N_{pp}(x) = \sum_{k=0}^{p} c_k\, x^k$$

with [c.sub.0] = 1, [c.sub.k] = [c.sub.k-1] (p + 1 - k)/((2p + 1 - k)k). As noted in Sidje [1994], it is more economical to use the following irreducible form rather than (2):
$$(3) \qquad P_{pp}(x) = 1 + \frac{2\, V_{pp}(x)}{U_{pp}(x) - V_{pp}(x)}, \qquad N_{pp}(x) = U_{pp}(x) + V_{pp}(x),$$

where [U.sub.pp] and [V.sub.pp] collect the even-powered and odd-powered terms of [N.sub.pp], respectively.
The Horner evaluations of the numerator and the denominator in (3) need half the operations of (2), and a careful implementation uses only four extra matrices with no shuffling of data (see the routine _PADM in this toolkit). However, beyond these considerations there is a major drawback of the Pade approximations: they are only accurate near the origin, so the approximation of exp([Tau]H) is not valid when [||[Tau]H||.sub.2] is too large. Moreover, when [Tau]H has widely spread eigenvalues, the computation of [P.sub.pp]([Tau]H) involves an ill-conditioned linear system. Fortunately these problems disappear, and an acceptable accuracy can be obtained even for a small degree p, if we make use of the exponential property [e.sup.[Tau]H] = [([e.sup.[2.sup.-s][Tau]H]).sup.[2.sup.s]], which is referred to as "scaling and squaring." We then use the approximation [e.sup.[Tau]H] [approximately equal] [[[P.sub.pp]([2.sup.-s][Tau]H)].sup.[2.sup.s]], evaluated by repeated squaring. The principal drawback of the resulting algorithm comes from the fact that if s [much greater than] 1, then the computed squares can be contaminated by rounding errors, and the cost becomes large. An inverse error analysis made in Moler and Van Loan [1978] has shown that if [||[2.sup.-s][Tau]H||.sub.[infinity]] [is less than or equal to] 1/2 then
$$\left[ P_{pp}\!\left(2^{-s}\tau H\right) \right]^{2^{s}} = e^{\tau H + E}, \qquad \frac{\| E \|_\infty}{\| \tau H \|_\infty} \le 8 \left(\frac{1}{2}\right)^{2p-3} \frac{(p!)^2}{(2p)!\,(2p+1)!}.$$

Thus a value of p = 6 is generally satisfactory. Other roundoff error considerations are studied in detail in Ward [1977].
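The method just described can be condensed into a short Python sketch (for illustration only; Expokit's own implementation is the Fortran routine _PADM, and the name `expm_pade` and the scaling rule below are ours): the coefficient recurrence, the Horner evaluation of [N.sub.pp] and [D.sub.pp](x) = [N.sub.pp](-x), one linear solve, and the repeated squaring.

```python
# Diagonal Pade approximation of exp(H) with scaling and squaring (sketch).
import numpy as np

def expm_pade(H, p=6):
    """Approximate exp(H) via the type (p,p) Pade approximant."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    # Scaling: pick s so that ||2^-s H||_inf <= 1/2.
    norm = np.linalg.norm(H, np.inf)
    s = int(np.ceil(np.log2(norm))) + 1 if norm > 0.5 else 0
    A = H / 2.0**s
    # Coefficients of N_pp: c_0 = 1, c_k = c_{k-1} (p+1-k)/((2p+1-k)k).
    c = [1.0]
    for k in range(1, p + 1):
        c.append(c[-1] * (p + 1 - k) / ((2*p + 1 - k) * k))
    # Horner evaluation of N_pp(A) and D_pp(A) = N_pp(-A).
    I = np.eye(n)
    N = c[p] * I
    D = c[p] * I
    for k in range(p - 1, -1, -1):
        N = N @ A + c[k] * I
        D = D @ (-A) + c[k] * I
    F = np.linalg.solve(D, N)           # P_pp(A) = D^-1 N
    for _ in range(s):                  # squaring: exp(H) = P_pp(A)^(2^s)
        F = F @ F
    return F
```

With p = 6 and this scaling rule the result is accurate essentially to machine precision on well-behaved matrices, in line with the error analysis quoted above.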

As we shall see below, the large sparse techniques rely upon small dense methods. Of the two small dense methods just described, Expokit uses the Pade method by default at the core of its large sparse techniques--users can decide otherwise. The preference went to the Pade method because it handles the definite and indefinite cases alike, whereas the Chebyshev method is only apt to compute the direct action of the matrix exponential, exp([Tau]H)y, when it is known in advance that H is symmetric negative definite.

Algorithm 1. Compute w(t) = exp(tA)v
   w := v; [t.sub.k] := 0;
   [H.sub.m+2] := zeros[m + 2, m + 2];
   while [t.sub.k] [is less than] t do
     v := w; [Beta] := [||v||.sub.2];
     [v.sub.1] := v/[Beta];
     for j := 1 : m do {Arnoldi process}
       p := A[v.sub.j];
       for i := 1 : j do
         [h.sub.ij] := [v.sup.*.sub.i]p; p := p - [h.sub.ij][v.sub.i];
       [h.sub.j+1,j] := [||p||.sub.2];
       if [h.sub.j+1,j] [is less than or equal to] tol [multiplied by] [parallel]A[parallel] then exit the loop {happy breakdown};
       [v.sub.j+1] := p/[h.sub.j+1,j];
     H(m + 2, m + 1) := 1;
     repeat
       [Tau] := step size;
       [F.sub.m+2] := exp([Tau][H.sub.m+2]);
       w := [Beta][V.sub.m+1]F(1 : m + 1, 1);
       [err.sub.loc] := local error estimate;
     until [err.sub.loc] [is less than or equal to] [Delta] [multiplied by] tol;
     [t.sub.k] := [t.sub.k] + [Tau];

Consider a large sparse n-by-n matrix A (real or complex), an n-vector v, and a scalar t [is an element of] R (presumably the "time"). For ease of presentation, consider t [is greater than] 0 since exp(-tA) = exp(t(-A)). Algorithm 1 computes an approximation of w(t) = exp(tA)v. The algorithm purposely sets out to compute the matrix exponential times a vector rather than the matrix exponential in isolation. The underlying principle is to approximate

$$w(t) = e^{tA}\, v$$

by an element of the Krylov subspace [K.sub.m](tA, v) = Span{v, (tA)v, ..., [(tA).sup.m-1]v}, where m, the dimension of the Krylov subspace, is small compared to n, the order of the principal matrix (usually m [is less than or equal to] 50 while n can exceed many thousands). The approximation being used is

(6) w(t) [approximately equal] [Beta][V.sub.m+1] exp(t[H.sub.m+1])[e.sub.1]

where [e.sub.1] is the first unit basis vector, and [V.sub.m+1] = [[v.sub.1], ..., [v.sub.m+1]] and [H.sub.m+1] = [[h.sub.ij]] are, respectively, the orthonormal basis and the upper Hessenberg matrix resulting from the well-known Arnoldi process (e.g., see Saad [1992b] and Golub and Van Loan [1996]). When the matrix is symmetric or Hermitian, the Arnoldi process is replaced by the Lanczos process, and computational savings occur. If the minimal degree of v is some integer r (r [is less than or equal to] m [is less than or equal to] n), then an invariant subspace is found, and the approximation [Beta][V.sub.r] exp(t[H.sub.r])[e.sub.1] is exact. This situation is usually referred to as a "happy breakdown" and, in exact arithmetic, happens at least when m = n. The mathematical basis of the selected approximation has been documented [Gallopoulos and Saad 1992; Hochbruck and Lubich 1997; Saad 1992a; Sidje 1994]. In particular, it has been established that this approximation is better than the m-fold Taylor expansion--which highlights the fact that for the same number of matrix-vector products, the Krylov approximation is better than the Taylor approximation (the Krylov approach involves, however, more local calculation, notably the Gram-Schmidt sweeps). In fact, Hochbruck and Lubich [1997] have shown that the error in the Krylov method decays superlinearly when m [is greater than or equal to] 2t[[parallel]A[parallel].sub.2]. They gave sharper bounds depending on the class of the matrix and/or the location and shape of its spectrum, which illustrated that the technique can work quite well even for moderate m. Using Chebyshev series expansions, Druskin and Knizhnerman [1995] and Stewart and Leyk [1996] have also provided other relevant insights on the error bound for the symmetric case.
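In code, the projection step can be sketched as follows (a simplified illustration with names of our own: it uses the plain m-dimensional projection rather than the augmented Hessenberg matrix of (6), and it delegates the small dense exponential to an eigendecomposition for brevity, whereas Expokit uses Pade with scaling and squaring):

```python
# Krylov approximation of exp(t*A) @ v: Arnoldi factorization, then a small
# dense exponential of the projected Hessenberg matrix.
import numpy as np

def expm_krylov(A, v, t=1.0, m=20):
    """Approximate exp(t*A) @ v by projection on the Krylov subspace K_m(A, v)."""
    n = len(v)
    m = min(m, n)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        p = A @ V[:, j]                 # the only access to A: a matvec
        for i in range(j + 1):          # modified Gram-Schmidt sweep
            H[i, j] = np.dot(V[:, i], p)
            p -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(p)
        if H[j + 1, j] < 1e-12:         # "happy breakdown": result is exact
            m = j + 1
            break
        V[:, j + 1] = p / H[j + 1, j]
    Hm = H[:m, :m]
    # Small dense exponential exp(t*Hm), here via an eigendecomposition.
    w_eig, U = np.linalg.eig(Hm)
    e1 = np.zeros(m)
    e1[0] = 1.0
    y = (U @ np.diag(np.exp(t * w_eig)) @ np.linalg.inv(U) @ e1).real
    return beta * V[:, :m] @ y
```

Note that A enters only through the product `A @ V[:, j]`, which is what makes the approach matrix-free.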

The distinctive feature to underscore in the Krylov approximation is that the original large problem (10) has been converted to the small problem (11) which is more desirable. The explicit computation of exp(t[H.sub.m+1]) is performed using known dense algorithms. In the present toolkit, this nucleus is handled either with the irreducible rational Pade method combined with scaling-and-squaring or the uniform rational Chebyshev approximation as shown earlier.

The description of the algorithm would be incomplete if we omitted to mention that, in reality, due to stability and accuracy requirements, w(t) is not computed in one stretch. On the contrary, a time-stepping strategy along with error estimations is embedded within the process. Typically, the algorithm evolves with the integration scheme

$$w(t_{k+1}) = e^{\tau_k A}\, w(t_k), \qquad \tau_k = t_{k+1} - t_k, \qquad 0 = t_0 < t_1 < \cdots < t_K = t.$$

Consequently, in the course of the integration, one can output discrete observations (if they are needed) at no extra cost. Nevertheless, it is clear from (12) that the crux of the problem remains an operation of the form [e.sup.[Tau]A]v, with different [Tau]'s and v's. The selection of a specific step size [Tau] is made so that [e.sup.[Tau]A]v is effectively approximated by [Beta][V.sub.m+1] exp([Tau][H.sub.m+1])[e.sub.1]. Following the practice of ODE solvers, an a posteriori error control is carried out to ensure that the intermediate approximation is acceptable with respect to expectations on the global error. The starting point of all these critical issues is the following expansion series established in Saad [1992a]:

$$e^{\tau A} v = \beta V_m\, e^{\tau H_m} e_1 + \beta\, h_{m+1,m} \sum_{j=1}^{\infty} \left[ e_m^{T}\, \psi_j(\tau H_m)\, e_1 \right] (\tau A)^{j-1} v_{m+1},$$

where [[Psi].sub.1](z) = ([e.sup.z] - 1)/z and [[Psi].sub.j+1](z) = ([[Psi].sub.j](z) - 1/j!)/z.

The functions [[Psi].sub.j] are positive, increasing, and satisfy [[Psi].sub.j+1] [is less than or equal to] [[Psi].sub.j]/j on [0, +[infinity]). Moreover, they become smoother as j increases. A way to control the error and the step size has been described in Sidje [1994] and Philippe and Sidje [1995], as we shall now summarize. For a small step size, the next term in (8) is an appropriate estimate of the local truncation error. But this estimate is insufficiently accurate for a large step size, because in that case the terms of the expansion series grow in magnitude before they decay.

Algorithm 2. Local truncation error estimation

err1 := magnitude of the first omitted term of the expansion series;
err2 := magnitude of the second omitted term;

if err1 [much greater than] err2 then {small step size: quick convergence}

err := err2;

else if err1 [is greater than] err2 then {slow convergence}

err := err2/(1 - (err2/err1));

else {err1 [is less than or equal to] err2: asymptotic convergence}

err := err1;

[err.sub.loc] := max(err, roundoff);

A study of the asymptotic behavior of the error term in (12) suggested using the estimator above. It is an emulation of the problem of approximating an infinite series by its pth partial sum:

(1) if the terms of the series decrease rapidly (in magnitude) then the next (p + 1)st term is an appropriate estimate of the error;

(2) if the terms decrease slowly, then the geometric limit is considered;

(3) if the terms start decreasing at order P where P [much greater than] p, then the (p + 1)st term has no meaning. We take the pth term as an estimate of the error. This choice is supported in Hochbruck et al. [1998, Section 6.3].
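The three cases translate into a few lines of code (a sketch with names of our own; err1 and err2 stand for the magnitudes of the first two omitted terms, and the threshold `big` separating "much greater" from "greater" is an invented illustrative parameter):

```python
# Algorithm 2 as a small function: three-way local error estimate.
def local_error_estimate(err1, err2, roundoff=1e-16, big=10.0):
    if err1 > big * err2:          # quick convergence: next term is the error
        err = err2
    elif err1 > err2:              # slow convergence: geometric limit
        err = err2 / (1.0 - err2 / err1)
    else:                          # err1 <= err2: asymptotic regime
        err = err1
    return max(err, roundoff)      # never report less than the roundoff level
```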

The following assertions, homologous to Sidje [1994, p. 99], show an effective way to cheaply compute the coefficients [e.sup.T.sub.m][[Psi].sub.j]([Tau][H.sub.m])[e.sub.1] of the expansion series.

THEOREM 1. Let c [is an element of] [C.sup.m] and let

$$\bar{H}_{m+p} = \begin{pmatrix} H_m & c\, e_1^{T} \\ 0 & J_p \end{pmatrix} \in \mathbb{C}^{(m+p)\times(m+p)}, \qquad J_p = \begin{pmatrix} 0 & I_{p-1} \\ 0 & 0 \end{pmatrix}.$$

Then

$$\exp(\tau \bar{H}_{m+p}) = \begin{pmatrix} e^{\tau H_m} & \tau\psi_1(\tau H_m)c & \tau^2\psi_2(\tau H_m)c & \cdots & \tau^p\psi_p(\tau H_m)c \\ 0 & & e^{\tau J_p} & \end{pmatrix}.$$

PROOF. Consider the block upper triangular decomposition

$$\exp(\tau \bar{H}_{m+p}) = \begin{pmatrix} e^{\tau H_m} & W(\tau) \\ 0 & e^{\tau J_p} \end{pmatrix}.$$

From $\frac{d}{d\tau}\exp(\tau \bar{H}_{m+p}) = \bar{H}_{m+p}\exp(\tau \bar{H}_{m+p})$ we have $W'(\tau) = H_m W(\tau) + c\, e_1^{T} e^{\tau J_p}$ with $W(0) = 0$. Multiplying on the right by $e_j$ to extract the jth column, and noting that $(e^{\tau J_p})_{1j} = \tau^{j-1}/(j-1)!$, we obtain the recurrence relations

$$w_j'(\tau) = H_m\, w_j(\tau) + \frac{\tau^{j-1}}{(j-1)!}\, c, \qquad w_j(0) = 0,$$

whose solutions are $w_j(\tau) = \tau^j \psi_j(\tau H_m)\, c$. [square]

COROLLARY 1. Let c [is an element of] [C.sup.m] and let

$$\tilde{H}_{m+p} = \begin{pmatrix} H_m & 0 \\ e_1 c^{*} & J_p^{T} \end{pmatrix} \in \mathbb{C}^{(m+p)\times(m+p)}.$$

Then the jth row of the lower-left block of $\exp(\tau \tilde{H}_{m+p})$ is $\tau^j c^{*} \psi_j(\tau H_m)$, for j = 1, ..., p.

PROOF. Applying Theorem 1 with [H.sub.m] replaced by [H.sup.*.sub.m] and taking the conjugate transpose of the result yields the assertion.
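The device of Theorem 1 is straightforward to check numerically. The sketch below (our naming; shown for p = 1) exponentiates the augmented matrix and reads tau psi_1(tau H)c off the top-right block:

```python
# Numerical check of Theorem 1 for p = 1: the top-right block of
# exp(tau * [[H, c], [0, 0]]) equals tau * psi_1(tau*H) c, psi_1(z) = (e^z-1)/z.
import numpy as np

def expm_taylor(M, terms=60):
    """Dense exp(M) by truncated Taylor series (adequate for tiny tests)."""
    F = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        F = F + term
    return F

def psi1_action(H, c, tau):
    """psi_1(tau*H) @ c read off the exponential of the augmented matrix."""
    m = H.shape[0]
    Hbar = np.zeros((m + 1, m + 1))
    Hbar[:m, :m] = H
    Hbar[:m, m] = c
    F = expm_taylor(tau * Hbar)
    return F[:m, m] / tau          # top-right block is tau*psi_1(tau*H)c
```

No special coding beyond a dense matrix exponential is required, which is precisely the practical appeal of the theorem.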

Notice that these results remain valid without assumptions on [H.sub.m] (it need not be Hessenberg and/or invertible). When setting c = [h.sub.m+1,m][e.sub.m] in particular, the desired coefficients of the expansion series (8) and (9) can be recovered in the first column just below the mth row of exp([Tau][H.sub.m+p]). The monitoring of the step size [Tau] and the size of the Krylov basis m then follows heuristics of ODE solvers (e.g., see Gustafsson [1991]). We base the step size selection on the measure

[[Epsilon].sub.j] = [parallel][[Epsilon].sub.j][parallel]/[[Tau].sub.j-1], error per unit step (EPUS)

where [parallel][[Epsilon].sub.j][parallel] is the local truncation error approximated by err. Unless otherwise specified, [parallel] [multiplied by] [parallel] denotes the Euclidean norm. If tol denotes the prescribed tolerance to be achieved, the first step size is chosen to satisfy a known theoretical bound, and within the integration process, the step size is selected by means of the formula

(10) [[Tau].sub.j] = [Gamma] [(tol/[[Epsilon].sub.j]).sup.1/r] [[Tau].sub.j-1]

rounded to 2 significant digits to prevent numerical noise. The value of r is m - 1 or m depending on whether the error estimate err comes from err1 or err2, respectively (see Algorithm 2). A step is rejected if [[Epsilon].sub.j+1] [is greater than] [Delta] [multiplied by] tol. (The scalars [Gamma] and [Delta] are safety factors intended to reduce the risk of rejection of the step. They have been assigned the classical values 0.9 and 1.2, respectively--they can be modified by the user.) With this step size mechanism, the "accumulated global error" is bounded above for a fixed integration domain, regardless of the number of steps used:

$$\sum_{j=1}^{k} \|\epsilon_j\| \le \delta\, \mathrm{tol} \sum_{j=1}^{k} \tau_{j-1} = \delta\, \mathrm{tol}\; t_k.$$
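Formula (10), together with its rounding rule, amounts to a few lines of code ([Gamma] defaults to the classical 0.9; the rounding helper is our own):

```python
# Step-size update (10): tau_j = gamma * (tol/eps_j)^(1/r) * tau_{j-1},
# rounded to 2 significant digits to prevent numerical noise.
from math import floor, log10

def next_step_size(tau_prev, eps_j, tol, r, gamma=0.9):
    tau = gamma * (tol / eps_j) ** (1.0 / r) * tau_prev
    return round(tau, -int(floor(log10(abs(tau)))) + 1)
```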
Notice that since [parallel][[Epsilon].sub.j][parallel] is estimated by err, which is itself subject to (4), the error measurement should be understood in a weighted sense. Furthermore, since for j = 1, ..., k, the [[Epsilon].sub.j] are local truncation error vectors, i.e., [[Epsilon].sub.j] = exp([[Tau].sub.j-1]A)w([t.sub.j-1]) - w([t.sub.j]), we obtain

$$e^{t_k A}\, v - w(t_k) = \sum_{j=1}^{k} e^{(t_k - t_j)A}\, \epsilon_j,$$

and therefore the error actually satisfies (letting t [equivalent] [t.sub.k])

$$(18) \qquad \| e^{tA} v - w(t) \| \le \delta\, \mathrm{tol}\; t \max_{0 \le s \le t} \| e^{sA} \|.$$
It is well known that [lim.sub.[Tau][approaches][infinity]] [parallel][e.sup.[Tau]A][parallel] = 0 if and only if [Alpha](A) [is less than] 0, where [Alpha](A) = max{Re([Lambda]), [Lambda] [is an element of] [Lambda](A)}. An upper bound on (18) can be obtained by using

$$\| e^{\tau A} \| \le e^{\tau \|A\|}.$$

This straight bound is very pessimistic, however, and besides, it does not decay with [Alpha](A). A review of various bounds for [parallel][e.sup.[Tau]A][parallel] is provided in Van Loan [1977], where it is shown that an effective bound is

$$\| e^{\tau A} \|_2 \le e^{\alpha(A)\tau} \sum_{k=0}^{n-1} \frac{\left( \| N \|_2\, \tau \right)^k}{k!},$$

with N being the strict upper part of the triangular factor of the Schur decomposition of A. The lower bound [parallel][e.sup.[Tau]A][parallel] [is greater than or equal to] [e.sup.[Alpha](A)[Tau]] is attained if A is normal. In general, even if [Alpha](A) [is less than] 0, [parallel][e.sup.[Tau]A][parallel] can still grow before it decays (the so-called "hump" effect), and whether this situation occurs or not depends on the sign of [micro](A), i.e.,

$$\left. \frac{d}{d\tau} \| e^{\tau A} \|_2 \right|_{\tau = 0^{+}} = \mu(A),$$

where [micro](A) = [[Lambda].sub.max]((A + A*)/2) is the logarithmic norm of A; recall that [Alpha](A) [is less than or equal to] [micro](A). Figure 2 gives a detailed characterization of the behavior of [parallel][e.sup.[Tau]A][parallel].


The routines in Expokit attempt to achieve (11), and thus the user should bear in mind the aforementioned issues, especially the potential discrepancies that may result from (4) or (12). These issues are related to the conditioning of the matrix exponential in itself [Van Loan 1977]. Our control (11) is conservative, and together with Algorithm 2, it proves to be quite effective if the problem is not exaggeratedly ill-conditioned. It is also meaningful to observe that the relative error satisfies the following (recall that [Beta] = [parallel]v[parallel] and [v.sub.1] = v/[Beta]):

$$(19) \qquad \frac{\| e^{tA} v - w(t) \|}{\| e^{tA} v \|} \le \delta\, \mathrm{tol}\; t\, \frac{\| e^{\bar{t}A}\, \bar{v} \|}{\| e^{tA} v_1 \|}$$

for some [bar]t [is an element of] [0, t] and [bar]v such that [parallel][bar]v[parallel] = 1. Therefore, if [parallel][e.sup.[bar]tA][bar]v[parallel]/[parallel][e.sup.tA][v.sub.1][parallel] [approximately equal] 1, we can relate (11) to a bound on the relative error. Otherwise, if [parallel][e.sup.[bar]tA][bar]v[parallel] [much greater than] 1 or [parallel][e.sup.tA][v.sub.1][parallel] [much less than] 1, the upper bound in (19) becomes too large and has little practical value. Upon exit, [parallel][e.sup.tA][v.sub.1][parallel] is readily approximated by [parallel]w(t)[parallel]/[Beta], and as a by-product of the computations, Expokit also returns the following approximation:

$$\max_{s \in [0, t]} \| e^{sA} \| \approx \max_{j} \| w(t_j) \| / \beta,$$

i.e., an estimate of the "hump" depicted in Figure 2.
When a step size is rejected, it is reduced by using formula (16) once more, but there is also the opportunity (not implemented in this release) of adjusting (increasing or decreasing) the dimension of the Krylov subspace instead, by way of a companion relation between the local error, [Tau], and m.
As an epilogue to the description of the algorithm, it is worth recalling that the principal matrix interacts only through matrix-vector products. Hence the method is a matrix-free method and conveys all the associated advantages.


The Markovian context brings other probabilistic considerations. The approximation of w(t) = exp(tA)v is subject to the constraint that the resulting vector must be a probability vector, thus with components in the range [0, 1] and with sum equal to 1. Since the analytic solution of the Chapman-Kolmogorov system of differential equations is w(t), its computation can be addressed entirely from the perspective of ODEs. But general-purpose ODE solvers do not bring any guarantee either. Practice shows that they can even produce negative components--which have no meaning in terms of probabilities.

A number of results of interest were provided in Sidje [1994] and Philippe and Sidje [1995] by exploiting the fact that the matrix A is an infinitesimal generator of a Markov chain and satisfies some inherent properties. With the Krylov approach, the sum condition is fulfilled, i.e., [e.sup.T]w([t.sub.k]) = 1 where e = [(1, ..., 1).sup.T]. If during the integration process (12) some components of an intermediate approximation happen to be negative, the components under concern are less than the error tolerance (in magnitude); otherwise this intermediate solution would be rejected by the error control. Therefore a handy way to overcome any difficulty is to set the negative components to zero and perform a normalization afterward. Another way to proceed is to reduce the step size. It is mathematically guaranteed that the Krylov approximation is a probability vector for small enough step sizes. Markov chains conform to Figure 2(d), and it is shown that the global error in the approximation grows at most linearly, i.e., (12) becomes [parallel][e.sup.[t.sub.k]A]v - w([t.sub.k])[parallel] [is less than or equal to] [Delta] [multiplied by] tol [multiplied by] [t.sub.k]. Additionally, it is possible to detect and cope with excessive roundoff errors as follows. Letting [bar]w([t.sub.k]) be the computed value of w([t.sub.k]) and using the fact that [e.sup.T]w([t.sub.k]) = 1 and the Holder inequality [absolute value of [x.sup.T]y] [is less than or equal to] [[parallel]x[parallel].sub.1][[parallel]y[parallel].sub.[infinity]], we have

$$\textrm{roundoff} := \frac{| 1 - e^{T} \bar{w}(t_k) |}{n} = \frac{| e^{T} ( w(t_k) - \bar{w}(t_k) ) |}{n} \le \| w(t_k) - \bar{w}(t_k) \|_{\infty}.$$
Hence if the quantity roundoff is far from the machine precision, then the computed approximation [bar]w([t.sub.k]) is too contaminated, and the process should be stopped. This is a sufficient condition to indicate a high level of roundoff errors. The indicator is deficient when the vector w([t.sub.k]) - [bar]w([t.sub.k]) is orthogonal to e. This may happen, but it is expected to be rare. Another interesting feature which is also worth mentioning is the capacity to detect the stationary probability distribution [w.sub.[infinity]] = [lim.sub.t[approaches][infinity]]w(t) (the ith component of [w.sub.[infinity]] is the probability that the Markov chain will be in the state numbered i at statistical equilibrium). The detection of the steady-state is made possible by the "happy breakdown."
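The zero-and-renormalize fix-up described above is one line of arithmetic (a sketch with our own naming):

```python
# Set the (tolerance-sized) negative components to zero, then renormalize
# so that the entries of the probability vector again sum to one.
import numpy as np

def fixup_probability(w):
    w = np.where(w < 0.0, 0.0, w)
    return w / w.sum()
```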

The ensuing customization of the generic algorithm incorporating all these aspects proves to be a fairly reliable and versatile algorithm for the computation of transient states of Markov chains. Extensive experiments conducted in Sidje and Stewart [1998] have shown the benefit of the method.


Quite often the linear system of ODEs from which the matrix exponential arises is nonhomogeneous and has a constant forcing term:

$$(15) \qquad \dot{w}(t) = A\, w(t) + u, \qquad w(0) = v.$$

If the matrix A is nonsingular, the solution can be written as w(t) = [e.sup.tA](v + [A.sup.-1]u) - [A.sup.-1]u, so with one extra linear system solve it can be recovered as described earlier. However, a much cheaper and unrestricted approach which does not depend on the invertibility of A is also possible through techniques similar to those of the homogeneous case. The explicit solution of (15) is

$$(16) \qquad w(t) = e^{tA}\, v + t\, \psi_1(tA)\, u,$$

where [[Psi].sub.1](z) = ([e.sup.z] - 1)/z. Therefore an integration scheme can be obtained as

$$w(t_{k+1}) = e^{\tau_k A}\, w(t_k) + \tau_k\, \psi_1(\tau_k A)\, u.$$

Indeed, since [e.sup.[Tau]A] = I + [Tau]A[[Psi].sub.1]([Tau]A), (16) yields

$$(17) \qquad w(t_{k+1}) = w(t_k) + \tau_k\, \psi_1(\tau_k A) \left( A\, w(t_k) + u \right).$$

Hence the crux of the integration scheme (17) is now an operation of the form [[Psi].sub.1]([Tau]A)v, which can be approximated in a way similar to exp([Tau]A)v. In a more precise sense, we have the following generalizations.

THEOREM 2. For every p [is greater than or equal to] 0,


PROOF. For p = 0, we recover directly (8) and (9). Then from (8) we can write


Therefore using the fundamental relation [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], we have


and since this holds for any v, we necessarily have



with the "corrected" equality (19) following from the fact that


which can be proved thanks to the relation [[Psi].sub.1]([Tau][H.sub.m+1])[H.sub.m+1] = [H.sub.m+1][[Psi].sub.1]([Tau][H.sub.m+1]). The proof for higher p proceeds by induction in a similar way. [square]

It is then justified from (19) that one can consider the approximation

$$\psi_1(\tau A)\, v \approx \beta\, V_{m+1}\, \psi_1(\tau H_{m+1})\, e_1,$$
so that, in essence, the original large problem (16) reduces to the computation of the small-sized problem [[Psi].sub.1]([Tau][H.sub.m+1])[e.sub.1], and techniques discussed in Section 2 can be applied. For instance, [[Psi].sub.1](z) can be recovered from the Chebyshev partial fraction expansion of [e.sup.z] as


which allows for obtaining [[Psi].sub.1]([Tau][H.sub.m+1])[e.sub.1] directly. However, the Chebyshev approximations are not always valid, so it is desirable to use the robust alternative proposed in Theorem 1, which provides the information needed for the approximation as well as the extra coefficients of the expansion series useful for the estimation of the error. Interestingly also, no special coding is needed. The computation of all these is done at once with a direct call to one of the small matrix exponential routines within the package. Finally, if for j = 1, ..., k we consider the local truncation error vectors [[Epsilon].sub.j] of the scheme (17), then we can infer


and a similar analysis to that done earlier holds.
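As a sanity check of the stepping idea in (17), here is a sketch (our naming; [Psi].sub.1] is evaluated by its Taylor series, which is adequate for a tiny argument) advancing w' = Aw + u by one step; for the scalar test below the step is exact.

```python
# One step of (17): w(t+tau) = w(t) + tau * psi_1(tau*A) (A w(t) + u),
# with psi_1(z) = (e^z - 1)/z = sum_{k>=0} z^k/(k+1)!.
import numpy as np

def psi1_matrix(M, terms=60):
    """psi_1(M) by its Taylor series (illustration only)."""
    F = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(2, terms):
        term = term @ M / k        # term is now M^(k-1)/k!
        F = F + term
    return F

def step(A, w, u, tau):
    return w + tau * psi1_matrix(tau * A) @ (A @ w + u)
```

Note that no inverse of A appears anywhere, which is the advantage of this route over the w(t) = e^{tA}(v + A^{-1}u) - A^{-1}u formulation.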


This section shows experiments with some routines of Expokit. All the examples are executed on a standard SUN4 workstation. The size of the Krylov basis is m = 30 everywhere. Drivers reproducing each example are available in the package: sample_re.f, sample_z.f, sample_g.f, sample_b.f, and sample_p.f for the first through fifth examples, respectively. Another driver (not shown here), sample_d.f, is also available; it depicts the utilization of the routines targeted at small dense problems.

6.1 A Binary Markov Example

A Markov system is being modeled in this first example (see Table I). The system consists of a collection of components that are binary, i.e., they have only two states, "good" and "bad." If c is the number of components in the system, the number of states in which the system can be is n = [2.sup.c]. Rules for constructing the infinitesimal generator automatically are given in Clarotti [1984]. The matrix chosen for the example is of order n = 1,024 with nz = 11,264 nonzero elements (i.e., the number of components is c = 10). We set t = 10, v = [(1,0, ..., 0).sup.T], and tol = [10.sup.-10].

This example aims at illustrating that DGEXPV (the version for General matrices) is often slightly faster than DMEXPV (the version for Markov matrices). There is an overhead when enforcing the probability constraint in DMEXPV. Both versions are provided in Expokit, so in the end the choice of which one to use is left to the Markovian analyst--depending on whether the impending priority is guaranteed reliability or speed at some risk.

6.2 A Nonsymmetric Example

The matrix of this example is the unsymmetric matrix ORANI678 of the Harwell-Boeing collection [Duff et al. 1989]. The order is n = 2,529, and the number of nonzero elements is nz = 90,158. This example illustrates the impact of the sparse data storage. DGEXPV is executed with t = 10, v = [(1, ..., 1).sup.T], and tol = 0, using three different data structures (see Barrett et al. [1994] and Saad [1994]): Coordinate (COO), Compressed Row Storage (CRS), and Compressed Column Storage (CCS). When tol is input as zero, the code automatically sets tol to the square root of the machine epsilon, which is about 1.5 x [10.sup.-8] here. It appears that the CCS format fits the sparsity pattern of ORANI678 better. It is often hard to predict which structure fits an arbitrary sparse matrix better; our observations on various computer architectures suggest that the CRS format is usually a good compromise for matrices with unknown or erratic sparsity patterns. See Table II.
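The role these storage schemes play in a matrix-vector product can be sketched as follows (plain Python, mirroring what the _coov and _crsv kernels of the distribution do in Fortran):

```python
def coo_matvec(n, rows, cols, vals, x):
    """y = A x with A held in Coordinate (COO) format: a list of
    (row, column, value) triplets in arbitrary order."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
    return y

def crs_matvec(ptr, ind, vals, x):
    """y = A x with A held in Compressed Row Storage (CRS): row i owns
    the entries vals[ptr[i]:ptr[i+1]], in columns ind[...]."""
    n = len(ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += vals[k] * x[ind[k]]
    return y

# The 2x2 matrix [[1, 2], [0, 3]] in both formats, applied to x = (1, 1):
y_coo = coo_matvec(2, [0, 0, 1], [0, 1, 1], [1.0, 2.0, 3.0], [1.0, 1.0])
y_crs = crs_matvec([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0], [1.0, 1.0])
```

CCS is the same idea as CRS with the roles of rows and columns exchanged, which is why the traversal order (and hence the memory-access pattern) differs between the formats.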

6.3 A Hermitian Example

The matrix of this example comes from a symmetric pattern of order n = 5,300 in the Harwell-Boeing collection. We take the pattern BCSPWR10 and fill in a Hermitian matrix. Both real and imaginary parts are filled with uniform random numbers in the range [-5, +5]. The Coordinate (COO) storage is used to hold the matrix, and this yields nz = 21,842 as the effective number of nonzero elements (the conjugate transpose is included explicitly). We set t = 1, v = [(1,0, ..., 0,1).sup.T], and tol = [10.sup.-5]. The example illustrates the gain of the routine tailored for Hermitian matrices over the general routine. See Table III.
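The filling procedure can be sketched as follows (Python with NumPy; the seed and the small pattern are arbitrary illustrations, and the real diagonal is an assumption needed for H to be Hermitian):

```python
import numpy as np

rng = np.random.default_rng(0)

def fill_hermitian(pattern):
    """Fill a 0/1 symmetric pattern with a Hermitian matrix: real and
    imaginary parts uniform in [-5, 5], the lower triangle set to the
    conjugate of the upper so that H = H^H (the conjugate transpose is
    stored explicitly, as in the BCSPWR10 example)."""
    n = pattern.shape[0]
    H = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(i, n):
            if pattern[i, j]:
                z = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
                if i == j:
                    z = complex(z.real, 0.0)   # diagonal must be real
                H[i, j] = z
                H[j, i] = z.conjugate()
    return H

pattern = np.array([[1, 1, 0],
                    [1, 1, 1],
                    [0, 1, 1]])
H = fill_hermitian(pattern)
```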

6.4 A Forward-Backward Example

The matrix of this example is the symmetric matrix GR3030 of the Harwell-Boeing collection. The order is n = 900, and the number of nonzero elements is nz = 4,322. We set t = 1, v = [(1, ..., 1).sup.T], and tol = [10.sup.-10]. This example shows the computation of [e.sup.-tA][e.sup.tA]v. Two runs are performed in tandem. The first run computes [w.sup.+] = [e.sup.tA]v (forward), and the output of this computation is passed as the input operand of the next run, which computes [w.sup.-] = [e.sup.-tA][w.sup.+] (backward). In exact arithmetic, the final result should be v, and the computed output is indeed very close. We also see that the two runs took comparable times. See Table IV.
Table IV. Forward-Backward Example

  Forward with DSEXPV          Backward with DSEXPV
      w+(1 : 5) =                    w-(1:5) =

    3456.5698306801              1.0000000000001
    7.3427169843682              1.0000000000003
    4094.7323184931              1.0000000000003
    1275.0417533589              1.0000000000003
    2939.0163458165              1.0000000000003
    CPU Time = 11 sec.          CPU Time = 10 sec.
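The forward-backward experiment can be reproduced in miniature as follows (Python with NumPy, using an eigendecomposition of a small random symmetric stand-in for GR3030; the real matrix has n = 900):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                # symmetric, like GR3030
v = np.ones(n)

# For symmetric A, exp(tA) = U exp(t Lambda) U^T via the eigendecomposition.
lam, U = np.linalg.eigh(A)
def expAv(t, x):
    return U @ (np.exp(t * lam) * (U.T @ x))

w_fwd = expAv(1.0, v)            # forward run:  w+ = exp(tA) v
w_bwd = expAv(-1.0, w_fwd)       # backward run: w- = exp(-tA) w+
# In exact arithmetic w- = v; in floating point it is merely very close.
```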

6.5 A Nonhomogeneous Example

This is an illustration of an integration involving the phi function, i.e., it computes w = exp(tA)v + t[Psi](tA)u, where [Psi](z) = (exp(z) - 1)/z. The matrix of this example is the unsymmetric matrix ORANI678 (CCS format) that was used in Section 6.2. We again set t = 10 and tol = 0, so that the square root of the machine epsilon is the accuracy tolerance effectively used. The input parameters have been chosen so as to allow a number of comparisons. By setting u = 0 in the first run, the expected answer is [w.sup.[1]] = exp(tA)v; this is confirmed by a comparison with Section 6.2. With the second run the answer should be [w.sup.[2]] = t[Psi](tA)u. To check this we compute the difference [Delta] = ||(u + A[w.sup.[2]]) - [w.sup.[1]]||/||[w.sup.[1]]|| [approximately equals] [10.sup.-14], which correlates quite well with the tolerance used. The third run is a general run where neither u nor v is null. We observe that the runs took comparable times. It has been our observation that computing exp(tA)v via the PHI routine may sometimes be faster than via the EXP routine. This may be related to the fact that the right-hand-side function in (17), i.e., the phi function, is smoother than the right-hand-side function in (7), i.e., the exponential function--thus allowing for bigger step sizes. However, they
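The identity behind the reported [Delta]--namely u + A(t[Psi](tA)u) = exp(tA)u, so that the check succeeds when u = v--can be verified directly for a diagonal A, where everything reduces to scalar arithmetic (illustrative Python):

```python
import numpy as np

def phi(z):
    """phi(z) = (exp(z) - 1)/z, with the removable singularity at z = 0."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) < 1e-8,
                    1.0 + z / 2,                          # series near 0
                    np.expm1(z) / np.where(z == 0, 1, z))

# A diagonal A makes exp(tA) and phi(tA) act elementwise.
d = np.array([-1.0, -0.5, 0.0, 0.3])
t = 10.0
u = np.ones_like(d)

w1 = np.exp(t * d) * u          # first run:  exp(tA) u
w2 = t * phi(t * d) * u         # second run: t phi(tA) u

# The check reported in the text: A w2 + u should reproduce exp(tA) u = w1.
delta = np.linalg.norm((d * w2 + u) - w1) / np.linalg.norm(w1)
```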


A World Wide Web site has been set up which allows for accessing the package (or parts of it). A mailing list has also been set up; it provides a medium through which users of the package are kept informed of upgrades to the software. Users can subscribe/unsubscribe to this open forum, exchange ideas, and contribute to the extension of the package.

Coding was done in Fortran 77, using building blocks from the two well-known scientific libraries BLAS and LAPACK. As a precaution, the necessary steps have been taken to ensure unconstrained portability of the software: a minimal substitute for the essential BLAS and LAPACK routines is included, so the software is self-contained and can still operate even if these libraries are not yet installed in the environment of the user. The naming convention used for the routines in the package was inspired by the naming scheme of LAPACK. However, because the range of matrices involved in Expokit is not as exhaustive as that of LAPACK (e.g., all sparse routines in Expokit are matrix-free), we introduced slight differences. Unless otherwise stated, the names of the Fortran 77 routines in Expokit match the pattern TXYYYZ in accordance with the description in Table VI.
Table VI. Nomenclature

T       indicates the data type
          D     double precision
          Z     complex*16
X       indicates the matrix
          G     General
          H     Hermitian
          S     Symmetric
          M     Markov chain matrix
          N     Hessenberg
YYY     indicates the task
          PAD   Pade
          CHB   Chebyshev
          EXP   Exponential evaluation
          PHI   [Phi] evaluation (nonhomogeneous problem)
Z       indicates the task scope
          M     Matrix output
          V     Vector output

Not all combinations are possible. The choice of the pattern above complies with the six-character name-length recommendation of Fortran 77 and achieves the dual purpose of avoiding conflicts with LAPACK's names (a cross-check has been done to ascertain this) and facilitating/harmonizing forthcoming extensions of the package. The header in Table VII is a prototype of the routines in the package. Whenever possible, readability and user-friendliness come first, provided efficiency is not impeded (e.g., the ordinary user is relieved from handling bulky parameters, while the advanced user can fetch them if needed).
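As an illustration of the naming pattern, a hypothetical helper (not part of Expokit) can decode a routine name against Table VI:

```python
# Hypothetical decoder for the TXYYYZ naming pattern of Table VI.
TYPES = {"D": "double precision", "Z": "complex*16"}
MATRICES = {"G": "General", "H": "Hermitian", "S": "Symmetric",
            "M": "Markov chain matrix", "N": "Hessenberg"}
TASKS = {"PAD": "Pade", "CHB": "Chebyshev",
         "EXP": "Exponential evaluation", "PHI": "Phi evaluation"}
SCOPES = {"M": "Matrix output", "V": "Vector output"}

def decode(name):
    """Split a six-character routine name into its T, X, YYY, Z fields."""
    t, x, yyy, z = name[0], name[1], name[2:5], name[5]
    return (TYPES[t], MATRICES[x], TASKS[yyy], SCOPES[z])

print(decode("DMEXPV"))
# -> ('double precision', 'Markov chain matrix', 'Exponential evaluation', 'Vector output')
```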


To sustain flexibility, MATLAB counterparts of the algorithms have been coded and are included in the distribution. Thus research codes written in MATLAB can perform better by taking advantage of the new techniques. More to the point, we supply MATLAB scripts that allow experimental matrices composed within the user-friendly environment of MATLAB to be stored into files suitably formatted for loading directly into the Fortran versions. It is not our intention to elaborate on operating directives; a README file is included in the distribution for this purpose. We shall instead outline some top-level routines of relevance, to give the reader a sense of concreteness.

_EXPV computes w = exp(tA)v; t can be positive or negative. The action of the matrix exponential operator on the operand vector is evaluated directly, i.e., exp(tA) is not computed in isolation before being applied to v. The variants dealing with symmetric/Hermitian matrices use the cheaper Lanczos process instead of the Arnoldi process. Of particular interest is DMEXPV, the customised version for Markov chains, in which a check is made to ensure that the resulting vector w is a probability vector.
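The underlying single-step Krylov approximation w ~ beta V_m exp(t H_m) e_1 can be sketched as follows (Python with NumPy; a bare sketch with no error control or step-size selection, which _EXPV adds on top, and with a plain Taylor series standing in for the small dense exponential):

```python
import numpy as np

def arnoldi_expv(matvec, v, t, m=30):
    """Single-step Krylov approximation of exp(tA) v: build an Arnoldi
    basis V_m with Hessenberg matrix H_m, then w ~ beta V_m exp(t H_m) e_1."""
    n = v.size
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        p = matvec(V[:, j])
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = V[:, i] @ p
            p -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(p)
        if H[j + 1, j] < 1e-12:              # "happy breakdown"
            m = j + 1
            break
        V[:, j + 1] = p / H[j + 1, j]
    Hm = H[:m, :m]
    # Small dense exponential of t*Hm via a plain Taylor series.
    E = np.eye(m)
    T = np.eye(m)
    for k in range(1, 40):
        T = T @ (t * Hm) / k
        E = E + T
    return beta * V[:, :m] @ E[:, 0]
```

Note that only the matrix-vector product `matvec` is needed, which is precisely what makes the sparse routines of the package matrix-free.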

IMPORTANT: The well-known transition rate matrix Q of a Markov chain satisfies Qe = 0, where e = [(1, ..., 1).sup.T]; DMEXPV uses the matrix-vector product y = Ax = [Q.sup.T]x, i.e., the transpose of Q times a vector. Failure to remember this leads to wrong results.
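A small numerical check of this convention (illustrative Python; the 3-state generator is made up, and a truncated Taylor series stands in for the exponential):

```python
import numpy as np

# A 3-state transition rate matrix: rows sum to zero, i.e., Q e = 0.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -0.5,  0.0],
              [ 0.0,  3.0, -3.0]])
p0 = np.array([1.0, 0.0, 0.0])   # start with all mass in state 0

def expm_series(A, terms=40):
    """Truncated Taylor series for exp(A); adequate for this small ||A||."""
    E = np.eye(A.shape[0])
    T = np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

# The Kolmogorov equations p'(t) = Q^T p(t) use the TRANSPOSE of Q:
p = expm_series(0.5 * Q.T) @ p0
# p is a probability vector; exponentiating 0.5*Q instead would not give one.
```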

_PHIV computes w = exp(tA)v + t[Psi](tA)u which is the solution of the nonhomogeneous linear ODE problem w' = Aw + u, [w.sub.0] = v. The input parameter t can be positive or negative. If u = 0 this procedure is mathematically equivalent to _EXPV, and if v = 0 it computes t[Psi](tA)u. The variants dealing with symmetric/ Hermitian matrices use the cheaper Lanczos process instead of the Arnoldi process.

_PADM computes the matrix exponential exp(tH) in full where H is a relatively small matrix. The underlying method is the irreducible rational Pade approximation to the exponential function combined with scaling-and-squaring. This procedure can be used in its own right to compute the exponential of a small matrix in full. Computational savings are made in the symmetric/Hermitian variants.
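The scaling-and-squaring Pade scheme can be sketched as follows (Python with NumPy; a bare illustration of the method, not a transcription of _PADM):

```python
import numpy as np

def expm_pade(A, q=6):
    """Matrix exponential by the diagonal (q,q) Pade approximant combined
    with scaling-and-squaring: scale A so its norm is at most 1/2, form the
    Pade numerator N and denominator D, solve D E = N, then square back."""
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, np.inf), 1e-16)))) + 1)
    B = A / 2**s
    n = A.shape[0]
    I = np.eye(n)
    # Pade coefficients: c_0 = 1, c_k = c_{k-1} (q-k+1) / (k (2q-k+1)).
    c = 1.0
    N = I.copy()
    D = I.copy()
    X = I.copy()
    for k in range(1, q + 1):
        c *= (q - k + 1) / (k * (2 * q - k + 1))
        X = X @ B
        N += c * X
        D += (-1) ** k * c * X
    E = np.linalg.solve(D, N)      # E ~ exp(B)
    for _ in range(s):             # undo the scaling by squaring
        E = E @ E
    return E
```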

_CHBV computes exp(tH)y where H is a relatively small matrix using the partial fraction expansion of the uniform rational Chebyshev approximation of type (14,14) to the exponential function over the negative real axis. The calculation of each fraction is done using Gaussian elimination with pivoting. The Chebyshev method is intended to compute the direct action of the matrix exponential on a vector when H is negative definite.
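The partial-fraction mechanism can be illustrated with stand-in coefficients: the tabulated type (14,14) Chebyshev coefficients are not reproduced here, so the sketch below uses the type (1,1) Pade approximant exp(z) ~ (2 + z)/(2 - z) = -1 - 4/(z - 2), which has the same structure (one linear solve per pole):

```python
import numpy as np

def rational_action(H, y, alpha0, alphas, thetas):
    """w ~ exp(H) y via a partial-fraction expansion
    r(z) = alpha0 + sum_i alpha_i / (z - theta_i):
    each fraction costs one linear solve (Gaussian elimination), which is
    the structure of _CHBV."""
    n = H.shape[0]
    w = alpha0 * y.astype(complex)
    for a, th in zip(alphas, thetas):
        w += a * np.linalg.solve(H - th * np.eye(n), y)
    return w.real

# Stand-in coefficients from the type (1,1) Pade approximant; the real
# _CHBV uses the tabulated type (14,14) Chebyshev coefficients instead.
H = np.diag([-0.1, -0.2])
y = np.ones(2)
w = rational_action(H, y, alpha0=-1.0, alphas=[-4.0], thetas=[2.0])
```

The higher the degree of the approximant, the more poles and hence the more solves; with the type (14,14) coefficients the result is accurate over the whole negative real axis, which is why the method targets negative definite H.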

blas.f/lapack.f minimal substitutes for BLAS and LAPACK. In this way Expokit can be used even if these library packages are not yet installed in the environment of the user. Guidelines for their use (or nonuse) are provided in the Makefile.

expv.m/mexpv.m MATLAB counterparts of _EXPV and DMEXPV.

mat2coo.m utility program aimed at storing a MATLAB matrix into a text file that can be exploited directly. The matrix is stored in the Coordinate (COO) storage format.

mat2crs.m/mat2ccs.m utility programs aimed at storing a MATLAB matrix into a text file under the Compressed Row Storage (CRS) format or the Compressed Column Storage (CCS) format. NOTE: Fortran subroutines for matrix-vector multiplication when the matrix is stored in any of the above formats are included in this distribution. They are referred to as _coov, _crsv, and _ccsv. For optimum performance, however, users who have a priori knowledge of the structure of their matrices are advised to store them in the most advantageous format and to devise the most advantageous matrix-vector multiplication routine. A Fortran converter subroutine, _cnvr, that transforms any of the above formats into another is supplied in this distribution.
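The kind of conversion _cnvr performs can be sketched in a few lines, here from COO triplets to CRS (plain Python; illustrative only):

```python
def coo_to_crs(n, rows, cols, vals):
    """Convert COO triplets to CRS (ptr, ind, val): count the entries per
    row, turn the counts into row start offsets by a cumulative sum, then
    scatter each triplet into its row's next free slot."""
    ptr = [0] * (n + 1)
    for i in rows:                       # count entries per row
        ptr[i + 1] += 1
    for i in range(n):                   # cumulative sums give row starts
        ptr[i + 1] += ptr[i]
    ind = [0] * len(vals)
    val = [0.0] * len(vals)
    nxt = ptr[:-1].copy()                # next free slot in each row
    for i, j, a in zip(rows, cols, vals):
        k = nxt[i]
        ind[k] = j
        val[k] = a
        nxt[i] += 1
    return ptr, ind, val
```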

loadcoo.m/loadcrs.m/loadccs.m utility MATLAB scripts for loading matrices stored with mat2coo.m, mat2crs.m, or mat2ccs.m.


This article has detailed the main current ingredients of the Expokit package. The work has benefited from the latest results in Krylov approximation methods for the matrix exponential, and new results and extensions were established that enable effective local and global error estimation, step-size control, efficient implementation, and tailoring (e.g., to Markov chains). The software is self-contained and accepts real and complex data. It deals with small matrices as well as large sparse matrices, and it takes computational advantage of symmetric/Hermitian matrices. Since the computation of the matrix exponential is a difficult problem, the article has also outlined the issues that can contribute to understanding the limitations of the methods should they fail to produce sufficiently accurate results.

It is hoped that numerical methods that need the small matrix exponential in full, or that make use of the action of the large sparse matrix exponential and/or the phi function in their own right or as a building block, will find in the software robust and ready-to-use routines. Examples of such methods include Lawson [1967], Leyk and Roberts [1995], Hochbruck et al. [1998], and Meerbergen and Sadkane [1996]. Feedback from current users certainly suggests that the package is serving its purpose well. In addition to the inevitable conventional maintenance, efforts will be undertaken to include refinements and extensions that keep the software up-to-date with new discoveries. In particular, studies in Sidje [1994; 1997] have already illustrated how an efficient parallelization can be achieved.


BARRETT, R., BERRY, M., CHAN, T. F., DEMMEL, J., DONATO, J., DONGARRA, J., EIJKHOUT, V., POZO, R., ROMINE, C., AND VAN DER VORST, H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA.

BERMAN, A. AND PLEMMONS, R.J. 1979. Nonnegative Matrices in the Mathematical Sciences. Academic Press, Inc., New York, NY.

CARPENTER, A. J., RUTTAN, A., AND VARGA, R.S. 1984. Extended numerical computations on the 1/9 conjecture in rational approximation theory. In Rational Approximation and Interpolation. Springer Lecture Notes in Mathematics, vol. 1105. Springer-Verlag, Berlin, Germany, 383-411.

CIARDO, G., MARIE, R. A., SERICOLA, B., AND TRIVEDI, K. S. 1990. Performability analysis using semi-Markov reward processes. IEEE Trans. Comput. 39, 10 (Oct.), 1251-1264.

CLAROTTI, C.A. 1984. The Markov approach to calculating system reliability: Computational problems. In International School of Physics "Enrico Fermi", Varenna, Italy (Amsterdam, The Netherlands), A. Serra and R. E. Barlow, Eds. North-Holland Physics Publishing, Amsterdam, The Netherlands, 54-66.

CODY, W. J., MEINARDUS, G., AND VARGA, R.S. 1969. Chebyshev rational approximation to exp(-x) in [0,+ [infinity]) and applications to heat conduction problems. J. Approx. Theory 2, 50-65.

DRUSKIN, V. AND KNIZHNERMAN, L. 1995. Krylov subspace approximations of eigenpairs and matrix functions in exact and computer arithmetic. Num. Lin. Alg. Appl. 2, 205-217.

DUFF, I. S., GRIMES, R. G., AND LEWIS, J. G. 1989. Sparse matrix test problems. ACM Trans. Math. Softw. 15, 1 (Mar.), 1-14.

FASSINO, C. 1993. Computation of matrix functions. Ph.D. thesis. Universita di Pisa, Udine, Italy.

GALLOPOULOS, E. AND SAAD, Y. 1992. Efficient solution of parabolic equations by Krylov approximation methods. SIAM J. Sci. Stat. Comput. 13, 5 (Sept.), 1236-1264.

GOLUB, G. H. AND VAN LOAN, C.F. 1996. Matrix Computations. 3rd ed. Johns Hopkins University Press, Baltimore, MD.

GUSTAFSSON, K. 1991. Control theoretic techniques for stepsize selection in explicit Runge-Kutta methods. ACM Trans. Math. Softw. 17, 4 (Dec.), 533-554.

HOCHBRUCK, M. AND LUBICH, C. 1997. On Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 34, 5 (Oct.), 1911-1925.

HOCHBRUCK, M., LUBICH, C., AND SELHOFER, H. 1998. Exponential integrators for large systems of differential equations. SIAM J. Sci. Comput. 19, 5 (Sept.), 1552-1574.

ISERLES, A. AND NORSETT, S.P. 1991. Order Stars. Chapman & Hall, Ltd., London, UK.

LAWSON, J. D. 1967. Generalized Runge-Kutta processes for stable systems with large Lipschitz constants. SIAM J. Numer. Anal. 4, 372-380.

LEYK, T. AND ROBERTS, S.G. 1995. Numerical solving of a free boundary phase field model using Krylov subspace method. In Proceedings of the Computational Techniques and Applications Conference (Melbourne, Australia). World Scientific Publishing Co., Inc., River Edge, NJ.

MEERBERGEN, K. AND SADKANE, M. 1996. Using Krylov approximations to the matrix exponential operator in Davidson's method. Tech. Rep. TW 238, Dept. of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium.

MOLER, C. B. AND VAN LOAN, C. F. 1978. Nineteen dubious ways to compute the exponential of a matrix. SIAM Rev. 20, 4, 801-836.

NEUTS, M. F. 1981. Matrix Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press, Baltimore, MD.

PARLETT, B. N. 1976. A recurrence among the elements of functions of triangular matrices. Lin. Alg. Appl. 14, 117-121.

PARLETT, B. N. AND NG, K. C. 1985. Development of an accurate algorithm for exp(B[Tau]). Tech. Rep. PAM-294, Center for Pure and Applied Mathematics, University of California at Berkeley, Berkeley, CA.

PHILIPPE, B. AND SIDJE, R. B. 1995. Transient solutions of Markov processes by Krylov subspaces. In Proceedings of the 2nd International Workshop on the Numerical Solution of Markov Chains (Raleigh, NC), W. J. Stewart, Ed. Kluwer Academic, Amsterdam, The Netherlands.

SAAD, Y. 1992a. Analysis of some Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 29, 1 (Feb.), 209-228.

SAAD, Y. 1992b. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, New York, NY.

SAAD, Y. 1994. Sparskit: A basic tool kit for sparse matrix computation, version 2. Tech. Rep., Computer Science Dept., University of Minnesota, Minneapolis, MN.

SENETA, E. 1981. Non-negative Matrices and Markov Chains. 2nd ed. Springer-Verlag, New York, NY.

SIDJE, R.B. 1994. Parallel algorithms for large sparse matrix exponentials: Application to numerical transient analysis of Markov processes. Ph.D. thesis. University of Rennes 1, Rennes, France.

SIDJE, R. B. 1997. Alternatives for parallel Krylov basis computation. Num. Lin. Alg. Appl. 4, 4, 305-331.

SIDJE, R. B. AND STEWART, W.J. 1998. A numerical study of large sparse matrix exponentials arising in Markov chains. Comput. Stat. Data Anal. To be published.

STEWART, D. E. AND LEYK, T.S. 1996. Error estimates for Krylov subspace approximations of matrix exponentials. J. Comput. Appl. Math. 72, 2, 359-369.

STEWART, W.J. 1994. Introduction to the Numerical Solution of Markov Processes. Princeton University Press, Princeton, NJ.

VAN LOAN, C.F. 1977. The sensitivity of the matrix exponential. SIAM J. Numer. Anal. 14, 6, 971-981.

VARGA, R. S. 1990. Scientific Computation on Mathematical Problems and Conjectures. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 60. SIAM, Philadelphia, PA.

WARD, R. C. 1977. Numerical computation of the matrix exponential with accuracy estimate. SIAM J. Numer. Anal. 14, 4, 600-610.

Received: April 1996; revised: March 1997 and July 1997; accepted: July 1997

Author's address: Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia; email: Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

[C] 1998 ACM 0098-3500/98/0300-0130 $5.00

Author:Sidje, Roger B.
Publication:ACM Transactions on Mathematical Software
Date:Mar 1, 1998