# A network programming approach in solving Darcy's equations by mixed finite-element methods.

Abstract. We use the null space algorithm approach to solve the augmented systems produced by the mixed finite-element approximation of Darcy's laws. Taking into account the properties of the graph representing the triangulation, we adapt the null space technique proposed in , where an iterative-direct hybrid method is described. In particular, we use network programming techniques to identify the renumbering of the triangles and the edges, which enables us to compute the null space without floating-point operations. Moreover, we extensively take advantage of the graph properties to build efficient preconditioners for the iterative algorithm. Finally, we present the results of several numerical tests.

Key words. augmented systems, sparse matrices, mixed finite-element, graph theory

AMS subject classifications. 65F05, 65F10, 64F25, 65F50, 65G05

1. Introduction. The approximation of Darcy's Laws by Mixed Finite-Element techniques produces a finite-dimensional version of the continuous problem which is described by an augmented system. In this paper, we present an analysis of a null space method which uses a mixture of direct and iterative solvers applied to the solution of this special augmented system. The properties of this method, in the general case, have been studied in  where its backward stability is proved, when using finite-precision arithmetic, and where a review of the bibliography on the topic is also presented. Here, we will take advantage of network programming techniques for the design of a fast algorithm for the direct solver part and for the building of effective preconditioners. The relationship between the graph properties of the mesh and the augmented system has been pointed out in . Several authors used similar data structures and network techniques in a rather different context or for different purposes. In [1, 10, 28] similar techniques have been suggested in the area of computational electromagnetics for gauging vector potential formulations. In the field of computational fluid dynamics, analogous methods have been applied to the finite-difference method for the solution of Navier-Stokes equations [3, 24]. Finally, in , a similar approach in the approximation of a 3-D Darcy's Law by Hybrid Finite-Element techniques is studied.

The null space algorithm is a popular approach for the solution of augmented systems in the field of numerical optimization but is not widely used in fluid dynamics. For a review of other existing methods for the solution of saddle point problems we advise to read the comprehensive survey . Among the possible alternative methods, we indicate the direct approach where a sparse [LDL.sup.T] decomposition of the symmetric augmented matrix is computed using preprocessing that will help to minimize the fill-in during the factorization combined with one by one and two by two numerical pivot strategies [18, 17]. In our numerical experiments we will compare our approach with one of these direct solvers.

We point out that our null space method is an algebraic approach to the computation of the finite-element approximation of H(curl) which characterizes the subspace of the divergence-free vector fields in H(div), [32, 7, 34, 35]. Nevertheless, we emphasize that our method does not require the explicit storage of the null space and, therefore, of the related finite-element approximation of H(curl). Our approach is applicable when R[T.sub.0] and BD[M.sub.1] finite elements  are used. Nevertheless, we do not consider this a limitation in practical situations: the R[T.sub.0] and BD[M.sub.1] finite elements are widely used in the 3D simulation of physical phenomena, where higher order approximations have an exceeding computational complexity and the indetermination in the evaluations of the physical parameters is high. For the sake of simplicity, we describe our approach only for the R[T.sub.0] finite elements.

In Section 2, we will briefly summarize the approximation process and describe the basic properties of the linear system and augmented matrix. In Section 3, the null space algorithm and its algebraic properties are presented. The direct solver is based on the LU factorization of the submatrix of the augmented system which approximates the divergence operator div. We will see in Section 4, how the basic structures of the matrices involved are described in terms of graph theory and how the LU decomposition can be performed by Network Programming classical algorithms. In particular, we will use the "Shortest Path Tree" (SPT) algorithms to achieve a reliable fast decomposition. Furthermore, the same graph properties allow us to describe the block structure of the projected Hessian matrix on which we will apply the conjugate gradient algorithm. This will be used in Section 5 to develop the preconditioners. Finally, in Section 6, we show the results of the numerical tests that we conducted on selected experiments, in Section 7, we describe the possible extension of our techniques to the three dimensional domain case, and we give our conclusions in Section 8.

2. The analytical problem and its approximation.

2.1. Darcy's Law. We consider a simply connected bounded polygonal domain [omega] in [R.sup.2] which is defined by a closed one-dimensional curve [GAMMA]. The boundary [GAMMA] is the union of the two distinct parts [[GAMMA].sub.D] and [[GAMMA].sub.N], i.e. [GAMMA] = [[GAMMA].sub.D] [union] [[GAMMA].sub.N], where either Dirichlet- or Neumann-type boundary conditions are imposed. In the domain [omega], we formulate the mathematical model that relates the pressure field p (the hydraulic head) and the velocity field v (the visible effect) in a fully saturated soil with an incompressible soil matrix. This relationship is given by the steady Darcy's model equations; the assumption of soil incompressibility implies that the soil matrix characteristics, e.g. density, texture, specific storage, etc, be independent of time, space, and pressure. The Darcy's model equations read as

v = -K grad p, in [omega] (2.1)

div v = f, in [omega] (2.2)

Equation (2.1) relates the vector field v to the scalar field p throughout the permeability tensor K, which accounts for the soil characteristics. Equation (2.2) relates the divergence of v to the right-hand side source-sink term f. These model equations are supplemented by the following set of boundary conditions for v and p:

p|[[GAMMA].sub.D] = [g.sub.D], on [[GAMMA].sub.D] (2.3)

v x n|[[GAMMA].sub.N] = [g.sub.N], on [[GAMMA].sub.N] (2.4)

where n is the unit vector orthogonal to the boundary [GAMMA] and pointing out of [omega], [g.sub.D] and [g.sub.N] are two regular functions that take into account Dirichlet and Neumann conditions, respectively. For simplicity of exposition, [g.sub.N] is taken equal to zero.

2.2. Mixed finite-element method for Darcy's law. In this section, we shortly review some basic ideas underlying the mixed finite-element approach that is used in this work to approximate the model equations (2.1)-(2.2). The mixed weak formulation is formally obtained in a standard way by multiplying equation (2.1) by the test functions

w [member of] V = {q|q [member of] [([L.sup.2]([omega])).sup.2], div q [member of] [L.sup.2]([omega]), q x n|[[GAMMA].sub.N] = 0},

and equation (2.2) by [phi] [member of] [L.sup.2] ([omage]) and integrating by parts over the domain of computation. We refer to  for the definition and the properties of the functional space V. The weak formulation of (2.1)-(2.2) reads as

find v [member of] V and p [member of] [L.sup.2] ([omega]) such that:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for every w [member of] V,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for every [phi] [member of] [L.sup.2] ([omega]).

The discrete counterpart of the weak formulation that we consider in this work can be introduced in the following steps. First, we consider a family of conforming triangulations covering the computational domain [omega] that are formed by the sets of disjoint triangles [[??].sub.h] = {T}. Every family of triangulations is regular in the sense of Ciarlet, [13, page 132], i.e. the triangles do not degenerate in the approximation process for h tending to zero. Additionally, we require that no triangle in [[??].sub.h] can have more than one edge on the boundary [GAMMA] nor that a triangle can exist with a vertex on [[GAMMA].sub.D] and any other vertex on [[GAMMA].sub.N]. The label h, which denotes a particular triangulation of this family, is the maximum diameter of the triangles in [[??].sub.h], i.e. [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then, we consider the functional spaces

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

i.e. the space of the lowest-order Raviart-Thomas vector fields defined on [omega] by using [[??].sub.h], and

[Q.sub.h] = {[phi](x) : [omega] [right arrow] R, [phi](x)|T = const, [for all]T [member of] [[??].sub.h]},

i.e. the space of the piecewise constant functions defined on [omega] by using [[??].sub.h]. For any given h, the two functional spaces [V.sub.h] and [Q.sub,h] are finite-dimensional subspaces of V and [L.sup.2]([omega]), respectively, and are dense within these latter spaces for h [right arrow] 0 . The discrete weak formulation results from substituting the velocity and pressure field v and p by their discretized counterparts [v.sub.h] and [p.sub.h] and reformulating the weak problem in the spaces [V.sub.h] and [Q.sub.h] .

The dimension of the functional space [V.sub.h] is equal to the total number of internal and Dirichlet boundary edges [N.sub.e]. Any element of [V.sub.h] can be expressed as a linear combination of the basis functions [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [e.sub.j] may be an internal edge or a boundary edge with a Dirichlet condition. The basis function [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which is associated to the edge [e.sub.j], is uniquely defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.5)

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the unit vector with direction orthogonal to [e.sub.j] and arbitrarly chosen orientation. Likewise, the dimension of the functional space [Q.sub.h] is equal to the number of triangles [N.sub.T]. In fact, any element of [Q.sub.h] can be expressed as a linear combination of the basis functions {[[phi].sub.j], for j = 1, ..., [N.sub.T]}. The basis function [[phi].sub.j], which is associated to the triangle [T.sub.j], is such that [[phi].sub.j] = 1 on [T.sub.j] and [[phi].sub.j] = 0 on [omega] [T.sub.j].

The solution fields [v.sub.h] and [p.sub.h] of the discretized weak formulation are taken as approximation of the solution fields v and p of the original weak formulation. The solution fields [v.sub.h] and [p.sub.h] are expressed as linear combinations of the basis functions introduced in the previous paragraph. These expansions are formally given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.6)

The coefficient [p.sub.j] of the basis function [[phi].sub.j] in (2.6) can be interpreted as the approximation of the cell average of the pressure field p on [T.sub.j]. The coefficient [u.sub.j] of the basis function [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in (2.6) can be interpreted as the approximation of the flux, or the 0th-order momentum, of the normal component of v on the edge [e.sub.j]. We collect these coefficients in the two algebraic vectors u = {[u.sub.j]} and p = {[p.sub.j]}. It turns out that these vectors are solving the augmented system of linear equations

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.7)

where the elements of the matrices M and A of the augmented matrix of the left-hand side of (2.7) and the elements of the vectors q and b of the right-hand side of (2.7) are given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.8)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.9)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.10)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.11)

The augmented system (2.7) has a unique solution because it holds that

Ker([A.sup.T]) [union] Ker(M) = {0}.

This property is an immediate consequence of the inf-sup condition, which is satisfied by the functional operators represented by (2.8) and (2.9) on the basis function sets {w[e.sub.j]} and {[[phi].sub.j]}. These functional operators are used to define the discrete weak formulation in the functional spaces [V.sub.h] and [Q.sub.h] (see  for more details).

Finally, we describe some major properties of the matrices M and A that are related to the triangulation [[??].sub.h]. First, we assume that [e.sub.i] be an internal edge, and denote its two adjacent triangles by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. As the support of the basis function [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is given by the union of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], every row [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of the matrix M has at most 5 nonzero entries. These nonzero entries are on the columns corresponding to the internal or Dirichlet boundary edges of the triangles [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (the Neumann boundary edges are not considered). Similarly, the row [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of the matrix A must have two different nonzero entries on the columns corresponding to the two triangles [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then, we assume that [e.sub.i] be a Dirichlet boundary edge, and denote the unique trianglethat [e.sub.i] belongs to, by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In this second case, the row [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] has surely three nonzero entries on the columns corresponding to the three edges of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. From our initial assumption on the triangulations, it follows that the triangle [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] cannot have more than one edge on the boundary [GAMMA]; furthermore, we are actually assuming that this boundary edge is of Dirichlet type. Therefore, the row [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] has one nonzero entry only on the column corresponding to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Moreover, the rectangular matrix A has maximum rank. By applying the Gauss theorem to the integral in (2.9) restricted to [T.sub.j] (where [[phi].sub.j] = 1) and taking into consideration (2.5), we deduce that the nonzero entries of the matrix A must be either +1 or -1. The sign depends on the relative orientation of the triangle edges with respect to the edge orientation that was arbitrarily chosen in (2.5). Furthermore, for every internal edge [e.sub.i] shared by the triangles [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], the sum of the two nonzero entries on the matrix row [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is zero; formally, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Let us consider the matrix A' = [A|r] that is obtained by augmenting A with the column vector r which is built as follows. The column vector r only has nonzero entries for the row indices corresponding to the edges [e.sub.i] [subset] [[GAMMA].sub.D] and these non-zero entries are such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The matrix A' is the incidence matrix of the edge-triangle graph underlying the given triangulation, and, then [31, 12] the matrix A' is a totally unimodular matrix and has rank [N.sub.T]. Therefore, the matrix A has rank [N.sub.T] because every submatrix that is obtained by removing a column of a totally unimodular matrix has full rank [31, 12]. Furthermore, the pattern of M is equal to the pattern of [absolute value of A][[absolute value of A].sup.T], because the nonzero entries of the row [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] must match the nonzero entries of the edges of the triangles [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and from the unimodularity properties of A. By construction, the matrix M is symmetric and positive semidefinite.

3. Null space algorithms. In this section, we take into account the classical null space algorithm for the minimization of linearly constrained quadratic forms, which is described for example in . In particular, we choose the formulation of this algorithm that is based on the factorization of the matrix A.

In order to simplify the notations, we use the letter n instead of the symbol [N.sub.e] to indicate the total number of internal edges and Dirichlet boundary edges of the mesh, and m instead of [N.sub.T] to indicate the total number of triangles of the mesh. It holds that n [greater than or equal to] m. Finally, we denote by [E.sub.1] and [E.sub.2] the n x m and n x (n - m) matrices

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where, [I.sub.m] and [I.sub.n-m] are respectively the identity matrix of order m and the identity matrix of order n - m.

As already stated in Section 2.2, A is a full rank submatrix of a totally unimodular matrix of order n x (m + 1), and its entries are either [+ or -] 1 or 0. It turns out the "LU" factorization of A is obtainable without any floating-point operations, throughout a couple of suitable permutation matrices P and Q (see [31, 12]). In Section 4, we will see how it is possible to determine efficiently P and Q by using network programming algorithms. The matrices P and Q allow us to write the permutation of the matrix A as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.1)

where [L.sub.1] is a nonsingular lower triangular matrix of order m. Without loss of generality, we assume that all the diagonal entries in [L.sub.1] are equal to 1 and the non-zero entries below the diagonal are equal to -1. This choice simply corresponds to a symmetric diagonal scaling of the permuted augmented system. By introducing the lower triangular matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (3.2)

and recognizing that [E.sub.1] plays the role of the matrix "U", the "LU" factorization of A is given by

PAQ = L[E.sub.1].

In the rest of this section, we assume, for the sake of exposition simplicity that the matrices M and A and the vectors q and b have already been consistently permuted and omit to indicate explicitly the permutation matrices P and Q. Thus, by identifying the matrix A with L[E.sub.1] after the row and column permutations, the augmented matrix in (2.7) can be factorized as follows

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.3)

We indicate the matrix [L.sup.-1] M[L.sup.-T] by the symbol [??]. The inverse matrix and the transposed inverse matrix of L are formally given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.4)

The block-partitioning of [L.sup.-1] and [L.sup.-T] induces on [??] the block-partitioned form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.5)

where, formally, [[??].sub.ij] = [E.sup.T.sub.i] [??][E.sub.j], for i, j = 1, 2, and we exploited the symmetry of the matrix M (and, of course of [??]) to set [[??].sub.21] = [[??].sup.T.sub.12]. Similarly, we introduce the block partition [([u.sub.1], [u.sub.2]).sup.T] of the velocity vector u and denote, for consistency of notation, the pressure vector p by [u.sub.3]. Thus, the algebraic vector [(u, p).sup.T] = [([u.sub.1], [u.sub.2], [u.sub.3]).sup.T] denotes the solution of the (suitably permuted) linear problem (2.7) discussed in Section 2.2. We use the decomposition (3.3) and take into account the block-partitioned definitions (3.4) and (3.5) of the matrices [L.sup.-1], [L.sup.-T] and [??] to split the resolution of the linear problem (2.7) in the two following steps. First, we solve the linear system

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.6)

for the auxiliary unknown vector [([z.sub.1], [z.sub.2], [z.sub.3]).sup.T] and, then, we compute the unknown vector [([u.sub.1], [u.sub.2], [u.sub.3]).sup.T] by solving

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.7)

Note that the left-hand side matrix of (3.6) can be put in a block-triangular form by interchanging the first and the third block of equations. The final computational algorithm, which is solved by the null space strategy, is formulated by introducing the vector h = -[L.sup.-1]q. The vector h is consistently partitioned in the two subvectors [h.sub.1] = [E.sup.T.sub.1]h, and [h.sub.2] = [E.sup.T.sub.2]h according to the block-partitioning (3.5).

NULL SPACE ALGORITHM:

(i) solve the block lower triangular system:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

(ii) and, then, let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

Note that in step (ii) we have [u.sub.2] = [z.sub.2] in view of the second formula in (3.4).

The null space algorithm as formulated above requires the formal inversion of the matrix [[??].sub.22], which is the projected Hessian matrix of the quadratic form associated to the augmented matrix of (2.7). In order to solve the linear algebraic problem [[??].sub.22][u.sub.2] = [h.sub.2] - [[??].sup.T.sub.12][Z.sub.1] for [u.sub.2], we may proceed in two different ways. If n - m is small, i.e the number of constraints is very close to the number of unknowns, or [[??].sub.22] is a sparse matrix, we may explicitly compute [[??].sub.22] and then solve the linear system involving this matrix by using a Cholesky factorization. Nonetheless, the calculation of the product matrix [L.sup.-1]M[L.sup.-T] might be difficult because of the high computational complexity, which is of order O([n.sup.3]), or because the final matrix itself would be fairly dense despite the sparsity of M. In such cases, we prefer to solve the linear system [[??].sub.22][u.sub.2] = [h.sub.2] - [[??].sup.T.sub.12][z.sub.1] by using the pre-conditioned conjugate gradient algorithm . The conjugate gradient algorithm does not require the explicit knowledge of the matrix [[??].sub.22] but only the capability of performing the product of this matrix times a vector. The product of the block sub-matrix [[??].sub.ij] of [??] times a suitably sized vector y is effectively performed by implementing the formula

[[??].sub.ij]y = [E.sup.T.sub.i][L.sup.-1](M([L.sup.-T][E.sub.j]y)), for i, j = 1, 2. (3.8)

We point out that the formula (3.8) only requires the resolution of the two triangular systems with matrices [L.sup.-1] and [L.sup.-T], which can be easily performed by, respectively, forward and backward substitutions, and the product of the sparse matrix M by an n-sized vector. Furthermore, we observe that the matrix-vector product given by (3.8) is backward stable .

If we use the conjugate gradient method to solve [[??].sub.22][u.sub.2] = [h.sub.2] - [[??].sup.T.sub.12][Z.sub.1], it is quite natural to adopt a stopping criterion which takes advantage of the minimization property of this algorithm. At every step j the conjugate gradient method minimizes the energy norm of the error [delta][u.sub.2] = [u.sub.2] - [[bar.u].sup.(j).sub.2] on a Krylov space between the "exact" solution vector [u.sub.2] and the computed j-th iterative approximation [[bar.u].sup.(j).sub.2]. The energy norm of the vector y [member of] [R.sup.n-m], which is induced by the matrix [[??].sub.22], is defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

This norm induces the dual norm

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

for the dual space vectors y' [member of] [R.sup.n-m]. We terminate the conjugate gradient iterations by the following stopping criterion

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.9) THEN STOP,

where [eta] is an a priori defined threshold with value less than 1. The choice of [eta] clearly depends on the properties of the problem that we want to solve; however, in many practical cases, [eta] can be taken much larger than the machine precision of the floating-point operations. In our experiments, we choose [eta] = h following the results in .

In order to use (3.9), we need some tool for estimating the value [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This goal can be achieved by using the Gauss quadrature rule proposed in  or the Hesteness and Stiefel rule [25, 4, 37, 38]. This latter produces a lower bound for the error [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] using the quantities already computed during each step of the conjugate gradient algorithm with a negligible marginal cost. In [37, 38], its numerical stability in finite arithmetic has been proved. All these lower bound estimates can be made reasonably close to the value of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] at the price of d additional iterations of the conjugate gradient algorithm. In [4, 22], the choice d = 10 is indicated as a successful compromise between the computational costs of the additional iterations and the accuracy requirements; several numerical experiments support this conclusion ([4, 22, 5, 37, 38]). Finally, following Reference , we estimate

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

by taking into account that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

and replacing [[bar.u].sub.2] with its current evaluation [[bar.u].sup.j.sub.2] at the step j if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

4. Graph and Network properties. In this section, we first review some basic definitions relative to graph theory (more detailed information can be found in [31, 39]). We also discuss how these definitions are relied to the graph structure underlying the triangulations of the mixed finite element formulation of Section 2.2. Then, we show how a strategy that relies on network programming can be used to determine the permutation matrices P and Q of the "LU" factorization of the matrix A that was introduced in the previous sections. In particular, we show how these two matrices can be constructed in a fast and efficient way by exploting the Shortest-Path algorithm (SPT).

A graph G = {N, A} is made up of a set of nodes N = {[T.sub.i]}i=1, ..., [n.sub.N] and a set of arcs A = {[alpha].sub.j}j=1, ..., [n.sub.A]. An arc [[alpha].sub.j] is identified by a pair of nodes [T.sub.i] and [T.sub.k]; {[T.sub.i], [T.sub.k]} denotes an unordered pair and the corresponding undirected arc, and by [[T.sub.i], [T.sub.k]] an ordered pair and the corresponding directed arc. Either G is an undirected graph, in which case all the arcs are undirected, or G is a directed graph, in which case all the arcs are directed. We can convert every directed graph G to an undirected one called the undirected version of G by replacing each [[T.sub.i], [T.sub.k]] by {[T.sub.i], [T.sub.k]} and removing the duplicate arcs. Conversely, we can build the directed version of an undirected graph G by replacing every {[T.sub.i], [T.sub.k]} by the pair of arcs [[T.sub.i], [T.sub.k]] and [[T.sub.k], [T.sub.i]]. In order to avoid repeating definitions, ([T.sub.k], [T.sub.i]) may denote either an undirected arc {[T.sub.i], [T.sub.k]} or a directed arc [[T.sub.i], [T.sub.k]] and the context resolves the ambiguity.

The nodes [T.sub.i] and [T.sub.k] in an undirected graph G = {N, A} are adjacent if {[T.sub.i], [T.sub.k]} [member of] A, and we define the adjacency of [T.sub.i] by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Analogously, in a directed graph G = {N, A}, we define

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

We define the adjacency of an arc as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

for an undirected graph, and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

for a directed graph.

A path in a graph from node [T.sub.j] to node [T.sub.k] is a list of nodes [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is an arc in the graph G for i = 1, ..., k - 1. The path does contain the nodes [T.sub.i] for i [member of] [1, ..., k] and arcs ([T.sub.i], [T.sub.i+1]) for i [member of] [1, ..., k - 1] and does not contain any other nodes and arcs. The nodes [T.sub.j] and [T.sub.k] are the ends of the path. The path is simple if all of its nodes are distinct. The path is a cycle if k > 1 and [T.sub.j] = [T.sub.k] and a simple cycle if all of its nodes are distinct. A graph without cycles is acyclic. If there is a path from node [T.sub.w] to node [T.sub.v] then [T.sub.v] is reachable from [T.sub.w].

An undirected graph is connected if every node of its undirected version is reachable from every other node and disconnected otherwise. The maximal connected subgraphs of G are its connected components.

A rooted tree is an undirected graph that is connected and acyclic with a distinguished node [T.sub.R], called root. A rooted tree with k nodes contains k - 1 arcs and has a unique simple path from any node to any other one. When appropriate we shall regard the arcs of a rooted tree as directed. A spanning rooted tree [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in G is a rooted tree which is a subgraph of G with [n.sub.N] nodes. If [T.sub.v] and [T.sub.w] are nodes in [T.sub.R] and [T.sub.v] is in the path from [T.sub.R] to [T.sub.w] with [T.sub.w] [not equal to] [T.sub.v] then [T.sub.v] is an ancestor of [T.sub.w] and [T.sub.w] is a descendant of [T.sub.v]. Moreover, if [T.sub.v] and [T.sub.w] are adjacent then [T.sub.v] is the parent of [T.sub.w] and [T.sub.w] is a child of [T.sub.v]. Every node, except the root, has only one parent, which will be denoted by parent([T.sub.v]), and, moreover, it has zero or more children. A node with zero children is a leaf. We will call the arcs in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in-tree and the arcs in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] out-of-tree. A forest is a node-disjoint collection of trees.

The depth([T.sub.i]) of the node [T.sub.i] in a rooted tree is the (integer) top-down distance from the root node that is recursively defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

and, similarly, the height([T.sub.i]) is the (integer) down-top distance of the node [T.sub.i] from the deepest leaf that is recursively defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The subtree rooted at node [T.sub.i] is the rooted tree consisting of the subgraph induced by the descendants of [T.sub.i] and having root in [T.sub.i]. The nearest common ancestor of two nodes is the deepest node that is an ancestor of both. A node [T.sub.i] is a branching node if the number of its children is greater than or equal to two. For each out-of-tree arc [[T.sub.j], [T.sub.k]], the cycle of minimal length or the fundamental cycle is the cycle composed by [[T.sub.j], [T.sub.k]] and the paths in [T.sub.R] from [T.sub.j] and [T.sub.k] to their nearest common ancestor.

We can associate the graph G = {N, A} to the triangulation [[??].sub.h] as follows. First, we associate a distinct node of the graph to every triangle of the mesh, e.g. [T.sub.i] is the (unique) node corresponding to the triangle [T.sub.i], for i = 1, ..., [N.sub.T]. Then, the (directed or undirected) arc ([T.sub.i], [T.sub.j]) exists in the arc set A if and only if the triangles [T.sub.i] and [T.sub.j] share an edge. Furthermore, we add the root node [T.sub.R], which represents the "exterior world" [R.sup.2] \ [omega], to the node set N, and the arcs ([T.sub.R], [T.sub.k]) for every node [T.sub.k] associated to a triangle [T.sub.k] with a boundary edge of Dirichlet type to the arc set A. The incidence matrix of the graph G is the n x m totally unimodular matrix A' that has been introduced at the end of Section 2.2; its rank is m(= [N.sub.T]).

If we remove the column corresponding to the root from A', we obtain the matrix A of problem (2.7), which also has rank m. Moreover, every spanning tree of G with root in [T.sub.R] induces a partition of the rows of A in in-tree rows and out-of-tree rows. If we renumber the in-tree arcs first and the out-of-tree arcs last, we can permute the in-tree arcs and the nodes such that the permuted matrix A has the form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [L.sub.1] [member of] [R.sup.m x m] is a lower triangular and non-singular matrix, see Reference [31, 12].

As the matrix A is obtained by simply removing the root column from the totally unimodular matrix A', then, the entries of the matrix [L.sup.-1.sub.1] must also be [+ or -] 1 or 0. The non-zeroes of the matrix [L.sub.2][L.sup.-1.sub.1] are also equal to [+ or -] 1 and its rows correspond to the out-of-tree arcs. The number of nonzeros of a row of this matrix is equal to the number of arcs in the cycle of minimal length that the corresponding out-of-tree arc forms with the in-tree arcs. We recall (see Section 3) that without loss of generality, all the diagonal entries in [L.sub.1] can be chosen equal to 1 and, therefore, the entries outside the diagonal are 0 or -1. This signifies selecting the directions of the arcs in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] as [[T.sub.i], parent([T.sub.i])], [for all][T.sub.i]. Given the out-of-tree arc [[T.sub.j], [T.sub.k]], the values of the nonzero entries in the corresponding row of [L.sub.2][L.sup.-1.sub.1] will be 1 if both the nodes of the in-tree arc corresponding to the nonzero entry are ancestors of [T.sub.k], and will be -1 if both the nodes of the in-tree arc corresponding to the nonzero entry are ancestors of [T.sub.j].

We now give some basic results the proof of which is straightforward.

LEMMA 4.1. Let [T.sub.v] be a branching node with k children. The descendants of [T.sub.v] can be partitioned in k sets [N.sub.1], ..., [N.sub.k] such that [N.sub.i] [intersection] [N.sub.j] = 0, for i [not equal to] j. Each ([T.sub.i], [T.sub.j]) [member of] A with [T.sub.i] [member of] [N.sub.i] and [T.sub.j] [member of] [N.sub.j] is an out-of-tree arc.

If we define the adjacency of a set of nodes [S.sub.N] [subset] N and the adjacency of a set of arcs [S.sub.A] [subset] A as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

we have the following Corollary.

COROLLARY 4.1. Let [T.sub.v] be a branching node with k children and [N.sub.1], ..., [N.sub.k] such that [N.sub.i] [intersection] [N.sub.j] = 0, for i [not equal to] j be the partitioning of the descendants of [T.sub.v]. Let ([T.sub.q], [T.sub.j]) [member of] A and ([T.sub.p], [T.sub.i]) [member of] A be out-of-tree arcs, with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The minimal length circuits [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are disjoint:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Proof. From Lemma 4.1 the descendants of the children of the branching node [T.sub.v] form disjoint subtrees. If the two circuits [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] had an arc in common, this arc should simultaneously be an in-tree arc of the two subtrees of nodes [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus, this arc would close a a circuit on the ancestor [T.sub.v] but this fact is in contradiction with the definition of a tree (which cannot contain closed circuits).

Similarly, if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] had a common arc, this arc should be an in-tree arc because it would lie on [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This fact is in contradiction with the results of Lemma 4.1 because this arc would connect two disjoint sets.

Finally, if the root [T.sub.R] of the spanning tree is a branching node with k children [T.sup.D.sub.i] , i = 1, ..., k, the subtrees having [T.sup.D.sub.i] as roots form a forest. Therefore, the matrix [L.sub.1] can be permuted in block diagonal form with triangular diagonal blocks.

We refer to [31, 39] for surveys of different algorithms for computing spanning trees. An optimal choice for the rooted spanning tree is the one minimizing the number of nonzero entries in the matrix Z = [L.sup.-T][E.sub.2] the columns of which span the null space of [A.sup.T]. In , it is proved that the equivalent problem of finding the tree for which the sum of all the lengths of the fundamental circuits is minimal, is an NP-complete problem. In [9, 14, 15, 16, 20, 27, 33] several algorithms have been proposed which select a rooted spanning tree reducing or minimizing the number of nonzero entries in Z.

In this paper, we propose two different approaches based on the introduction of a function cost(x) defined on each arc of the graph and describing some physical properties of the original problem.

From Corollary 4.1 it follows that a rooted spanning tree, having the largest possible number of branching nodes, normally has many disjoint circuits. The columns of Z corresponding to these disjoint circuits are structurally orthogonal, i.e. the scalar product of each pair is zero, independent of the values of the entries.

Moreover, we choose the function cost : A [right arrow] [R.sub.+] in the following way:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.1)

Using the cost(x) function, we can compute the spanning tree rooted in [T.sub.R] that is actually solving the SPT problem  on the graph. In particular, we have chosen to implement the heap version of the shortest path tree algorithm (see the SHEAP algorithm that is described in Reference ).

The resulting spanning tree has the interesting interpretation that the path "circumnavigates" low permeability regions in the sense specified below. In the presence of "islands" of very low permeability in [omega], i.e. regions with very large values for [K.sup.-1], the paths from the root to a node corresponding to a triangle that lies outside the islands of low permeability will be made of nodes corresponding to triangles that are also outside the islands. In this sense, we can state that the shorthest path tree "circumnavigates the islands". Therefore, the set of these paths reasonably identifies the lines where the flux is expected to be greater.

Owing to the fact that we assume a null cost for the arcs connected to the root node [T.sub.R], both strategies provides a forest if the number of zero cost arcs is greater than one. We observe that we do not need to build the matrix A explicitly: the tree (or forest) can be computed by using the graph only. Moreover, the solution of the lower and upper triangular systems can be performed by taking advantage of the parent function alone. This strategy results in a very fast and potentially highly parallel algorithm. For a problem with only Dirichlet conditions and an isotropic mesh (i.e. the number of triangles along each direction is independent from the direction), we may have a forest with a number of trees proportional to the number of triangles with a boundary edge. This number is O([h.sup.-1]), and, therefore, the matrix [L.sub.1] has O([h.sup.-1]) diagonal blocks of average size O([h.sup.-1]). Finally, we point out that most non-leaf nodes in the SPT have two children.

5. Preconditioning and quotient tree. In the presence of islands of very low permeability in [omega], the values of K can differ by many orders of magnitude in the different regions of the domain. In this case, the projected Hessian matrix [[??].sub.22] has a condition number which is still very high. It is then necessary to speed up the convergence of the conjugate gradient by use of a preconditioner. Obviously, it is not usually practical to explicitly form [[??].sub.22] to compute or identify any preconditioner. In this section, we will show how we can compute the block structure of [[??].sub.22] using only the tree and the graph information. We will then use the block structure to compute a preconditioner.

Denoting by H the matrix [L.sub.2][L.sup.-1.sub.1], the projected Hessian matrix [[??].sub.22] can be written as follows:

[[??].sub.22] = [Z.sup.T]MZ = [M.sub.22] + H[M.sub.11][H.sup.T] - ([M.sub.21][H.sup.T] + H[M.sub.12]). (5.1)

The row and column indices of the matrix [[??].sub.22] correspond to the out-of-tree arcs. Moreover, we recall that the matrix H has entries -1, 0, 1, its row indices correspond to the out-of-tree arcs, and the number of nonzeros in one of its rows will be the number of arcs in the cycle of minimal length which the corresponding out-of-tree arc forms with the in-tree arcs.

LEMMA 5.1. Let [alpha] and [beta] be two out-of-tree arcs, and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] be their corresponding circuits of minimal length, then

[M.sub.[alpha][beta]] [not equal to] 0 [left and right arrow] [alpha], [beta] [member of] [Adj.sub.[alpha]] [intersection] [Adj.sub.[beta]] (5.2)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.3)

Proof. Because the orientation of the arcs in the graph is arbitrary and will be determined by the sign of the solution, we will use the undirected version of the graph. Let [alpha] = ([T.sub.v], [T.sub.w]), [beta] = ([T.sub.x], [T.sub.y]), where the graph nodes [T.sub.v], [T.sub.w], [T.sub.x] and [T.sub.y] respectively correspond to the mesh cells [T.sub.1], [T.sub.2], [T.sub.3] and [T.sub.4]. The out-of-tree arc [alpha] uniquely corresponds to the common edge [e.sub.12] between the triangles [T.sub.1] and [T.sub.2], and the out-of-tree arc [beta] uniquely corresponds to the common edge [e.sub.34] between the triangles [T.sub.3] and [T.sub.4]. From (2.8), we have that [M.sub.[alpha][beta]] [not equal to] 0 if and only if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Since [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], then [M.sub.[alpha][beta]] [not equal to] 0 if and only if the two supports share a mesh cell. Assuming, without loss of generality that [T.sub.1] = [T.sub.3] is the shared cell, (5.2) follows from the definition of Adj.

In (5.3) the [right arrow] is trivial. We give a proof ab absurdo for the [left arrow]. First of all, we note that the cardinal number of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is less or equal than 3, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (a triangle has three edges). If we assume that [C.sub.[alpha]] [intersection] Adj([C.sub.[beta]]) [intersection] 0, [C.sub.[beta]] [intersection] Adj([C.sub.[alpha]]) [not equal to] 0 and [C.sub.[alpha]] [intersection] [C.sub.[beta]] = 0, there must exist a node [T.sub.j] such that [T.sub.j] belongs to both the paths in the tree linking [T.sub.v] to [T.sub.w] and [T.sub.x] to [T.sub.y] respectively. Because each node in the path has only one parent and one child, the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] contains the parent and the child of the first path and those of the second path. Because we assumed that [C.sub.[alpha]] [intersection] [C.sub.[beta]] = 0 the cardinal number of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] would be 4 which is in contradiction with the upper bound on the cardinal number.

Thus, from (5.1) and the previous Lemma 5.1, we have the following Corollary.

COROLLARY 5.1. Let [alpha] and [beta] be two out-of-tree arcs, and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] be their corresponding circuits of minimal length, then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and

[C.sub.[alpha]] [intersection] [C.sub.[beta]] = 0 [[??].sub.[alpha][beta]] = 0 (5.4)

Proof. The first part of the Corollary follows directly from the expansion of (5.1). The implication (5.4) follows from Lemma 5.1 taking into account that [C.sub.[alpha]] [intersection] Adj([C.sub.[beta]]) [contains or equal to] [C.sub.[alpha]] [intersection] [Adj.sub.[beta]], [C.sub.[beta]] [intersection] Adj([C.sub.[alpha]]) [contains or equal to] [C.sub.[beta]] [intersection] [Adj.sub.[alpha]], and [Adj.sub.[alpha]] [intersection] [Adj.sub.[beta]] [not equal to] 0 [right arrow] [C.sub.[alpha]] [intersection] [C.sub.[beta]] [not equal to] 0.

Corollary 5.1 gives the complete description of the pattern of [[??].sub.22] in terms of the rooted spanning tree. However, we observe that the results (5.3) and (5.4) rely on the 2-D structure of the mesh and they cannot be generalized to a mesh in 3-D.

In the second part of this section, we build the block structure of [[??].sub.22] using the Quotient Tree concept, without explicitly forming the matrix [[??].sub.22]. In the following, we process the root node separately from the other nodes. The root node will always be numbered by 0, and if it is a branching node with k children the tree will contain k subtrees directly linked to the root. Given a rooted spanning tree [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], let B [subset or equal to] N \ {0} be the set of the branching nodes in [T.sub.R], and let C [subset or equal to] N \ {0} be the set of the leaves in [T.sub.R]. We define the set

y = B [union] L = {[T.sub.1], ..., [T.sub.k]}.

If y = N \ {0}, then [T.sub.R] is a binary tree. Otherwise, we can compute the following paths:

[for all][T.sub.i] [member of] y

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The path [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] connects [T.sub.i] to all its ancestors which are not branching nodes, and it can contain [T.sub.i] alone.

The set [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is a partition of N:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Therefore, we can build the quotient tree

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and the quotient graph

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where Adj |T is the restriction of the Adj operator to the graph T.

For the sake of clarity in the following, we will call the quotient tree nodes Q-nodes. The root of the quotient tree is still the node 0, and each subtree rooted at its children has a binary quotient tree.

5.1. Data structures and separators. In this subsection, we shortly review the basic data structures that we used to implement the null space method presented in the previous section. We also discuss some major details concerning the renumbering strategy that allows us to perform a nested dissection-like decomposition of the matrix A. This kind of permutation and the resulting sub-block decomposition makes it possible to build the block-Jacobi preconditioner mentioned in the next sub-section and considered in the numerical experiments of Section 6. More details on the data structure and algorithm implementation are reported in the final appendix.

The data structure that is used for the graph G is based on a double representation of the sparse matrix A. This double representation implements the collection of compressed row and column sparse data which is described in  and allows us to access simultaneously to matrix rows and columns. We assume that rows and columns are consistently permutated when we renumber the graph node. This special design facilitates and speeds up the algorithms that are reported in the appendix.

Trees, which are used to perform row/column permutations, are represented by storing the following data for any node:

* the parent node,

* the node children list,

* the chain index,

* the depth.

In Figure 5.1, we give a simple example of a tree and, in Table 5.1, we list the labels of each node. It is relevant to observe that the reordering obtained by the depth first search of T renumbers the nodes such that the nodes forming a chain are consecutive. Therefore, we can directly access the list using vector arrays.

[FIGURE 5.1 OMITTED]

After the identification of the chains, we build the quotient tree, and, descending the quotient tree T/B with a "depth first search" algorithm, we renumber the nodes and build its data structure. In the new data structure, we associate with each Q-node the following objects:

* Q-parent in T/B,

* first and last node in T of the chain corresponding to the Q-node,

* depth in T/B,

* Q-last, the last Q-node of the subtree rooted in the current Q-node,

* Q-star, the list of the out-of-tree arcs in G which have one extreme belonging to the Q-node,

* Q-children, the list of the Q-nodes children of the Q-node.

In Figure 5.2, we show the quotient tree relative to the example of Figure 5.1, and in Table 5.2, we list the labels of each Q-node.

Taking advantage of the data structures described above, we can order the out-of-tree arcs in the following way. Firstly, we identify the Q-nodes which are children of the external root (node 0), and the subtrees rooted in each of these Q-nodes. Then, we separate each of the subtrees from the others marking the out-of-tree edges that connect it to the others. The out-of-tree arcs lying within one of these subtrees cannot have fundamental cycles with the out-of-tree arcs lying within one of the others, because of Corollary 4.1. This corresponds to a one-way dissection applied to [[??].sub.22]. Then, within each of the subtrees, we seek the out-of-tree arcs that separate the two subtrees rooted in the Q-nodes children of the Q-node root of the subtree containing both of them. This is equivalent to a nested dissection strategy applied to one of the diagonal blocks resulting from the previous one-way dissection phase. In Figure 5.3 (a), we show the result of the one-way dissection on the matrix A when the root node has only three descendant subtrees. Note that each subtree is now disconnected from the others, is a binary tree and it can be identified by the Q-node on which it is rooted. If the nested dissection process is recursively re-applied on these matrix sub-blocks, we obtain the matrix structure shown in Figure 5.3 (b - c).

[FIGURE 5.2 OMITTED]

5.2. Preconditioners. The a priori knowledge of the structure of [[??].sub.22], allows us to decide what kind of preconditioner we can afford. If we are subjected to a strong limitation of the size of the memory in our computer, we can choose among several alternative preconditioners. We have a choice ranging from the diagonal matrix obtained by using the diagonal part of [M.sub.22] and the block Jacobi preconditioner using the diagonal blocks corresponding to the separators. The possibility of using the simplest choice of the diagonal of [M.sub.22] is sensible because the SPT algorithm places on this diagonal the biggest entries of the diagonal of M. In Section 6, we will give numerical evidence that this choice is very efficient for several test problems. Nevertheless, in the presence of strong discontinuities and anisotropies in the permeability function K, we are obliged to use either the diagonal Jacobi preconditioner or a block diagonal Jacobi.

[FIGURE 5.3 OMITTED]

6. Numerical experiments.

6.1. Test problems. We generated the test problems using four different domains. The first two are square unit boxes and in the second one we have four rectangular regions where the tensor K assumes different values. In Figure 6.1, we plot the geometry of Domain 1 and the boundary conditions. In Figure 6.2, we plot the geometry of Domain 2 and the boundary conditions. The values of the tensor K are chosen as follow

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The two remaining domains have an L-shape geometry. In Figure 6.3, we plot the geometry and the boundary conditions of the Domain 3. In Figure 6.4, we plot the geometry of the fourth and last Domain 4 and the relative boundary conditions: within the domain, we have four rectangular regions where the tensor K takes the same values defined for the second domain in (6.1).

In (2.2), we take the right-hand side f(x) = 0 in all our test problems. For the domains one and three, the tensor K in (2.1) is isotropic. For a given triangulation, its values are constant within each triangle and this value is computed following the law:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [r.sub.i] [member of] [0,1] are numbers computed using a random uniform distribution. For each domain, we generated 4 meshes using TRIANGLE . In Tables 6.1 and 6.2, we report, for our domains, the number [N.sub.T] of triangles, the number [N.sub.E] of edges, the number [N.sub.V] of vertices of each mesh, the number M.nnz of nonzero entries in the matrix M, the number A.nnz of nonzero entries in matrix A, and the corresponding value of h.

[FIGURE 6.1 OMITTED]

[FIGURE 6.2 OMITTED]

6.2. Practicalities. We analysed the reliability of the stopping criterion when we change the parameter d. In Figures 6.5 and 6.6, we display the behaviour of the estimates of the true relative energy norm of the error for Mesh 3, Domain 3 and Domain 4: for the other cases the behaviour is similar. In all our test we choose [eta] = h. The results show that the choice d = 10 is the best compromise between reliability and cost. When convergence is slow and there are regions where the slope changes rapidly, the choice d = 5 can be inaccurate. We reserve for future work the study of a self-adaptive technique which will change the value of d with the slope of the convergence curve. In Section 3, we discussed the opportunity of starting the estimate of the relative error only when [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This makes it possible a reduction of the number of additional matrix-vector products. Both figures show an initial phase where the estimates have not been computed because of the introduction of this check on the absolute value of the error.

[FIGURE 6.3 OMITTED]

[FIGURE 6.4 OMITTED]

Moreover, to avoid an excessive number of additional matrix-vector products in the stopping criterion, we choose to update the value of the denominator [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] every 10 steps of the conjugate gradient method. The energy norm of [[bar.u].sup.(k).sub.2] converges quite quickly to the energy norm of the solution and this justifies our choice. In Figures 6.7 and 6.8, we see that after 25% of the iterations, the ratio [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is greater than 0.9.

6.3. Numerical results. We generated and ran all our test problems on a SPARC processor of a SUN ENTERPRISE 4500 (4CPU 400 MHertz, 2GByte RAM). In our test runs, we compare the performance of our approach with the performance of MA47 of the HSL2000 library . The package MA47 implements a version of the LD[L.sup.T] decomposition for symmetric indefinite matrices that takes advantage of the structure of the augmented system . The package is divided into three parts corresponding to the symbolic analysis where the reordering of the matrix is computed, the factorization phase, and the final solution using the triangular matrices.

[FIGURE 6.5 OMITTED]

[FIGURE 6.6 OMITTED]

Similarly, the null space algorithm which we implemented, can be subdivided into three phases: a first symbolic phase where the shortest path tree and the quotient tree are computed, a second phase where the projected Hessian system is solved by the conjugate gradient algorithm, and a final third phase where we compute the pressure. This enables us to compare the direct solver MA47 with the null space approach in each single phase.

Generally, in the test runs that we will present, we fix the parameter d in the stopping criterion to the value of 10. Nevertheless, we will show the influence of different choices on the parameter d on the stopping criterion using Mesh 3.

In Table 6.3, we give the CPU times (in seconds) and the storage (in MByte) required by MA47 and the CPU times (in seconds) of the null space algorithm where we use the diagonal of [M.sub.22] to precondition the projected Hessian matrix within the conjugate gradient algorithm.

[FIGURE 6.7 OMITTED]

[FIGURE 6.8 OMITTED]

From Table 6.3, we see that the null space algorithm performs better in the case of random permeability which can be a realistic simulation of an underground situation. Nevertheless, the global CPU time of the null space algorithm can be 10 times more than the CPU time of the direct solver. We point out that the MA47 storage requirement for the L and D factors grows with the size of the problem whereas the null space algorithm needs only the storage of the matrices M and A. We forecast that this will become even more favourable to the null space algorithm when we want to solve 3D simulations: for these problems the MA47 storage could become so large that we could be obliged to use an out-of-core implementation.

In Table 6.4, we display the behaviour of three different preconditioners on the conjugate gradient iteration number. We fixed the mesh (Mesh 3) and we use as a preconditioner one of the following matrices: diag([M.sub.22]), diag([[??].sub.22]) (the classical Jacobi), and blockdiag([[??].sub.22]) (block Jacobi) which has been computed using the quotient tree of the shortest path tree (see Section 5). For each preconditioner and each domain, we display the number of conjugate gradient iterations, the CPU time (in seconds) for the building of the matrix, and the CPU time (in seconds) spent by the conjugate gradient algorithm to solve the projected Hessian linear system. From the results of Table 6.4, we conclude that the simplest preconditioner diag([M.sub.22]) is faster even if the conjugate gradient algorithm does more iterations than the conjugate gradient algorithm using the other preconditioners. The Jacobi and the block Jacobi preconditioner building cost is very high and overwhelms the good performance of the conjugate gradient algorithm.

In Table 6.5, we report the CPU time (in seconds) spent by our implementation of the Algorithms A.1, A.2, and A.3. These algorithms build the nested dissection structure of [[??].sub.22] and the null space matrix Z. In particular, the nested dissection structure can be useful in building a parallel version of the matrix-vector products involving the implicit form of the matrix [[??].sub.22]. Moreover, it is possible to use the structure for the evaluation of the complexity of the explicit computation of the null space matrix Z. We observe that comparing Table 6.5 and Table 6.3, the computational cost of Algorithms A.1, A.2, and A.3 is comparable with the cost of the MA47 symbolic phase. Algorithms A.1, A.2, and A.3 are the basis on which it is possible to build a parallel version of the matrix-vector computation (3.8). In particular, it is possible to use Algorithm A.3 to generate a suitable data structure for the solution of the triangular systems involving the matrices L and [L.sup.T].

7. Generalization to the 3-D case. Almost all the results of the paper can be generalized to the case [omega] [subset] [R.sup.3]. The only exceptions are Lemma 5.2 and Corollary 5.1. In practice, we cannot predict a priori the pattern of the projected Hessian [[??].sub.22] when the graph A is not planar, as is the case of the 3D meshes. In particular, we can build the shortest path tree in the 3D case as described in Section 4. In the resulting tree (or forest) each parent has at most 3 children. Moreover, for a problem with only Dirichlet boundary conditions and an isotropic mesh the resulting forest has a number of trees proportional to the number of triangles meshing the boundary surfaces. This number is proportional to [h.sup.-2]. Therefore, the potential parallelism in solving the lower and upper triangular matrices in the null space algorithm increases of one order of magnitude from the two dimensional to the three dimensional case. The extension to the three dimensional domain of the preconditioners presented in Section 6.3 is straightforward. Furthermore, the increased parallelism suggest the possibility of reducing considerably the cost of building the block versions based on the quotient tree such as diag([[??].sub.22]) or blockdiag([[??].sub.22]).

In particular, we want to highlight that the absence of fill-in is promising when we need to solve Darcy's equations in 3D domains. It is reasonable to expect that the fill-in and complexity of a direct solver, applied to the augmented systems related to the approximation of Darcy's equations in 3D domains, would grow as O([m.sup.4/3]) and O([m.sup.2]) respectively [30, 29]. Instead, our algorithm does not have any fill-in and its complexity only depends on the condition number of the scaled projected Hessian matrix [Z.sup.T] MZ. We can reasonably assume that this condition number does not change with the dimension of the domain [omega], analogous to the behaviour of the classical finite-element method. Therefore, the stopping criterion will stop the conjugate gradient algorithm after a number of steps which is not dependent on the dimension of the domain [omega]. The performance analysis of our algorithm when applied to three dimensional domains will be considered in future work.

8. Conclusions. We have analysed a variant of the null space algorithm that takes full advantage of the relation between the structure of the augmented system and the network programming structure of the mixed finite-element approximation of the Darcy's equations. We remark that this method can be applied to any diffusion equation. We compared the performance of our prototype version of the algorithm with a well established direct solver and we concluded that even if our implementation can be 10 times slower than the direct solver, the absence of fill-in makes our code competitive for large problems.

We did not implement a parallel version of our algorithm. Nevertheless, we are confident that a parallel version will speed up the performance. In particular, the matrix-vector product involving [Z.sup.T]MZ can largely benefit from the parallelization, where the block structure of [L.sub.1] can be exploited.

Finally, once the null basis has been computed, it can be used to solve a sequence of problems (2.1-2.2) with different permeability tensors, i.e., different soil characteristics, since in this case only the block M changes and A stays the same. In contrast, direct solvers, like MA47, computing a Gaussian factorization of the augmented system or the Schur-complement based approaches cannot take advantage of this. The same applies to the case of time-dependent or nonlinear problems.

This advantage is fairly important in the three dimensional cases.

Appendix. In this appendix, we describe the major details regarding the data structures and algorithm implementations of Section 5.1.

The matrix A is the incidence matrix of the graph representing the unstructured bidimensional mesh used in the computations. The the row/column compressed data structure mentioned in Section 5.1  can thus be easily implemented as follows by using vector arrays. Each column of A has at most three nonzero entries; then, we store for each node (i.e. triangle) i the three arcs (i.e. edges) indices consecutively in position i, i + 1, i + 2. If the node corresponds to a triangle having an edge on the Neumann boundary, we explicitly store a 0 which means that the triangle has the root as a neighbour. Analogously, as each row j of A has only two nonzero entries, we store the two nodes (i.e. triangles) describing the arc j in position j and j + 1 of a vector. Again, we need to explicitly store some 0 values for the Dirichlet edges. The overhead of the explicit storage of these zeroes is less than the one we would have if we stored the pointers within the classical row and column compressed forms because the number of the boundary edges is of the order of the square root of the number of triangles in the mesh.

Before starting the one-way dissection phase, we visit the tree and identify the out-of-tree arcs in each Q-node that are within the Q-node and the arcs directly linking the external root (node 0) to a Q-node. These arcs are stored in sep.list which is a queue data structure. The phase relative to the one-way dissection is implemented by the following algorithm:
```ALGORITHM A.1.

procedure root.sep.count(sep.size)
for each i [member of] children (0) do
[Q.sub.i] = chain(i); sep.size([Q.sub.i]) = 0 ;
for [Q.sub.k] = [Q.sub.i]: Q-last([Q.sub.i]), do
for each l = (p, q) [member of] Q-star([Q.sub.k]) with
chain(p)=[Q.sub.k] do
if q > last(Q-last([Q.sub.i])) or q < first([Q.sub.i]) then
sep.size([Q.sub.i]) = sep.size([Q.sub.i]) + 1;
end if;
end for each;
end for;
end for each;
end procedure.

for [Q.sub.k] = [Q.sub.root]: Q-last([Q.sub.root]), do
for each l = (p, q) [member of] Q-star([Q.sub.k]) with
chain(p)= [Q.sub.k] do
if mask(l) = 0 and {q > last(Q-last([Q.sub.i])) or
q < first([Q.sub.i])} then
insert l in sep.list ;
sep.size = sep.size + 1;
end if;
end for each;
end for;
end procedure.

root.sep.count(sep.size)
sort Q-children([Q.sub.root]) in decreasing order of sep.size ;
for each child [member of] Q-children([Q.sub.root]) do
end for each;
end procedure.
```

Because each out-of-tree arc can be counted only twice, the complexity of Algorithm A.1 is O(2[xi]), where [xi] is the number of nonzero entries in the submatrix [L.sub.2] and is equal to twice the number of the out-of-tree arcs.

From Corollary 4.1, Corollary 5.1, and Lemma 5.1 it follows that each block i, of size sep.size(i), contains out-of-tree arcs that have fundamental cycles intersecting each other. Therefore, each of these blocks corresponds to a full diagonal block in [[??].sub.22]. The order of the diagonal block is equal to the corresponding value of sep.size. The out-of-tree arcs in sep.list form the external separator. The external separator identifies a curve on the mesh. Thus, because the graph is planar, the size [chi] of the external separator will be O([square root of m]) (m is the number of triangles in the mesh) [30, 29].

Finally, we renumber the out-of-tree arcs, such that the arcs in the external separator are the last ones and the arcs lying within a descendant subtree of root are consecutive. This is equivalent to permute the rows of the matrix [L.sub.2] so that we have a block diagonal submatrix followed by a block of rows connecting the previous diagonal blocks.

In the following Algorithm A.2, we denote by [Q.sub.root] the root of one of the Q-subtrees obtained from the one-way dissection phase. The algorithm proceeds as follows. Basically, we visit each subtree and we reorder the Q-children list of each Q-node such that the first child is the root of the subtree with the least number of nodes, and the other children are sorted in increasing order with respect to the number of nodes in their respective subtrees. Moreover, we label each out-of-tree arc l with the quotient tree ancestor QT-ancestor(l), the Q-node closing the fundamental cycle of l in the spanning tree.

By means of QT-ancestor and by the data structure of the spanning tree, we can implicitly describe the structure of the null space matrix Z.
```ALGORITHM A.2.

QT-ancestor)
if Q-children([Q.sub.root]) [not equal to] 0 then
[Q.sub.2] tail of Q-children([Q.sub.root]) list;
sep.size([Q.sub.2]) = 0;
for each l [member of] Q-star([Q.sub.root]) do
insert l in sep.list ;
sep.size([Q.sub.2]) = sep.size([Q.sub.2]) +1;
QT-ancestor(l) = [Q.sub.root];
end if
end for each;
end if;
end procedure
```

Algorithm A.2 takes advantage of the binary structure of the quotient subtree and of the reordering of the Q-children list whose first entry has the least number of descendants. These two properties allow the possibility of separating the two subtrees rooted in the children of the current [Q.sub.root] visiting only one of them and the Q-star of [Q.sub.root]. Moreover, at each recursion we visit and separate a subtree which, in the worst case, has half of the nodes of the tree in it. Let [N.sub.Q] be the number of nodes in one of the subtrees obtained after the one-way dissection phase. The Nested Dissection phase applied to this subtree will visit all the levels of the tree. For each level, algorithm A.2 will separate the subtrees with the least number of descendants. In the worst case, when the number of nodes in each subtree is half of the nodes of the previous subtree which contains it, the number of levels is O (log [N.sub.Q]). Thus, the worst complexity of algorithm A.2 is O([N.sub.Q] log [N.sub.Q]).

If the tree is unbalanced, i.e. only one of the two children of a Q-node is a branching Q-node, the complexity of algorithm A.2 is O([N.sub.Q]). Using the sep.size and sep.list, we can build the data structure ND, which is composed of ND.list and ND.pointer. ND.list contains the entries of sep.list in reverse order. ND.pointer([Q.sub.i]) contains the pointer to the first entry in ND.list of the separator of the tree rooted in [Q.sub.i]. In the following algorithm, we denote by [N.sub.out] the number of out-of-tree arcs. We point out that the number of entries in sep.list is equal to [N.sub.out].
```ALGORITHM A.3.

procedure nested.dissection(sep.list, sep.size, ND, QT-ancestor)
for l = 1:[N.sub.out] do
ND.list([N.sub.out] - l +1) = sep.list(l);
temp = QT-ancestor([N.sub.out] - l +1);
QT-ancestor([N.sub.out] - l +1) = QT-ancestor(l);
QT-ancestor(l) = temp;
end for;
size = 0;
for [Q.sub.i] = [Q.sub.root] : Q-last([Q.sub.root]) do
size = size +sep.size([Q.sub.i]);
ND.pointer([Q.sub.i]) = [N.sub.out] - size + 1;
end for;
end procedure
```

The ND.list gives the permutation that will reorder rows and columns of [[??].sub.22] in a nested dissection order. Finally, we can build the pattern of the upper triangular part of [[??].sub.22] by using the following algorithm A.4, which takes advantage of the consecutive order of the Q-nodes forming a tree, obtained by applying a depth first search on T/B, and of the nested ordering of the out-of-tree arcs in ND. We point out that for each [Q.sub.i], the rows and the columns with indices between ND.pointer([Q.sub.i]) and ND.pointer([Q.sub.i]+1) form a full diagonal block in [[??].sub.22]. This pattern of the [[??].sub.22] is described by the usual compressed row collection data structure (irow,jcol) where for the row i of [[??].sub.22] is stored in jcol(irow(i): irow(i+1)-1).
```ALGORITHM A.4.

procedure insert.columns(ND, QT, irow, jcol, jcount, count)
for [Q.sub.k] = QT+1: Q-last(QT) do;
for each l = (p, q) [member of] Q-star([Q.sub.k]) with chain(p)=
[Q.sub.k] do
if q > last(Q-last([Q.sub.i])) or q < first([Q.sub.i]) then
jcount = jcount + 1;
jcol(jcount) = l;
count = count + 1;
end if;
end for each;
end for;
end procedure;
procedure [[??].sub.22]-pattern(ND, QT-ancestor irow, jcol)
irow(1) = 1; jcount = 0;
for i = 1:[N.sub.out] do
count = 0;
[Q.sub.2] tail of Q-children(QT-ancestor(i)) list;
for k = i:ND.pointer([Q.sub.1]+1) - 1 do
jcount = jcount + 1;
jcol(jcount) = ND.list(k);
count = count + 1;
end for;
let ([p.sub.i], [q.sub.i]) the ends of arc i;
Q[p.sub.i] = chain([p.sub.i]); Q[q.sub.i] = chain([g.sub.i]);
if QT-ancestor(i) [not equal to] Q[p.sub.i] and QT-ancestor(i)
[not equal to] Q[q.sub.i] then
insert.columns(ND, QT-ancestor(i), irow, jcol, jcount,
count)
irow(i+1) = irow(i) + count;
else
if [q.sub.i] > last(Q-last([Q.sub.1])) or [q.sub.i]
< first([Q.sub.1]) then
insert.columns(ND, [Q.sub.1], irow, jcol, jcount,
count);
irow(i+1) = irow(i) + count;
else
insert.columns(ND, [Q.sub.2], irow, jcol, jcount,
count);
irow(i+1) = irow(i) + count;
end if;
end if;
end for;
end procedure
```

Finally, we observe that the final value of jcount gives the total number of nonzero entries of the upper triangular part of [[??].sub.22]. Therefore, without the explicit computation of the real values of the nonzero entries in [[??].sub.22] we can predict the total amount of memory we need for storing the matrix.

REFERENCES

 R. ALBANESE AND G. RUBINACCI, Integral formulation for 3D eddy-current computation using edge elements, IEEE Proc. A, 135 (1988), pp. 457-462.

 P. ALOTTO AND I. PERUGIA, Mixed finite element methods and tree-cotree implicit condensation, CALCOLO, 36 (1999), pp. 233-248.

 R. AMIT, C. A. HALL, AND T. A. PORSCHING, An application of network theory to the solution of implicit Navier-Stokes difference equations, J. Comp. Phys., 40 (1981), pp. 183-201.

 M. ARIOLI, A stopping criterion for the conjugate gradient algorithm in a finite element method framework, Numer. Math., 97 (2004), pp. 1-24.

 M. ARIOLI AND L. BALDINI, A backward error analysis of a null space algorithm in sparse quadratic programming, SIAM J. Matrix Anal. and Applies., 23 (2001), pp. 425-442.

 M. ARIOLI, J. MARYSKA, M. ROZLOZNIK, AND M. TUMA, Dual variable methods for mixed-hybrid finite element approximation of the potential fluid flow problem in porous media, to appear in Electron. Trans. Numer. Anal., 2005. Special Volume on Saddle Point Problems: Numerical Solution and Applications. Electron. Trans. Numer. Anal., 22 (2006), pp. 17-40, http://etna.mcs.kent.edu/vol.22.2006/pp17-40.dir/pp17-40.html.

 D. N. ARNOLD, R. S. FALK, AND R. WINTHER, Preconditioning in H(div) and applications, Math. Comp., 66 (1997), pp. 957-984.

 M. BENZI, G. H. GOLUB, AND J. LIESEN, Numerical solution of saddle point problems, Acta Numerica, 14 (2005), pp. 1-137.

 M. W. BERRY, M. T. HEATH, I. KANEKO, M. LAWO, R. J. PLEMMONS, AND R. C. WARD, An algorithm to compute a sparse basis of the null space, Numer. Math., 47 (1985), pp. 483-504.

 O. BIRO, K. PREIS, G. VRISK, K. R. RICHTER, AND I. TICAR, Computation of 3D magnetostatic fields using a reduced scalar potential, IEEE Trans. Magnetics, 29 (1993), pp. 1329-1332.

[111 F. BREZZI AND M. FORTIN, Mixed and Hybrid Finite Element Methods, vol. 15, Springer-Verlag, Berlin, 1991.

 R. BRUALDI AND H. RYSER, Combinatorial Matrix Theory, Cambridge University Press, 1991.

 P. G. CIARLET, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.

 T. F. COLEMAN AND A. POTHEN, The null space problem I. Complexity, Algebraic Discrete Math., 7 (1986), pp. 527-537.

 --, The null space problem II. Algorithms., Algebraic Discrete Math., 8 (1987), pp. 544-563.

 N. DEO, G. M. PRABHU, AND M. S. KRISHNA MOORTHY, Algorithms for generating fundamental cycles in a graph, ACM, Trans. Math. Softw.,, 8 (1982), pp. 27-42.

 I. S. DUFF, A. M. ERISMAN, AND J. REID, Direct Methods for Sparse Matrices, Oxford University Press, Oxford, UK, 1989.

 I. S. DUFF, N. I. M. GOULD, J. K. REID, J. A. SCOTT, AND K. TURNER, The factorization of sparse symmetric indefinite equations, J. Numer. Anal., 11 (1991), pp. 181-204.

 G. GALLO AND S. PALLOTTINO, Shortest path methods: a unifying approach, Mathematical Programming Study, 26 (1986), pp. 38-64.

 J. R. GILBERT AND M. T. HEATH, Computing a sparse basis for the null space, Algebraic Discrete Math., 8 (1987), pp. 446-459.

 P. H. GILL, W. MURRAY, AND M. H. WRIGHT, Practical Optimization, Academic Press, London, UK, 1981.

 G. H. GOLUB AND G. MEURANT, Matrices, moments and quadrature II; how to compute the norm of the error in iterative methods, BIT, 37 (1997), pp. 687-705.

 A. GREENBAUM, Iterative Methods for Solving Linear Systems, SIAM, Philadelphia, PA, 1997.

 C. A. HALL, Numerical solution of Navier-Stokes problems by the dual variable method, SIAM, J. Alg. Disc. Meth., 6 (1985), pp. 220-236.

 M. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409-436.

 HSL, Harwell Software Library. A collection of Fortran codes for large scale scientific computation. http://www.ese.elre.ac.uk/Activity/HSL, 2000.

 A. ITAI AND M. RODEH, Finding a minimum circuit in a graph, SIAM, J. Comput., 7 (1978), pp. 413-423i.

 L. KETTUNEN, K. FORSMAN, AND A. BOSSAVIT, Gauging in Whitney spaces, IEEE Trans. Magnetics, 35 (1999), pp. 1466-1469.

 G. L. MILLER, S.-H. TENG, W. THURSTON, AND S. A. VAVASIS, Geometric separators for finite-element meshes, SIAM J. Sci. Comput., 19 (1998), pp. 364-386.

 G. L. MILLER AND W. THURSTON, Separators in two and three dimensions, in STOC '90: Proceedings of the twenty-second annual ACM symposium on Theory of computing, New York, NY, USA, 1990, ACM Press, pp. 300-309.

 K. G. MURTHY, Network Programming, Prentice Hall, Englewood Cliffs, NJ, 1992.

 J. C. NEDELEC, Mixed finite elements in [R.sup.3], Numer. Math., 35 (1980), pp. 315-341.

 A. POTHEN, Sparse null basis computations in structural optimization, Numer. Math., 55 (1989), pp. 501-519.

 R. SCHEICHL, A decoupled iterative method for mixed problems using divergence free finite element, Tech. Report maths0011, University of Bath, 2000.

 --, Iterative Solution od Saddle Point Problems Using Divergence-free Finite Elements with Applications to Groundwater Flow, PhD thesis, University of Bath, 2000.

 J. R. SHEWCHUCK, A two-dimensional quality mesh generator and Delaunay triangulator. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/quake/public/www/triangle.html, 1996.

 Z. STRAKOS AND P. TICHY, On error estimation by conjugate gradient method and why it works in finite precision computations, Electron. Trans. Numer. Anal., 13 (2002), pp. 56-80, http://etna.mcs.kent.edu/vol.13.2002/pp56-80.dir/pp56-80.html.

 --, Error estimation in preconditioned conjugate gradients, BIT, (to appear).

 R. E. TARJAN, Data Structures and Network Algorithms, SIAM, Philadelphia, PA, 1983.

* Received January 12, 2005. Accepted for publication September 5, 2005. Recommended by M. Benzi. The work of the first author was supported in part by EPSRC grants GR/R46641/01 and GR/S42170. The work of second author was supported by EPSRC grant GR/R46427/01 and partially by the CNR Short-Term Mobility Programme, 2005.

M. ARIOLI ([dagger]) AND G. MANZINI ([double dagger])

([dagger]) Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, OX11 0QX, UK (M.Arioli@rl.ac.uk).

([double dagger]) Istituto di Matematica Applicata e Tecnologia Informatica C.N.R., via Ferrata 1, 27100 Pavia, Italy (marco.manzini@imati.cnr.it).
```TABLE 5.1
Labels of nodes in the tree of Figure 5.1.

Node index Parent Children list Chain Index depth

0 0 1,21,31 0 0
1 0 2 1 1
2 1 3 1 2
3 2 4 1 3
4 3 5,10 1 4
5 4 6 2 5
6 5 7 2 6
7 6 8 2 7
8 7 9 2 8
9 8 2 9
10 4 11 3 5
11 10 12 3 6
12 11 13 3 7
13 12 14,17 3 8
14 13 15 4 9
15 14 16 4 10
16 15 4 11
17 13 18 5 9
18 17 19 5 10
19 18 20 5 11
20 19 5 12
21 0 22 6 1
22 21 23 6 2
23 22 24 6 3
24 23 25,28 6 4
25 24 26 7 5
26 25 27 7 6
27 26 7 7
28 24 29 8 5
29 28 30 8 6
30 29 8 7
31 0 32 9 1
32 31 33 9 2
33 32 9 3

TABLE 5.2
Labels of the Q-nodes in Figure 5.2.

Q-node Q-parent First, last Depth
node of chain

1 0 1,4 1
2 1 5,9 2
3 1 10,13 2
4 3 14,16 3
5 3 17,20 3
6 0 21,24 1
7 6 25,27 2
8 6 28,30 2
9 0 31,33 1

Q-node Q-last Q-star Q-children

1 5 4,8 2,3
2 2 1,2,5,6,7
3 5 5,6,8 4,5
4 4 7,9,10
5 5 3,9,10
6 8 4,11 7,8
7 7 13,14
8 8 12,13,14
9 9 11,12

TABLE 6.1
Data relative to the meshes for domains 1 and 2.

Mesh 1 Mesh 2 Mesh 3 Mesh 4

NT 153 1567 15304 155746
NE 246 2406 23130 234128
NV 94 840 7827 78383
M.nnz 1164 11808 114954 1168604
A.nnz 429 4599 45601 466319
h 0.2090 0.0649 0.0225 0.0069

TABLE 6.2
Data relative to the meshes for domains 3 and 4.

Mesh 1 Mesh 2 Mesh 3 Mesh 4

NT 156 1494 15206 150033
NE 251 2305 23102 225599
NV 96 812 7843 75567
M.nnz 1187 12269 114662 1125797
A.nnz 442 4386 45462 449281
h 0.1625 0.0590 0.0186 0.0063

TABLE 6.3
MA47 vs null space algorithm: CPU times (in seconds) and storage
(in Mbytes)

Mesh Domain MA47
Symbolic Factorization

1 1 0.008 0.013
1 2 0.008 0.008
1 3 0.008 0.010
1 4 0.007 0.007
2 1 0.088 0.173
2 2 0.088 0.101
2 3 0.079 0.152
2 4 0.084 0.095
3 1 1.058 3.028
3 2 1.070 1.577
3 3 1.035 2.393
3 4 1.041 1.310
4 1 14.63 264.6
4 2 14.55 49.03
4 3 13.58 84.33
4 4 13.54 34.49

Mesh MA47
Solver Storage

1 0.001 0.048
1 0.001 0.035
1 0.001 0.032
1 0.001 0.040
2 0.006 0.634
2 0.005 0.458
2 0.006 0.556
2 0.005 0.418
3 0.114 9.15
3 0.084 6.06
3 0.111 8.14
3 0.081 5.72
4 1.543 132.32
4 1.091 81.89
4 1.481 118.90
4 1.067 77.62

Mesh null space algorithm

Symbolic CG(#Iterations) Solve

1 0.002 0.013 (12) 0.002
1 0.002 0.013 (14) 0.002
1 0.001 0.013 (13) 0.001
1 0.002 0.016 (17) 0.001
2 0.017 0.145 (19) 0.018
2 0.007 0.275 (35) 0.019
2 0.011 0.145 (20) 0.018
2 0.009 0.265 (37) 0.017
3 0.911 4.681 (41) 0.290
3 0.087 11.63 (101) 0.265
3 0.290 4.945 (44) 0.280
3 0.088 12.06 (106) 0.268
4 7.8 210.6 (176) 2.941
4 1.179 463.0 (390) 2.855
4 3.887 298.7 (272) 2.748
4 1.056 445.6 (395) 2.733

TABLE 6.4
Comparison between the preconditioners: CPU times and # Iterations

Domain Preconditioner CG #Ite- CPU Time (in seconds)
rations
Building CG solve

1 diag([M.sub.22]) 41 0.026 4.681
1 diag([[??].sub.22]) 28 13.60 3.212
1 blockdiag([[??].sub.22]) 19 15.06 2.854
2 diag([M.sub.22]) 101 0.025 11.63
2 diag([[??].sub.22]) 79 13.16 8.859
2 blockdiag([[??].sub.22]) 69 12.13 10.18
3 diag([M.sub.22]) 44 0.025 5.049
3 diag([[??].sub.22]) 26 13.94 2.954
3 blockdiag([[??].sub.22]) 19 15.63 3.122
4 diag([M.sub.22]) 106 0.025 12.06
4 diag([[??].sub.22]) 92 13.10 10.04
4 blockdiag([[??].sub.22]) 79 13.23 12.37

TABLE 6.5
CPU Time (in seconds) for the computation of nested dissection
structure for null space matrix Z.

Mesh Domain CPU Time (in Seconds)
Algorithms A.1,A.2,A.3

1 1 0.001
1 2 0.001
1 3 0.002
1 4 0.001
2 1 0.025
2 2 0.009
2 3 0.022
2 4 0.006
3 1 1.225
3 2 0.159
3 3 0.982
3 4 0.178
4 1 29.89
4 2 4.775
4 3 28.53
4 4 5.118
```
COPYRIGHT 2006 Institute of Computational Mathematics
No portion of this article can be reproduced without the express written permission from the copyright holder.