# Parallel Solution Methods and Preconditioners for Evolution Equations

1 Introduction

Traditional solution methods for time-dependent partial differential equations are based on time-stepping. Such methods can be explicit or implicit, but they are by nature sequential in time: many time steps must be executed one after another, which makes these methods slow and demands much computing time. Parallelism can occur only within each time step. For reasons of numerical stability, explicit methods may require very small time steps, which, for real-life problems, may mean infeasibly many steps. Implicit methods are stable, but the time steps must still be chosen small enough to achieve a sufficiently small time-discretization error. Alternatively, higher-order stable methods can be used, which, however, add even further to the computational complexity.

In the time-stepping framework, parallelism is achieved within each time step. For implicit methods, in each time interval one has to solve a system of equations, usually of elliptic type, applying known solution techniques with a sufficient degree of parallelism, such as multilevel or multigrid methods.

The search for parallelism in time has a long history. As an example, we refer to , where parallelism is sought between the steps of suitably constructed Diagonally Implicit Runge-Kutta (DIRK) methods, originally developed for solving stiff ordinary differential equations (cf. e.g. ). Still, the nature of the computations remains serial and within the time-stepping framework. There exist, however, alternatives to time-stepping methods, where one still needs to solve systems similar to those arising in the implicit methods, but in parallel.

In many practically important applications, such as in optimal control, the control function is periodic, for instance an alternating current in electromagnetic problems. Then the equation becomes time-harmonic and the solution can be approximated by a truncated Fourier series expansion. Due to the orthogonality of the trigonometric functions, for linear problems the computation of the Fourier coefficients separates and one can compute the solution for each frequency fully in parallel. Hence, the solution process is perfectly parallelizable across the different frequencies. Due to the large size of the discretized systems one must use iterative solution methods, see e.g. , , , . The preconditioner used should then also be well parallelizable.

For each frequency, a system on two-by-two or four-by-four block matrix form arises. In a series of publications ([6, 7, 8, 10, 12]) it has been demonstrated that for such problems a very efficient preconditioner, based on the PREconditioned Square matrix Block (PRESB) method, exists and leads to very tight eigenvalue bounds and, hence, few iterations. This holds uniformly with respect to the mesh size, the control cost and other regularization parameters, as well as the problem parameters. The method has been shown to outperform other published methods for the problems considered. We argue further that the structure of the PRESB method allows for additional levels of parallelization, achieved by applying well-developed and efficient computational toolboxes for the basic operator equations arising in the PRESB method, such as the Algebraic Multigrid (AMG) method, provided via publicly available scientific computing libraries. To illustrate the idea we consider an optimal control problem, constrained by an evolutionary partial differential equation.

For nonlinear problems one can use ideas presented in [3, 9, 37] and utilize a two-grid or multilevel method: the nonlinear equation is solved on a coarse grid, the solution is interpolated to the fine grid, and a corrected solution of the linearized Newton-type equation is computed just once, which normally suffices to obtain a solution with an error of the same order as the fine-mesh discretization error. Such approaches can save much computational labor.

The idea of using truncated Fourier series expansions is introduced in Section 2. We present an introductory example of an optimal control problem for an evolution equation in Section 3. The description of the solution method and the preconditioner is found in Section 4. Section 5 comprises a short presentation of the suggested approach for nonlinear problems. Some discussion of the computational and communication costs of the methods is given in Section 6. An illustrative example for the heat equation with numerical test results is presented in Section 7. We also refer briefly to an application of the method for solving eddy current electromagnetic problems, as presented in .

2 Truncated Fourier series expansions for linear evolution equations

Consider an evolution equation

\frac{\partial u}{\partial t}(x,t) + L u(x,t) = f(x,t), \quad x \in \Omega,\ 0 < t \le T, (2.1)

where L is a time-independent, linear and coercive elliptic-type differential operator, with boundary and initial conditions u(x, t) = 0 on ∂Ω, u(x, 0) = u₀(x).

2.1 Truncated Fourier expansion

As is well known, every sufficiently smooth function, say F(t), that is periodic on −T ≤ t ≤ T has a Fourier series expansion

F(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left( a_k \cos\frac{k\pi t}{T} + b_k \sin\frac{k\pi t}{T} \right). (2.2)

Utilizing the Fourier expansion representation, as we show in the sequel, enables us to strongly increase the degree of parallelism of the underlying solution methods. As the problem to solve is, in general, not periodic, we explore the option to extend it and make it periodic.

First we recall that defining F(t) on −T ≤ t ≤ T as a periodic extension of some given function f(t), defined on 0 ≤ t ≤ T, can be done in many different ways. Two periodic extensions of f(t) turn out to be particularly useful, namely,

F(t) = f(t), \quad 0 \le t \le T; \qquad F(t) = f(-t) \ \text{(even extension)} \quad \text{or} \quad F(t) = -f(-t) \ \text{(odd extension)}, \quad -T \le t < 0.

We note that when F is even it has a cosine Fourier expansion, and when F is odd it has a sine Fourier expansion. Figure 1 illustrates an even extension of f, that is, the solution on the interval [T, 2T] is the mirror image of the solution on [0, T].
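As a small numerical illustration (our own sketch, not code from the paper), one can check the statement above directly: the even extension F(t) = f(|t|) of, say, f(t) = t on [0, T] has vanishing sine coefficients, while its cosine coefficients are nonzero. The function choice and the quadrature rule are ours.

```python
import math

# Hypothetical illustration: even periodic extension of f(t) = t from [0, T]
# to [-T, T], and a numerical check that its sine Fourier coefficients vanish.
T = 1.0

def f(t):            # given function on [0, T]
    return t

def F_even(t):       # even extension to [-T, T]
    return f(abs(t))

def fourier_coeff(F, k, kind, n=4000):
    """Trapezoidal approximation of (1/T) * int_{-T}^{T} F(t) trig(k*pi*t/T) dt."""
    trig = math.cos if kind == "cos" else math.sin
    h = 2.0 * T / n
    s = 0.0
    for i in range(n + 1):
        t = -T + i * h
        w = 0.5 if i in (0, n) else 1.0   # trapezoid endpoint weights
        s += w * F(t) * trig(k * math.pi * t / T)
    return s * h / T

b1 = fourier_coeff(F_even, 1, "sin")   # sine coefficient: vanishes by symmetry
a1 = fourier_coeff(F_even, 1, "cos")   # triangle-wave cosine coefficient: -4/pi^2
```

For this triangle wave the exact first cosine coefficient is −4/π², which the quadrature reproduces to several digits.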

Following the above lines, in order to get a periodic solution we extend problem (2.1) symmetrically to the time interval (0, 2T], where f(x, 2T − t) = f(x, t), 0 < t ≤ T. Hence it holds also that u(x, 2T − t) = u(x, t), 0 < t ≤ T. The differential equation then becomes an evolution equation with periodic conditions,

[mathematical expression not reproducible]. (2.3)

This makes it possible to use a truncated Fourier series,

[mathematical expression not reproducible], (2.4)

where ω_k = kπ/T, k = 0, 1, ..., N, are the angular frequencies and u_k(x) are the Fourier coefficients at x ∈ Ω.

The expansions (2.2) and (2.4) represent the trigonometric form of a Fourier series. We consider, however, the more general complex exponential form of a Fourier series. The expression a_k cos(kθ) + b_k sin(kθ) can be stated using exponentials, and we can rewrite (2.2) as

F(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left( c_k e^{i k\pi t/T} + c_k^{*} e^{-i k\pi t/T} \right),

where c_k = ½(a_k − i b_k), c_k* = ½(a_k + i b_k) and i is the imaginary unit. The complex formulation of a Fourier series can be seen as a precursor of the Fourier transform, which extends Fourier analysis to non-periodic functions. The complex form is widely used by engineers, for example in circuit and control theory.
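The equivalence of the two forms can be verified directly; the following snippet (our own check, with arbitrary sample coefficients) confirms that c_k e^{ikθ} + c_k* e^{−ikθ} reproduces a_k cos(kθ) + b_k sin(kθ).

```python
import cmath
import math

# Verify the identity behind the complex Fourier form: with c = (a - i b)/2,
# c*exp(i k t) + conj(c)*exp(-i k t) = a*cos(k t) + b*sin(k t).
a_k, b_k, k = 0.7, -1.3, 3                 # arbitrary sample coefficients
c_k = 0.5 * (a_k - 1j * b_k)

def trig_term(theta):
    return a_k * math.cos(k * theta) + b_k * math.sin(k * theta)

def exp_term(theta):
    z = c_k * cmath.exp(1j * k * theta) + c_k.conjugate() * cmath.exp(-1j * k * theta)
    return z.real                           # imaginary parts cancel exactly

# compare on a grid covering a full period
max_gap = max(abs(trig_term(t / 10) - exp_term(t / 10)) for t in range(63))
```

The gap is at the level of floating-point rounding, since the identity holds exactly.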

Let us assume that the solution of (2.3) is of an even more general form, namely,

[mathematical expression not reproducible] (2.5)

Substituting (2.5) in the differential equation (2.3), we obtain

[mathematical expression not reproducible]

Multiplying by cos(ω_k t) and sin(ω_k t), respectively, and integrating over the time interval (0, 2T), thereby using the orthogonality of the trigonometric functions, results in the equations

[mathematical expression not reproducible] (2.6)

Here we have used the elementary relations

[mathematical expression not reproducible]

and the symmetry f(x, t) = f(x, 2T − t), which gives f_s^k(x) = 0.

A proper choice of N can be made by trial and error. For some problems one can instead utilize classical truncation error expressions and estimate the solution error via the truncation error. We refer to [21, 23] for such an estimate, showing that the error in the Fourier expansion decreases at least as O(1/√N), N → ∞. In practice, the solution is often much smoother and can be represented by a few-term Fourier expansion, so there is no need to choose a large value of N. However, as shown below, the bigger N, the more parallelism is enabled in the solution method.
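The decay of the truncation error with N can be observed on a toy example (our own, hypothetical choice: f(t) = t on [0, T] with even extension, hence a cosine-only series); for this mildly nonsmooth extension the decay is in fact faster than the worst-case O(1/√N) bound quoted above.

```python
import math

# Hypothetical truncation illustration: cosine series of f(t) = t on [0, T]
# (even extension = triangle wave). The maximum error shrinks as N grows.
T = 1.0

def a_coef(k, n=8000):
    # a_k = (2/T) * int_0^T t cos(k*pi*t/T) dt, trapezoidal rule
    h = T / n
    s = sum((0.5 if i in (0, n) else 1.0) * (i * h) * math.cos(k * math.pi * i * h / T)
            for i in range(n + 1))
    return 2.0 * s * h / T

coeffs = [a_coef(k) for k in range(33)]     # precompute once

def partial_sum(t, N):
    return coeffs[0] / 2 + sum(coeffs[k] * math.cos(k * math.pi * t / T)
                               for k in range(1, N + 1))

def max_err(N, m=50):
    # maximum error of the truncated series on a sample grid in [0, T]
    return max(abs(partial_sum(i * T / m, N) - i * T / m) for i in range(m + 1))

e4, e32 = max_err(4), max_err(32)           # error decreases with N
```

For the triangle wave the coefficients decay like 1/k², so e32 is roughly an order of magnitude below e4.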

2.2 Galerkin variational FEM

Let u_k^c ∈ U, u_k^s ∈ V, where U and V are appropriate subspaces of H₀¹(Ω), and let U_h and V_h be some finite-dimensional (finite element) subspaces. Let M be the corresponding Galerkin mass matrix and L the corresponding stiffness matrix, for the inner product in L₂(Ω). Then, after multiplying the equations (2.6) with the basis functions in U_h and V_h and forming the finite element (FEM) variational formulation of (2.6), the equations take the algebraic form

[mathematical expression not reproducible] (2.7)

where u_k^c, u_k^s are the discrete FE solution vectors and f_k^c is the vector representation of the source function corresponding to ω_k and the chosen basis functions.

For k = 0 we get L u₀ = f₀^c and L v₀ = 0, i.e., v₀ = 0. We multiply the second equation in (2.7) by the imaginary unit i to get the block matrix system,

[mathematical expression not reproducible] (2.8)

Clearly, the system (2.8) can be further reduced to the fourth order problem,

(L M^{-1} L + \omega_k^2 M)\, u_k^c = L M^{-1} f_k^c. (2.9)

However, to avoid working with fourth-order operators we keep the two-by-two block form in (2.8). Note that the matrix in (2.8) is indefinite and Hermitian. If one prefers to use only real arithmetic, the two-by-two block complex matrix can be reformulated as a four-by-four real-valued block matrix; see e.g.  for an example of this.
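A small dense sketch can check the reduction to (2.9) numerically. We assume here (one consistent reading of (2.7), our own notation) that the k-th frequency equations take the real block form L u_c + ωM u_s = f_c, −ωM u_c + L u_s = 0, with a lumped mass matrix M = hI for simplicity; the mesh size, frequency and right-hand side are arbitrary choices.

```python
# Sketch: 1D Laplacian stiffness matrix L, lumped mass matrix M = h*I, one
# frequency omega. Solve the assumed 2x2 block system
#   [ L        omega*M ] [u_c]   [f_c]
#   [-omega*M  L       ] [u_s] = [ 0 ]
# and check that u_c also satisfies the reduced (fourth-order) equation
#   (L M^{-1} L + omega^2 M) u_c = L M^{-1} f_c.
n, h, omega = 4, 1.0 / 5, 3.0

L = [[0.0] * n for _ in range(n)]
for i in range(n):
    L[i][i] = 2.0 / h
    if i > 0:
        L[i][i - 1] = L[i - 1][i] = -1.0 / h
f_c = [1.0, -2.0, 0.5, 3.0]

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, for tiny systems)."""
    m = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, m):
            fac = A[r][c] / A[c][c]
            for j in range(c, m + 1):
                A[r][j] -= fac * A[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (A[r][m] - sum(A[r][j] * x[j] for j in range(r + 1, m))) / A[r][r]
    return x

# assemble and solve the 2n x 2n block system (M = h*I)
B = [[0.0] * (2 * n) for _ in range(2 * n)]
for i in range(n):
    for j in range(n):
        B[i][j] = L[i][j]
        B[n + i][n + j] = L[i][j]
    B[i][n + i] = omega * h
    B[n + i][i] = -omega * h
u_c = solve(B, f_c + [0.0] * n)[:n]

# residual of the reduced equation; note M^{-1} = (1/h) I here
LMinvL_uc = matvec(L, [v / h for v in matvec(L, u_c)])
lhs = [LMinvL_uc[i] + omega ** 2 * h * u_c[i] for i in range(n)]
rhs = matvec(L, [v / h for v in f_c])
residual = max(abs(lhs[i] - rhs[i]) for i in range(n))
```

Since eliminating u_s is exact algebra, the residual of the reduced equation is at rounding level.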

3 An optimal control problem with an evolution state equation

Optimal control problems with a PDE as state equation have been studied in many publications. Such publications, related to this paper, are for instance [17, 18, 22, 23, 29] and the previously mentioned [6, 7, 8, 12]. Here we emphasize the case when the constraint is an evolutionary equation.

In general, before the exponential decay caused by the elliptic nature of the problem has had any significant damping influence, the solution to an evolution equation shows a less smooth behaviour. When the diffusion coefficient in L is relatively small, this may persist particularly late in time. As t → ∞, the solution eventually becomes close to the stationary solution, L u_∞ = f. In some problems, such as electrical machinery, the primary aim is to find the stationary solution of the time-dependent problem. One aim is then to find an initial value function which makes the convergence to the stationary solution smooth for all times, 0 < t < T. In addition, we want to find a source control function such that the solution at t = T is close to some given target solution u_d(x).

However, we pose here the task to find a solution u(x, t) of the evolution equation for all t,

[mathematical expression not reproducible]

and a control function v(x, t) such that u(x, t) is as close as possible to a desired solution u_d(x, t). The boundary conditions are chosen homogeneous only for simplicity, Ω is a given bounded Lipschitz domain and L is an elliptic operator, i.e. it satisfies a coercivity condition, (Lu, u) ≥ α(u, u), α > 0, for all u ∈ H₀¹(Ω). For example, L can be the convection-diffusion operator, Lu = −ν∆u + w·∇u + cu, where ν > 0 and c − ½∇·w ≥ 0. The solution is controlled by the source function v ∈ L²(Ω × (0, T]), which, for simplicity, we assume to be distributed in the whole domain and also to satisfy the homogeneous boundary conditions.

Such optimal control problems must be regularized to obtain a well-posed problem. For this purpose, we use the simplest Tikhonov regularization form. Let [lambda] be a Lagrange multiplier to handle the constraint equation, [partial derivative]u/[partial derivative]t + Lu = v. The optimal control problem can then be formulated as a saddle type functional,

[mathematical expression not reproducible]

where the Lagrangian functional J(u, v, [lambda]) equals

[mathematical expression not reproducible] (3.1)

Here β > 0 is the regularization parameter for the cost of the control function v. The first-order necessary conditions, ∇_{u,v,λ} J(u, v, λ) = 0, are also sufficient for the existence of a solution. Here ∂J/∂v = 0 leads to the simple relation −λ + βv = 0, which enables us to reduce the arising linear system. Hence, substituting λ = βv in (3.1) gives

[mathematical expression not reproducible]

where [L.sup.*] is the adjoint operator to L and where we have made use of the homogeneous boundary conditions. The first order conditions give now

[mathematical expression not reproducible] (3.2)

where ṽ = √β v. Here u and ṽ satisfy the given homogeneous boundary conditions, but no initial conditions are prescribed.

Choosing a proper set of finite element basis functions and multiplying the equations in (3.2)(right) with these basis functions as test functions, we obtain the semi-discrete (FEM) formulation, corresponding to (3.2):

[mathematical expression not reproducible] (3.3)

Here M is the mass matrix corresponding to the basis functions and the L₂-inner product, L is the stiffness matrix corresponding to the operator L, and u(t), u_d(t), ṽ(t) denote the finite element vectors, corresponding to the interior node points of Ω_h, the discrete finite element mesh.

In the sequel, dropping for simplicity the ~ sign, we approximate the time-dependent vectors u(t), u_d(t), ṽ(t) with truncated Fourier series expansions,

[mathematical expression not reproducible]

where [[omega].sub.k] are the frequencies [mathematical expression not reproducible] has its real-valued part equal to [mathematical expression not reproducible] and its imaginary part - equal to [mathematical expression not reproducible]

Due to the orthogonality of the trigonometric functions, for linear problems, as we consider here, the computations of the different Fourier coefficients separate, so it suffices to consider the resulting equations for just one index k. From (3.3), it follows then,

[mathematical expression not reproducible]

Here we separate the real and the imaginary parts for each equation, to get

[mathematical expression not reproducible]

or, in block matrix form,

[mathematical expression not reproducible]. (3.4)

Note that the off-diagonal blocks are skew-Hermitian. Introduce the notation

[mathematical expression not reproducible]

Then the block matrix in (3.4) can be written as,

[mathematical expression not reproducible]

4 Preconditioning method

Following the approach used in several earlier papers [6, 7, 8, 12], as a preconditioner to A we choose

[mathematical expression not reproducible] (4.1)

where [mathematical expression not reproducible]. The preconditioned matrix takes the form

[mathematical expression not reproducible].

Further, as shown below, [C.sup.-1] possesses a block-factorized form

[mathematical expression not reproducible]

so,

[mathematical expression not reproducible]

Therefore, besides some matrix-vector multiplications, an action of the preconditioned matrix involves only solutions with the elliptic-type matrices M + L* and M + L, which consist of blocks that are a sum of mass matrices and discretized elliptic operators.

In addition, as outlined below and as shown in the above-cited earlier publications of the first and second authors, the preconditioning leads to tightly clustered eigenvalues and, hence, very fast convergence.

We present now the preconditioner for a matrix in the general form

[mathematical expression not reproducible]

where A, B₁ and B₂ are generic matrices of order n × n, A is assumed to be symmetric positive definite, and A + B_i, i = 1, 2, are nonsingular. Let

[mathematical expression not reproducible] (4.2)

be a preconditioner to A, to be used in a Krylov subspace iteration method such as GMRES  or MINRES . Given a linear preconditioning equation

[mathematical expression not reproducible], (4.3)

by changing the sign of the second equation and adding the first, the system can be written in the equivalent form,

[mathematical expression not reproducible]

where z = x + y. Hence, z = (A + B₂)⁻¹(f − g) and (A + B₁)x = f − B₂z. Therefore, the algorithm to compute the solution of (4.3) can be written as: (1) solve (A + B₂)z = f − g; (2) compute f̂ = f − B₂z; (3) solve (A + B₁)x = f̂; (4) compute y = z − x. Alternatively, it is readily seen that

[mathematical expression not reproducible]

so,

[mathematical expression not reproducible]

Hence, besides some vector additions, the algorithm involves a solution of a linear system with A + B₂, a matrix-vector multiplication with B₂ and a solution with the matrix A + B₁. In practice, the solutions of the two linear systems contribute the major cost of computing an action of C⁻¹. In our applications they correspond to discrete elliptic operators, for which very efficient solution methods are known to exist.
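The four steps can be checked for self-consistency on a toy example (our own 2×2 blocks, with B₂ = B₁ᵀ): we manufacture f = (A + B₁)x + B₂z and g = f − (A + B₂)z from known x, y with z = x + y, run steps (1)-(4) as stated above, and recover x and y.

```python
# Self-consistency sketch of the four-step application of C^{-1}.
A  = [[4.0, 1.0], [1.0, 3.0]]          # symmetric positive definite
B1 = [[1.0, 2.0], [0.0, 1.0]]          # generic block
B2 = [[1.0, 0.0], [2.0, 1.0]]          # here B2 = B1^T

def matvec(Mat, v):
    return [sum(Mat[i][j] * v[j] for j in range(2)) for i in range(2)]

def madd(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

def solve2(Mat, b):                    # closed-form 2x2 solve (Cramer's rule)
    det = Mat[0][0] * Mat[1][1] - Mat[0][1] * Mat[1][0]
    return [(b[0] * Mat[1][1] - b[1] * Mat[0][1]) / det,
            (Mat[0][0] * b[1] - Mat[1][0] * b[0]) / det]

# manufacture a right-hand side from a known solution pair (x, y)
x_true, y_true = [1.0, -2.0], [0.5, 3.0]
z_true = [x_true[i] + y_true[i] for i in range(2)]
f = [matvec(madd(A, B1), x_true)[i] + matvec(B2, z_true)[i] for i in range(2)]
g = [f[i] - matvec(madd(A, B2), z_true)[i] for i in range(2)]

# steps (1)-(4) from the text
z = solve2(madd(A, B2), [f[i] - g[i] for i in range(2)])    # (1)
fhat = [f[i] - matvec(B2, z)[i] for i in range(2)]           # (2)
x = solve2(madd(A, B1), fhat)                                # (3)
y = [z[i] - x[i] for i in range(2)]                          # (4)
err = max(abs(x[i] - x_true[i]) + abs(y[i] - y_true[i]) for i in range(2))
```

In a real application the two solves in steps (1) and (3) are the elliptic solves mentioned above, each amenable to multigrid.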

As Proposition 1 shows, the eigenvalues are tightly clustered near unity. See also .

Proposition 1. Let

[mathematical expression not reproducible]

where we assume that A, of order n × n, is symmetric positive definite, B₂ = B₁*, B₁ = B, B + B* is positive semidefinite and A + B is nonsingular. Then the eigenvalues of C⁻¹A are real and satisfy

\frac{1}{2} \le \frac{1}{1 + \alpha} \le \lambda \le 1,

where α = max{Re(μ)/|μ|} and μ is an eigenvalue of Bz = μAz, z ≠ 0. The multiplicity of the eigenvalue λ = 1 is n + n₀, where n₀ is the dimension of the nontrivial nullspace of B₁ + B₂. It follows that

[mathematical expression not reproducible] (4.4)

Proof. From

[mathematical expression not reproducible]

it follows

[mathematical expression not reproducible].

Since B₁ + B₂ is positive semidefinite, it follows that λ ≤ 1. Further, for λ ≠ 1 it holds that B₁x = Ay. Hence

(1 - \lambda)\,(A + B_1 + B_2 + B_2 A^{-1} B_1)\, x = (B_1 + B_2)\, x

or, since A is symmetric positive definite,

[mathematical expression not reproducible],

where [mathematical expression not reproducible]. Therefore (1 − λ)(1 + 2Re(μ) + |μ|²) = 2Re(μ), or

[mathematical expression not reproducible]

The estimate (4.4) follows directly from the latter relation.

Proposition 1 shows that the relative size, Re(μ)/|μ|, of the real part of the spectrum of B̃ = A^{−1/2}BA^{−1/2} determines the lower eigenvalue bound of C⁻¹A and, hence, the rate of convergence of the preconditioned iterative solution method. For a small such relative part, the convergence of the iterative solution method is exceptionally rapid. Such small values occur for harmonic problems with a large value of the frequency.

It is further readily seen that the eigenvalue problem has a full eigenvector space, i.e. the preconditioned matrix C⁻¹A is diagonalizable by a similarity transformation, see e.g. .

5 A multilevel-mesh solution method in the context of nonlinear evolution equations

For a nonlinear problem, such as ∂u/∂t + L(u)u = f, where L(u)u = −∇·(a(u)∇u) + w(u)·∇u, the arising block matrix system is nonlinear and one must use a nonlinear solver.

As for the classical Newton method, to ensure fast convergence of the nonlinear solver we need a high-quality initial guess, see e.g. . For this purpose we can use (nested) discretization meshes, T₀ ⊂ T₁ ⊂ ... ⊂ T_m, obtained by regular refinements of some given coarse mesh T₀. A scheme to obtain a good starting vector, used e.g. in  in the framework of PDE-constrained optimal control problems, is depicted in Algorithm 1.

Thus, in order to solve the nonlinear system on level m we solve it also on all coarser meshes. This, of course, makes the solution procedure computationally demanding.

In [9, 12] the following idea has been shown to be very fruitful: solve (accurately) the nonlinear problem on some coarse mesh, prolong the so-obtained solution to the finest mesh (T_m) and solve the linearized problem only once, gaining under certain conditions the order of accuracy of the discretization on the finest level. We note that this idea is not always directly applicable to optimal control problems. For instance, in the framework of the semi-smooth Newton method applied to problems with a box-constrained solution, the stopping criterion is not related to a certain tolerance but to a set of point identifiers where the box-constraint values are attained exactly and that stop changing (). In this sense we cannot control the accuracy of the solution on a mesh by a tolerance and continue the nonlinear iterations until convergence is achieved. The two-level framework for the semi-smooth Newton method can then be applied as follows. Instead of using all levels in the mesh hierarchy, we choose one (or more, not necessarily consecutive) coarser meshes, solve the nonlinear problem there, prolong to T_m and perform only one Newton step there, thus solving the linear problem only once. In , the effect of using different combinations of meshes is nicely illustrated.
```Algorithm 1 Constructing an initial guess for the
nonlinear solver using the full mesh hierarchy

Initialize a, b and the solution on T_0
For i = 0, 1, ..., m
    Solve the nonlinear system on level i
    If i < m, prolong the current solution to level i + 1
EndFor
```
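The coarse-to-fine strategy can be sketched on a hypothetical 1D model problem (our own choice of problem, meshes and tolerances, not the paper's setup): solve −u″ + u³ = 1, u(0) = u(1) = 0, accurately on a coarse finite-difference mesh, prolong by linear interpolation, and perform a single Newton step on the fine mesh.

```python
# Two-grid Newton sketch for -u'' + u^3 = 1, u(0) = u(1) = 0 (central FD).

def residual(u, h):
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append((-left + 2 * u[i] - right) / h ** 2 + u[i] ** 3 - 1.0)
    return r

def thomas(a, b, c, d):
    """Tridiagonal solve: a sub-, b main, c super-diagonal, d right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def newton_step(u, h):
    # Jacobian is tridiagonal: 2/h^2 + 3u_i^2 on the diagonal, -1/h^2 off it
    n = len(u)
    r = residual(u, h)
    a = [-1.0 / h ** 2] * n
    c = [-1.0 / h ** 2] * n
    b = [2.0 / h ** 2 + 3 * u[i] ** 2 for i in range(n)]
    delta = thomas(a, b, c, [-ri for ri in r])
    return [u[i] + delta[i] for i in range(n)]

# accurate coarse solve (7 interior points)
nc, hc = 7, 1.0 / 8
uc = [0.0] * nc
for _ in range(10):
    uc = newton_step(uc, hc)

# prolong to the fine mesh (15 interior points) by linear interpolation
nf, hf = 15, 1.0 / 16
padded = [0.0] + uc + [0.0]
uf = [padded[(j + 1) // 2] if (j + 1) % 2 == 0 else
      0.5 * (padded[(j + 1) // 2] + padded[(j + 1) // 2 + 1]) for j in range(nf)]
res0 = max(abs(r) for r in residual(uf, hf))

uf1 = newton_step(uf, hf)              # a single Newton step on the fine mesh
res1 = max(abs(r) for r in residual(uf1, hf))
```

The interpolated guess has an O(1) fine-grid residual, while one Newton step drives it down by several orders of magnitude, illustrating why a single fine-level linear solve can suffice.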

6 Computational and communication complexity of the proposed approach

The suggested method, based on periodic function extension and truncated Fourier expansions, enables much work to be done in parallel. To compute the vectors u_k and v_k for a linear problem as in (3.4), we can solve the N systems (3.4), one for each frequency ω_k, independently. These can be assigned to N subsystems of the hardware resource (consisting of a number of cores or nodes, or accelerators), each large enough to solve the systems efficiently. We refer to this as a first level of parallelism. These solutions do not require any communication. Communication occurs only when the individual components per frequency must be summed up.
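The first level of parallelism can be sketched with a toy stand-in for the real per-frequency systems (our own simplification, not the system (3.4)): each frequency ω_k yields an independent complex tridiagonal solve, dispatched to a worker pool, and the parallel results agree with a serial loop.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy per-frequency systems: (L + i*omega_k*M) u_k = f with L the 1D Laplacian
# and M a lumped mass matrix, solved by the Thomas algorithm in complex
# arithmetic. Each frequency is fully independent of the others.
n, h = 50, 1.0 / 51
f = [1.0] * n

def solve_frequency(k):
    omega = k * 3.141592653589793          # omega_k = k*pi/T with T = 1
    diag = [2.0 / h + 1j * omega * h] * n  # L diagonal plus i*omega*M (lumped)
    off = -1.0 / h
    cp, dp = [0j] * n, [0j] * n            # Thomas forward sweep
    cp[0], dp[0] = off / diag[0], f[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - off * cp[i - 1]
        cp[i] = off / m
        dp[i] = (f[i] - off * dp[i - 1]) / m
    u = [0j] * n                           # back substitution
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u

ks = list(range(6))                        # frequencies k = 0, ..., 5
serial = [solve_frequency(k) for k in ks]
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(solve_frequency, ks))
gap = max(abs(parallel[k][i] - serial[k][i]) for k in ks for i in range(n))
```

In a distributed setting the same dispatch pattern would use MPI ranks or nodes instead of threads; no communication is needed until the per-frequency contributions are summed.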

The solution of the system in (3.4) constitutes the second level of parallelism. The framework is very similar to solving a single discrete PDE problem, and the approaches for parallelizing the computations are well studied. The system is assumed to be solved by an iterative solution method that accommodates variable preconditioning, such as FGMRES (). Each iteration requires one matrix-vector multiplication, a few vector updates and scalar products, and a solution of a system with the preconditioner C. In turn, the latter requires two solutions with the matrices

[mathematical expression not reproducible] (6.1)

plus one multiplication with [mathematical expression not reproducible]

Using our preconditioner, the solution with F and F* boils down to solutions with the matrices

[mathematical expression not reproducible] (6.2)

and a multiplication with ω_k√β M. For the solutions with Q and Q* we can use the Conjugate Gradient method preconditioned by some Algebraic Multigrid (AMG) method, see e.g. [20, 27], or by some other well-parallelizable preconditioner, such as special AMG versions [10, 11, 13] or the increasingly popular Monte Carlo-based approximate inverse preconditioners (cf. e.g. ).

Assuming that the partitioning of the discretization mesh is done using an appropriate software tool, such as ParMETIS (), we expect a well-balanced distribution of the computational work, good data locality, no communication bottlenecks, and good scalability and efficiency of the proposed solution method.

The preconditioner C in (4.2) has been used in the context of multi-phase fluid simulations and parallel performance results are available in .


To sum up, we propose a computational procedure, suitable for both large homogeneous and heterogeneous computer platforms, with two-level parallelism. One level of (coarse-grain) parallelism is enabled through the fully decoupled computation of the Fourier coefficients of the state and the control, which can be assigned to relatively large computing entities (nodes, groups of nodes, GPUs, etc., depending on the number of degrees of freedom in the fine discretization mesh). The second level of parallelism arises in the solution of the algebraic systems for each individual frequency. To solve those, we advocate a method that allows for using ready toolboxes from already available, highly optimized parallel libraries, such as HYPRE (), Trilinos (), MFEM () or PETSc (), to name some. The potential of the approach is similar to that of the method for solving fractional diffusion problems reported in [35, 36], where some parallel tests are also included.

7 Illustrating test problems

For the purpose of illustrating the approximation of the time-dependent solution and the robustness of the method with respect to mesh and regularization parameters, we first consider the heat problem. Then, to further illustrate the robustness of the preconditioning method with respect to a larger set of parameters (mesh size, regularization, angular frequency and other problem parameters), we comment briefly on an eddy-current time-harmonic problem.

All numerical experiments reported here are performed in Matlab on a laptop, a Lenovo ThinkPad T500 2055 (Core 2 Duo T9400, 2.53 GHz, Centrino 2) with 8 GB memory.

7.1 The heat equation as constraint

Consider the transient heat equation as a state constraint in the following optimal control problem.

Problem 1.

[mathematical expression not reproducible], (7.1)

where [u.sub.d] is the desired solution. The form of [u.sub.d] is taken as Example 3 in , namely,

[mathematical expression not reproducible].

In (7.1), Ω = [0, 1]² and k(x) is the thermal conductivity. The function χ is the characteristic function, equal to one when the argument is in the given interval and zero elsewhere. In alignment with the assumptions, we infer that the solution u, the desired solution u_d and the control v are of the form (2.5), and that u_d is extended to the interval [0, 2] by symmetric mirroring as

[mathematical expression not reproducible].

The space discretization is done using bilinear finite element basis functions on a square discretization mesh.

In the experiments we use only up to five terms in the expansion, i.e., N = 5. The solution procedure is as follows. The outer solution method, as well as the solver for the blocks F and F* (cf. (6.1)) appearing in the preconditioner C (cf. (4.1)), is FGMRES. The innermost solver for the blocks Q and Q* (cf. (6.2)) is the Conjugate Gradient method, preconditioned by AGMG (cf. ). The (relative) stopping criteria are 10⁻⁶, 10⁻³ and 10⁻³, respectively. The results for various h, N and β are presented in Table 1. As in , ω is fixed as 2π. The iterations are reported in the form IT_outer (IT_inner^(av)/IT_agmg^(av)), where IT_outer is the number of outer FGMRES iterations, IT_inner^(av) is the average number of inner FGMRES iterations and IT_agmg^(av) is the average number of AGMG-preconditioned Conjugate Gradient iterations. Even though the implementation is in Matlab, we include the total execution time (in seconds) as a reference.

As shown in Table 1, for each individual frequency ω_k, k = 0, 1, ..., 5, and for all chosen values of h and β, the number of outer iterations varies between 6 and 8. The number of inner iterations varies between 1 and 4, and the innermost AGMG-preconditioned Conjugate Gradient iterations also vary within a very short interval, between 2 and 7. This means that the method shows a very robust behaviour. It is worth noting that, even though we use Matlab, the total computing time increases by an almost constant factor between 4.5 and 6.8 when h⁻¹ is doubled, which is relatively close to the optimal value of 4.

For comparison, Table 2 includes the performance of the block-diagonally preconditioned MINRES, as described in . The stopping criteria are 10⁻⁶ for the outer MINRES and 10⁻³ for the blocks, solved by AGMG.

Table 2 confirms the robust behaviour of MINRES, documented in . We see, however, that for small values of β, which must be used to obtain a solution sufficiently close to the desired target, our method outperforms MINRES.

To test the accuracy of the solution, the resulting approximate solution is compared with the solution of an implicit time-stepping method, the implicit midpoint (midpoint trapezoidal) method, briefly recalled for completeness. Consider the ordinary differential equation x'(t) = f(t, x(t)) with initial condition x(0) = x₀. The implicit midpoint method reads as follows:

x_{n+1} = x_n + \Delta_t\, f\!\left(t_n + \tfrac{\Delta_t}{2},\ \tfrac{x_n + x_{n+1}}{2}\right),

where Δ_t is the time step. The method is implicit, second-order accurate and unconditionally stable for linear problems, such as the one at hand. Consider the equation in (2.1). After space discretization and applying the above time-stepping scheme, the problem becomes:

[mathematical expression not reproducible].

Here, m = T/Δ_t and v^opt is the optimal control obtained from the optimization problem, evaluated at the required time. For the solution of systems with M + ½Δ_t L we use the AGMG-preconditioned Conjugate Gradient method with a relative stopping criterion of 10⁻³. We compare the solutions at the final time T. The results are shown in Table 3.
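The second-order accuracy of the implicit midpoint rule is easy to confirm on the scalar test equation x' = −x, x(0) = 1 (our own check; for this linear autonomous equation the implicit step has the closed-form update x_{n+1} = x_n (1 − Δt/2)/(1 + Δt/2)):

```python
import math

# Implicit midpoint rule for x' = -x, x(0) = 1, integrated to T = 1.
# Halving the step should divide the error by ~4 (second-order accuracy).
def midpoint_solve(dt, T=1.0):
    n = round(T / dt)
    factor = (1 - dt / 2) / (1 + dt / 2)   # exact solve of each implicit step
    x = 1.0
    for _ in range(n):
        x *= factor
    return x

exact = math.exp(-1.0)
err1 = abs(midpoint_solve(0.1) - exact)
err2 = abs(midpoint_solve(0.05) - exact)
ratio = err1 / err2                         # should be close to 4
```

In the PDE setting the scalar division by 1 + Δt/2 becomes the solve with M + ½Δ_t L mentioned above.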

The difference between the solution obtained within the optimal control framework and the solution obtained via the time-stepping scheme is illustrated in Table 4. The difference is computed as ‖u_opt − u_k‖₂/‖u_opt‖₂, with u_opt being the state computed by the optimal control method and u_k the state computed by the time-stepping method.

Figure 2 shows the obtained optimal control and state vectors at the final time T (= 1), as well as the form of the desired state, approximated by a truncated Fourier expansion with N = 5. Figure 3 visualizes the solution from the time-stepping procedure for different time steps.

7.2 Eddy current electromagnetic problem with Maxwell's equations as constraint

For completeness, we briefly present also a second test problem, involving several regularization and problem parameters. The aim here is to demonstrate the applicability of our approach to more difficult problems.

Consider then the linear eddy current case of Maxwell's equations in a bounded domain $\Omega$ with Lipschitz boundary $\Gamma$. The problem reads: find a time-dependent magnetic vector potential $y$ such that

$$\begin{aligned} \sigma \frac{\partial y}{\partial t} + \operatorname{curl}(\nu \operatorname{curl} y) &= j && \text{in } \Omega \times (0, T),\\ y \times n &= 0 && \text{on } \Gamma \times (0, T),\\ y(\cdot, 0) &= y_0 && \text{in } \Omega, \end{aligned} \tag{7.2}$$

where $\sigma$, $\nu$ and $j$ denote the electrical conductivity, the magnetic reluctivity and the external current density, respectively, and where $n$ is the outward normal vector to $\partial\Omega$. Note that the conductivity can be zero, such as in air.

Due to the discontinuity of $\sigma$, and to obtain uniqueness in the nonconducting regions, the state equation must be regularized, unless the solution is divergence-free and a classical inf-sup stability relation holds. We do this here by adding a positive term $\varepsilon y$, $\varepsilon > 0$, to the state equation. The regularized problem then takes the form

$$\begin{aligned} \sigma \frac{\partial y}{\partial t} + \varepsilon y + \operatorname{curl}(\nu \operatorname{curl} y) &= j && \text{in } \Omega \times (0, T),\\ y \times n &= 0 && \text{on } \Gamma \times (0, T),\\ y(\cdot, 0) &= y_0 && \text{in } \Omega. \end{aligned}$$

In the time-harmonic regime with angular frequency $\omega$, i.e. $y(x, t) = \hat{y}(x) e^{i\omega t}$, it leads to finding the complex-valued amplitude $\hat{y}$ satisfying

$$i\omega\sigma\, \hat{y} + \varepsilon \hat{y} + \operatorname{curl}(\nu \operatorname{curl} \hat{y}) = \hat{j} \ \text{ in } \Omega, \qquad \hat{y} \times n = 0 \ \text{ on } \Gamma.$$

The problem is formulated in the sense of distributions: find $\hat{y} \in H_0(\operatorname{curl}; \Omega)$ such that

$$\int_\Omega \Big( (i\omega\sigma + \varepsilon)\, \hat{y} \cdot \bar{v} + \nu \operatorname{curl} \hat{y} \cdot \operatorname{curl} \bar{v} \Big)\, dx = \int_\Omega \hat{j} \cdot \bar{v}\, dx$$

for all complex-valued test functions $v \in H_0(\operatorname{curl}; \Omega)$, where

$$H_0(\operatorname{curl}; \Omega) := \big\{ v \in L^2(\Omega)^3 : \operatorname{curl} v \in L^2(\Omega)^3,\ v \times n = 0 \text{ on } \Gamma \big\}.$$

Assuming $\hat{j} \in L^2(\Omega)^3$, the linear form is bounded. Assuming further that $\sigma, \nu \in L^\infty(\Omega)$, $\sigma(x) \geq 0$ and $\nu(x) \geq \nu_0 > 0$, the bilinear form is bounded and elliptic; therefore the problem is uniquely solvable and the solution depends continuously on the data.

Using conforming finite element approximations requires continuity of the tangential traces. For this purpose we use Nédélec finite elements, see [25,26]. We arrive at the linear system

$$(i\omega M + K)z = b, \qquad (7.3)$$

where $M = \big[\int_\Omega \sigma\, \phi_j \cdot \phi_i\, dx\big]$ and $K = \big[\int_\Omega \big(\nu \operatorname{curl} \phi_j \cdot \operatorname{curl} \phi_i + \varepsilon\, \phi_j \cdot \phi_i\big)\, dx\big]$, with $\phi_i(x)$, $i, j = 1, \ldots, n$, being the lowest order Nédélec basis functions. To avoid complex arithmetic we can rewrite (7.3) in real-valued block matrix form

$$\begin{bmatrix} K & -\omega M \\ \omega M & K \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \xi \\ \eta \end{bmatrix},$$

where $z = x + iy$ and $b = \xi + i\eta$. We recognize the same type of block matrix that arises in the optimal control problem constrained by a time-harmonic equation.
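The equivalence between the complex form (7.3) and the real two-by-two block form is easy to verify numerically. In the sketch below, `M` and `K` are small random symmetric positive definite stand-ins (assumptions for illustration, not the actual Nédélec matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, omega = 4, 2.0

# Hypothetical SPD stand-ins for the Nedelec mass and stiffness matrices.
M = np.eye(n) + 0.1 * rng.standard_normal((n, n)); M = M @ M.T
K = np.eye(n) + 0.1 * rng.standard_normal((n, n)); K = K @ K.T

b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = np.linalg.solve(1j * omega * M + K, b)        # complex form (7.3)

# Real block form: split z = x + iy, b = xi + i*eta; then
# (i omega M + K)(x + iy) = (Kx - omega M y) + i(omega M x + Ky).
A = np.block([[K, -omega * M], [omega * M, K]])
xy = np.linalg.solve(A, np.concatenate([b.real, b.imag]))
```

The first half of `xy` recovers the real part of `z` and the second half its imaginary part, confirming the block reformulation.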

For the time-harmonic problem the aim is to compute a periodic steady-state solution $(y, v)$ that satisfies (7.2) but not necessarily the initial condition $y(0) = y_0$. Including instead the periodicity condition $y(0) = y(2T)$, the state equation takes the form

$$\begin{aligned} \sigma \frac{\partial y}{\partial t} + \varepsilon y + \operatorname{curl}(\nu \operatorname{curl} y) &= u && \text{in } \Omega \times (0, 2T),\\ y \times n &= 0 && \text{on } \Gamma \times (0, 2T),\\ y(0) &= y(2T) && \text{in } \Omega. \end{aligned}$$

We denote the adjoint variable by $w$. Similarly, the condition $w(2T) = 0$ is replaced by the periodicity condition $w(0) = w(2T)$. We consider then a time-harmonic desired state

$$y_d(x, t) = y_d^c(x) \cos(\omega t) + y_d^s(x) \sin(\omega t).$$

Due to the linearity of the problem, the state $y$, the Lagrange multiplier (co-state) $w$ and the control $u$ are time-harmonic as well, with the same frequency. We consider the case where the control and observation regions are nonzero only on a subdomain. As shown in , this leads to a four-by-four block matrix system with two-by-two blocks of the same form as before, both preconditioned by use of the PRESB method. The arising system is solved by a coupled inner-outer iteration method. Even though the inner and outer iteration counts multiply, this is a viable approach since, as follows from Proposition 1, the condition numbers for both the outer and inner iterations are bounded by the modest number $1 + \alpha$, $0 < \alpha \leq 1$, for different values of $\alpha$. Hence, there will be few iterations. Furthermore, numerical tests show that in practice it suffices to solve the inner systems to a fairly rough relative accuracy, say $10^{-2}$, to obtain the smallest or nearly the smallest number of outer iterations. As before, the major parallelism comes from the parallel solution of the problem for the different frequencies, in addition to the parallelism available within the inner solution method.
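The effect of a rough inner tolerance can be illustrated on a generic inner-outer scheme. The sketch below is not the PRESB-preconditioned solver of the paper: it uses iterative refinement on a hypothetical small SPD system, with an inner conjugate gradient solve to relative accuracy $10^{-2}$, so each outer step reduces the residual by at least that factor and very few outer steps are needed.

```python
import numpy as np

def cg(A, b, tol, maxit=1000):
    """Unpreconditioned conjugate gradients to a relative residual tolerance."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    nb = np.linalg.norm(b)
    for _ in range(maxit):
        if np.sqrt(rs) <= tol * nb:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
n = 50
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)          # hypothetical SPD test matrix
b = rng.standard_normal(n)

# Outer loop: iterative refinement; the inner solve is deliberately rough.
x = np.zeros(n)
outer = 0
while np.linalg.norm(b - A @ x) > 1e-8 * np.linalg.norm(b):
    r = b - A @ x
    x = x + cg(A, r, tol=1e-2)       # inner solve to relative accuracy 1e-2
    outer += 1
# Each outer step cuts the residual by at least 1e-2, so only a handful
# of outer iterations are needed to reach the tight outer tolerance.
```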

The robustness of the method has been reported in  for different values of the reluctivity and for two meshsizes. In particular, the method is very robust with respect to the frequency $\omega$.

https://doi.org/10.3846/mma.2018.018

Acknowledgements

The work of the first author is supported by the projects LD15105 Ultrascale computing in geosciences and LQ1602 IT4Innovations excellence in science, supported by the Ministry of Education, Youth and Sports of the Czech Republic. The work of the third author is funded by the China Scholarship Council (File No. 201606180086) and by the National Natural Science Foundation of China (Grant No. 11771193). His work was performed during his visit to Uppsala University, Sweden.

References

 R. Alexander. Diagonally implicit Runge-Kutta methods for stiff ODEs. SIAM Journal on Numerical Analysis, 14(6):1006-1021, 1977. https://doi.org/10.1137/0714068.

 V. Alexandrov, O. Esquivel-Flores, S. Ivanovska and A. Karaivanova. On the preconditioned quasi-Monte Carlo algorithm for matrix computations. In I. Lirkov, S. D. Margenov and J. Wasniewski (Eds.), International Conference on Large-Scale Scientific Computing, volume 9374, pp. 163-171, Cham, 2015. Springer. https://doi.org/10.1007/978-3-319-26520-9_17.

 O. Axelsson. On mesh independence and Newton type methods. Applications of Mathematics, 38(4):249-265, 1993.

 O. Axelsson. Iterative Solution Methods. Cambridge University Press, Cambridge, 1994. https://doi.org/10.1017/CBO9780511624100.

 O. Axelsson and V.A. Barker. Finite Element Solution of Boundary Value Problems: Theory and Computation. Society for Industrial and Applied Mathematics, 2001. https://doi.org/10.1137/1.9780898719253.

 O. Axelsson, S. Farouq and M. Neytcheva. Comparison of preconditioned Krylov subspace iteration methods for PDE-constrained optimization problems. Poisson and convection-diffusion control. Numerical Algorithms, 73(3):631-663, 2016. https://doi.org/10.1007/s11075-016-0111-1.

 O. Axelsson, S. Farouq and M. Neytcheva. Comparison of preconditioned Krylov subspace iteration methods for PDE-constrained optimization problems. Stokes control. Numerical Algorithms, 74(1):19-37, 2017. https://doi.org/10.1007/s11075-016-0136-5.

 O. Axelsson, S. Farouq and M. Neytcheva. A preconditioner for optimal control problems, constrained by Stokes equation with a time-harmonic control. Journal of Computational and Applied Mathematics, 310(C):5-18, 2017. https://doi.org/10.1016/j.cam.2016.05.029.

 O. Axelsson and W. Layton. A two-level method for the discretization of nonlinear boundary value problems. SIAM Journal on Numerical Analysis, 33(6):2359-2374, 1996. https://doi.org/10.1137/S0036142993247104.

 O. Axelsson and D. Lukas. Preconditioning methods for eddy current optimally controlled time-harmonic electromagnetic problems. Journal of Numerical Mathematics, 2018. https://doi.org/10.1515/jnma-2017-0064.

 O. Axelsson and M. Neytcheva. Algebraic multilevel iteration method for Stieltjes matrices. Numerical Linear Algebra with Applications, 1(3):213-236, 1994. https://doi.org/10.1002/nla.1680010302.

 O. Axelsson, M. Neytcheva and A. Strom. An efficient preconditioning method for state box-constrained optimal control problems. Technical report, Uppsala University, Department of Information Technology, 2017. Available from Internet: http://www.it.uu.se/research/publications/reports/2017-004/.

 O. Axelsson and P.S. Vassilevski. Algebraic multilevel preconditioning methods. I. Numerische Mathematik, 56(2-3):157-177, 1989. https://doi.org/10.1007/BF01409783.

 S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, D. A. May, L. Curfman McInnes, K. Rupp, B. F. Smith, S. Zampini, H. Zhang and H. Zhang. PETSc Web page, 2017. Available from Internet: http://www.mcs.anl.gov/petsc.

 T. V. Kolev et al. MFEM: Modular finite element methods. Available from Internet: http://mfem.org/.

 M. Heroux, R. Bartlett, V. Howle, R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring and A. Williams. An overview of Trilinos. Technical Report SAND2003-2927, Sandia National Laboratories, 2003.

 R. Herzog and E.W. Sachs. Preconditioned conjugate gradient method for optimal control problems with control and state constraints. SIAM Journal on Matrix Analysis and Applications, 31(5):2291-2317, 2010. https://doi.org/10.1137/090779127.

 M. Hintermuller and M. Hinze. Moreau-Yosida regularization in state constrained elliptic control problems: error estimates and parameter adjustment. SIAM Journal on Numerical Analysis, 47(3):1666-1683, 2009. https://doi.org/10.1137/080718735.

 G. Karypis and V. Kumar. ParMETIS - parallel graph partitioning and fill-reducing matrix ordering, 2013. Available from Internet: http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview.

 T. V. Kolev and P.S. Vassilevski. Parallel auxiliary space AMG solver for H(curl) problems. Journal of Computational Mathematics, 27(5):604-623, 2009. https://doi.org/10.4208/jcm.2009.27.5.013.

 M. Kollmann, M. Kolmbauer, U. Langer, M. Wolfmayr and W. Zulehner. A robust finite element solver for a multiharmonic parabolic optimal control problem. Computers & Mathematics with Applications, 65(3):469-486, 2013. https://doi.org/10.1016/j.camwa.2012.06.012.

 M. Kolmbauer and U. Langer. A robust preconditioned MINRES solver for distributed time-periodic eddy current optimal control problems. SIAM Journal on Scientific Computing, 34(6):B785-B809, 2012. https://doi.org/10.1137/110842533.

 U. Langer and M. Wolfmayr. Multiharmonic finite element analysis of a time-periodic parabolic optimal control problem. Journal of Numerical Mathematics, 21(4):265-300, 2013. https://doi.org/10.1515/jnum-2013-0011.

 Lawrence Livermore National Laboratory. hypre: High Performance Preconditioners. Available from Internet: http://www.llnl.gov/CASC/hypre.

 J.C. Nédélec. Mixed finite elements in R^3. Numerische Mathematik, 35(3):315-341, 1980. https://doi.org/10.1007/BF01396415.

 J.C. Nédélec. A new family of mixed finite elements in R^3. Numerische Mathematik, 50(1):57-81, 1986. https://doi.org/10.1007/BF01389668.

 Y. Notay. An aggregation-based algebraic multigrid method. Electronic Transactions on Numerical Analysis, 37:123-146, 2010.

 C.C. Paige and M.A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 12(4):617-629, 1975. https://doi.org/10.1137/0712047.

 J.W. Pearson, M. Stoll and A.J. Wathen. Preconditioners for state-constrained optimal control problems with Moreau-Yosida penalty function. Numerical Linear Algebra with Applications, 21(1):81-97, 2014. https://doi.org/10.1002/nla.1863.

 A. Redlund and G. Cheng. Implementation and performance studies of a three-phase model solver. Technical report, Uppsala University, Department of Information Technology, 2013. Available from Internet: http://www.it.uu.se/edu/course/homepage/projektTDB/ht13/project13/.

 Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM Journal on Scientific Computing, 14(2):461-469, 1993. https://doi.org/10.1137/0914028.

 Y. Saad. Iterative methods for sparse linear systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2003. https://doi.org/10.1137/1.9780898718003.

 P.J. van der Houwen and B.P. Sommeijer. Analysis of parallel diagonally implicit iteration of Runge-Kutta methods. Applied Numerical Mathematics, 11(1-3):169-188, 1993. https://doi.org/10.1016/0168-9274(93)90047-U.

 P.S. Vassilevski. Multilevel Block Factorization Preconditioners. Springer-Verlag, New York, 2008.

 R. Čiegis, V. Starikovičius and S. Margenov. On parallel numerical algorithms for fractional diffusion problems. In J. Carretero, J. Garcia Blas and S. Margenov (Eds.), Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Universidad Carlos III de Madrid, December 2016.

 R. Čiegis, V. Starikovičius, S. Margenov and R. Kriauziene. Parallel solvers for fractional power diffusion problems. Concurrency and Computation: Practice and Experience, 29(24), 2017. https://doi.org/10.1002/cpe.4216.

 J.-C. Xu. Two-grid discretization techniques for linear and nonlinear PDEs. SIAM Journal on Numerical Analysis, 33(5):1759-1777, 1996. https://doi.org/10.1137/S0036142992232949.

Owe Axelsson (a, b), Maya Neytcheva (b) and Zhao-Zheng Liang (c, b)

(a) Institute of Geonics, Czech Academy of Sciences Ostrava, Czech Republic

(b) Department of Information Technology, Uppsala University Uppsala, Sweden

(c) School of Mathematics and Statistics, Lanzhou University Lanzhou, China

E-mail: owe.axelsson@it.uu.se

E-mail(corresp.): maya.neytcheva@it.uu.se

E-mail: liangzhzh2014@lzu.edu.cn

Received September 9, 2017; revised March 4, 2018; accepted March 6, 2018

Caption: Figure 1. A symmetrically extended function

Caption: Figure 2. Problem 1: Surface plots of the desired state $u_d$, the computed state $u$ and the control $v$ on a 64 × 64 grid and $\beta = 10^{-6}$

Caption: Figure 3. Problem 1: Surface plots of the state obtained by different timesteps on a 64 × 64 grid and $\beta = 10^{-6}$
```
Table 1. Problem 1: Performance results for FGMRES, A preconditioned by C in (4.1)

                β = 10^-2        β = 10^-4        β = 10^-6        β = 10^-8
h      k     IT       CPU    IT       CPU    IT       CPU    IT       CPU

1/128  0   7(1/3)    0.60  8(1/3)    0.67  8(1/2)    0.63  8(1/2)    0.47
       1   7(3/7)    2.38  8(2/6)    1.74  8(2/5)    1.53  8(1/4)    0.62
       2   7(4/6)    2.92  8(3/6)    2.52  8(2/5)    1.50  8(1/4)    0.63
       3   6(4/6)    2.60  8(3/6)    2.47  8(2/5)    1.52  8(1/4)    0.69
       4   6(4/6)    2.67  8(3/6)    2.44  8(2/5)    1.51  8(1/4)    0.77
       5   6(4/6)    2.62  8(4/6)    3.06  8(2/5)    1.56  8(2/4)    0.81

1/256  0   7(1/4)    3.16  9(1/3)    3.83  8(1/3)    3.03  8(1/2)    2.64
       1   7(3/7)   11.62  8(2/6)    8.72  8(2/5)    7.73  8(1/4)    3.93
       2   7(4/7)   14.01  8(3/6)   12.13  8(2/5)    7.66  8(1/4)    4.50
       3   6(4/7)   12.99  8(3/6)   12.05  8(2/5)    7.70  8(1/4)    4.63
       4   6(4/7)   13.68  8(3/6)   11.99  8(2/5)    7.67  8(1/4)    4.89
       5   6(4/7)   13.76  8(4/6)   14.52  8(2/5)    7.75  8(2/4)    5.53

1/512  0   7(1/4)   22.52  8(1/4)   18.81  8(1/3)   17.72  8(1/3)   16.29
       1   7(3/7)   66.40  8(2/7)   45.75  8(2/6)   41.77  8(1/5)   23.15
       2   7(4/7)   71.25  8(3/7)   63.22  8(2/6)   40.81  8(1/5)   24.59
       3   6(4/7)   65.58  8(3/7)   63.87  8(2/6)   40.54  8(1/5)   26.15
       4   6(4/7)   66.63  8(3/7)   67.86  8(2/6)   41.56  8(1/5)   30.06
       5   6(4/7)   65.91  8(4/7)   79.63  8(2/6)   41.18  8(2/5)   33.74

Table 2. Problem 1: Performance of the block-diagonally preconditioned MINRES, equations (33) and (34) in 

                β = 10^-2       β = 10^-4       β = 10^-6       β = 10^-8
h      k     IT      CPU     IT      CPU     IT      CPU     IT      CPU

1/128  0   14(3)    0.83   20(3)    1.04   24(2)    1.14   25(2)    0.78
       1   16(6)    1.51   22(5)    1.83   22(4)    1.53   24(4)    0.99
       2   18(6)    1.68   22(5)    1.79   24(4)    1.75   24(4)    0.95
       3   18(6)    1.65   22(5)    1.79   24(4)    1.66   24(4)    0.95
       4   20(6)    1.82   24(5)    1.95   24(4)    1.68   24(4)    0.96
       5   20(5)    1.76   24(5)    1.97   24(4)    1.66   24(4)    0.97

1/256  0   14(3)    3.93   20(3)    5.21   24(2)    5.68   26(2)    4.88
       1   16(6)    7.46   22(6)    9.29   24(5)    8.77   26(4)    7.14
       2   18(6)    8.27   24(6)    9.98   25(5)    9.18   26(4)    7.19
       3   20(6)    9.05   24(6)   10.17   24(5)    8.82   26(4)    7.29
       4   20(6)    8.96   24(6)   10.18   24(5)    8.96   26(4)    7.25
       5   20(6)    9.07   25(6)   10.56   24(5)    9.25   26(4)    7.17

1/512  0   14(3)   24.90   21(3)   35.52   24(3)   26.88   26(2)   36.89
       1   16(7)   39.67   22(6)   48.09   25(5)   42.68   26(4)   37.73
       2   18(6)   41.76   24(6)   54.47   25(5)   49.75   26(5)   43.82
       3   20(6)   42.83   24(6)   49.37   25(5)   43.17   26(5)   54.94
       4   20(6)   44.64   24(6)   60.88   24(5)   43.05   26(5)   55.67
       5   20(6)   43.25   26(6)   63.45   24(5)   40.96   26(5)   42.47

Table 3. Problem 1: Performance of the midpoint trapezoidal method (N = 5)

                 β = 10^-2      β = 10^-4      β = 10^-6      β = 10^-8
h       Δt     IT     CPU     IT     CPU     IT     CPU     IT     CPU

1/128   0.1     6    0.41     6    0.41     6    0.41     5    0.38
        0.05    6    0.68     6    0.69     6    0.66     5    0.65
        0.025   6    1.18     6    1.19     6    1.23     5    1.08

1/256   0.1     7    1.42     7    1.41     7    1.39     6    1.30
        0.05    7    2.74     7    2.70     6    2.45     6    2.44
        0.025   6    4.70     6    4.71     7    5.23     6    4.70

1/512   0.1     7    8.28     7    7.01     7    8.56     6    6.60
        0.05    7   15.57     7   13.91     7   17.35     6   14.17
        0.025   7   29.64     7   27.62     7   30.32     6   31.56

Table 4. Problem 1: Difference between the optimal control solution u_opt and the solution from the time-stepping method u_k for t_k = T

                 β = 10^-2      β = 10^-4      β = 10^-6      β = 10^-8
h       Δt     N=5    N=10    N=5    N=10    N=5    N=10    N=5    N=10

1/128   0.1    0.45   0.50    0.88   1.12    0.90   1.17    0.85   1.11
        0.05   0.12   0.15    0.24   0.38    0.26   0.45    0.25   0.45
        0.025  0.03   0.04    0.06   0.10    0.07   0.12    0.07   0.12
```