
Efficient preconditioners for PDE-constrained optimization problems with a multilevel sequentially semiseparable matrix structure.

1. Introduction. PDE-constrained optimization problems have a wide range of applications, such as optimal flow control [6, 7], diffuse optical tomography [1], and linear (nonlinear) model predictive control [5]. The solution of these problems is obtained by solving a large-scale linear system of saddle-point type. Much effort has been dedicated to finding efficient iterative solution methods for such systems. Some of the most popular techniques are the conjugate gradient (CG) [21], minimal residual (MINRES) [27], generalized minimal residual (GMRES), and induced dimension reduction (IDR(s)) [38] methods. Their performance depends highly on the choice of preconditioner. In this paper, we study a class of preconditioners that exploit the multilevel sequentially semiseparable (MSSS) structure of the blocks of the saddle-point system.

Semiseparable matrices appear in many applications, e.g., integral equations [23], Gauss-Markov processes [25], boundary value problems [24], rational interpolation [39], and Kalman filtering [29]. Semiseparable matrices are matrices of which all sub-matrices taken from the lower-triangular or the upper-triangular part have rank at most 1, as defined in [42]. Sequentially semiseparable (SSS) matrices, introduced by Dewilde et al. in [12], generalize semiseparable matrices by allowing off-diagonal blocks of low rank, not limited to rank 1. Multilevel sequentially semiseparable matrices generalize sequentially semiseparable matrices to the multi-dimensional case. Systems that arise from the discretization of 1D partial differential equations typically have an SSS structure, while discretization of higher dimensional (2D or 3D) partial differential equations gives rise to matrices with an MSSS structure [15, 22]. Under the multilevel paradigm, the generators that represent a matrix of a higher hierarchy are themselves multilevel sequentially semiseparable matrices of a lower hierarchy; the usual one-level sequentially semiseparable matrix is the one of lowest hierarchy. Operations like matrix inversion and matrix-matrix multiplication are closed under this structure, and the LU factorization can also be performed in a structure-preserving way. This factorization, however, results in a growth of the rank of the off-diagonal blocks, so the LU factorization is not of linear computational complexity. Model order reduction can be used to reduce the rank of the off-diagonal blocks, which yields an inexact LU decomposition of an MSSS matrix that can be used as a preconditioner.

In [22], Gondzio et al. first introduced MSSS matrix computations for preconditioning PDE-constrained optimization problems. They exploited the MSSS structure of the blocks of the saddle-point system and performed an LU factorization for MSSS matrices to approximate the Schur complement of the saddle-point system. With this approximated Schur complement as a preconditioner, conjugate gradient iterations were performed to solve the saddle-point system block-by-block. As mentioned above, model order reduction plays a vital role in obtaining linear computational complexity for the LU factorization of MSSS matrices. In [22], Gondzio et al. used a standard model order reduction algorithm [12, 19] to reduce the computational complexity.

This paper extends [22] in the following ways: 1) We propose a new model order reduction algorithm for SSS matrix computations based on the correspondence between linear time-varying (LTV) systems and blocks of SSS matrices. This new model order reduction algorithm is motivated by the work in [9, 11]. In [9], approximate balanced truncation was addressed for the model order reduction of linear time-invariant (LTI) systems, while in [11] recursive low-rank approximation was performed to compute approximations of the Gramians of LTV systems. In this paper, we use the low-rank approximation method of [11] and the approximate balanced truncation of [9] for the model order reduction of SSS matrices. Compared with the model order reduction algorithms discussed in [12, 19], the approximate balanced truncation method for SSS matrices in this paper is computationally cheaper. 2) With these model order reduction algorithms, we can compute an inexact LU factorization of the MSSS matrix blocks of the saddle-point system in linear computational complexity ($O(N)$). This yields a block preconditioner for the saddle-point system. Exploiting the block structure of the saddle-point system is a standard preconditioning technique, described in [4]; however, only the single preconditioner for the last block of the saddle-point system is studied in [22]. 3) By permuting the blocks of the saddle-point system, we can also compute an inexact LU factorization of the global system with MSSS matrix computations in linear computational complexity. This gives a global MSSS preconditioner, and this novel preconditioner yields convergence independent of the mesh size and of the regularization parameter, which is a big advantage over the block MSSS preconditioner. 4) Besides the problem of optimal control of the Poisson equation, we also study the problem of optimal control of the convection-diffusion equation. 5) Moreover, we extend these preconditioning techniques to 3D saddle-point systems.

Note that the convergence of block preconditioners depends on the regularization parameter $\beta$ of the PDE-constrained optimization problem [32]. For small $\beta$, block preconditioners do not give satisfactory performance. Since all the blocks of the saddle-point matrix are MSSS matrices, we can permute the saddle-point matrix into a single MSSS matrix. We can then compute an approximate LU factorization of the permuted saddle-point system using MSSS matrix computations in linear computational complexity. We call this approximate factorization of the permuted global matrix the global MSSS preconditioner. Block preconditioners often neglect the regularization term $\frac{1}{2\beta}M$ and, as a result, give poor convergence for small enough regularization parameter $\beta$. Our global MSSS preconditioner does not neglect the regularization term, which in turn gives $\beta$-independent convergence as well as mesh-size-independent convergence. Numerical experiments in this paper demonstrate this performance.

The outline of this manuscript is as follows: Section 2 formulates a distributed optimal control problem constrained by PDEs. This problem yields a linear saddle-point system. In Section 3, we give some definitions and algorithms for MSSS matrices to introduce the MSSS preconditioning technique. The new model order reduction algorithm for SSS matrices is also described. With MSSS matrix computations, we propose two types of preconditioners for the saddle-point problem: the global MSSS preconditioner and the block-diagonal MSSS preconditioner. In Section 4, we use the distributed optimal control of the convection-diffusion equation to illustrate the performance of these two preconditioners and of the new model order reduction algorithm. Section 5 presents how to extend these preconditioning techniques to 3D saddle-point problems. Section 6 draws conclusions and describes future work.

A companion technical report [30] is also available online and studies a wider class of PDE-constrained optimization problems. It contains more numerical experiments that illustrate the performance of this preconditioning technique. In [31], we study the preconditioning technique proposed in this manuscript applied to computational fluid dynamics (CFD) problems and evaluate its performance on CFD benchmark problems using the Incompressible Flow and Iterative Solver Software (IFISS) [37].

2. Problem formulation.

2.1. PDE-constrained optimization problem. Consider the following PDE-constrained optimization problem

(2.1)   $\min_{u,f}\ \frac{1}{2}\|u - \hat{u}\|_{L_2(\Omega)}^2 + \beta\|f\|_{L_2(\Omega)}^2 \quad \text{subject to} \quad \mathcal{L}u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega,$

where $\mathcal{L}$ is an operator, $u$ the system state, $f$ the system input, $\hat{u}$ the desired state of the system, $\Omega$ the domain, $\partial\Omega$ the corresponding boundary, and $\beta > 0$ the weight of the system input in the cost function, or regularization parameter. In this paper, we consider $\mathcal{L} = -\nabla^2$ for optimal control of the Poisson equation and $\mathcal{L} = -\epsilon\nabla^2 + \vec{w}\cdot\nabla$ for optimal control of the convection-diffusion equation. Here $\vec{w}$ is a vector in $\Omega$, $\nabla$ is the gradient operator, and $\epsilon$ is a positive scalar. To solve such a problem numerically, the quantities involved clearly need to be discretized at some point. There are two kinds of approaches: one derives the optimality conditions first and then discretizes (optimize-then-discretize), while the other discretizes the cost function and the PDE first and then optimizes (discretize-then-optimize). For the problem of optimal control of the Poisson equation both approaches lead to the same solution, while for optimal control of the convection-diffusion equation they lead to different answers; see [32]. Since our focus is on preconditioning for such problems, the discretize-then-optimize approach is chosen in this paper.

By introducing the weak formulation and discretizing (2.1) using the Galerkin method, the discrete analogue of the minimization problem (2.1) is

(2.2)   $\min_{u,f}\ \frac{1}{2}u^T M u - u^T b + c + \beta f^T M f \quad \text{subject to} \quad Ku = Mf + d,$

where $K = [K_{i,j}] \in \mathbb{R}^{N \times N}$ is the stiffness matrix, $M = [M_{i,j}] \in \mathbb{R}^{N \times N}$ with $M_{i,j} = \int_{\Omega}\phi_i\phi_j\,d\Omega$ is the mass matrix, which is symmetric positive definite, $b = [b_i] \in \mathbb{R}^{N}$ with $b_i = \int_{\Omega}\hat{u}\phi_i\,d\Omega$, $c = \frac{1}{2}\int_{\Omega}\hat{u}^2\,d\Omega$, and $d$ collects the terms arising from the boundary data $g$. Here $\phi_i$ ($i = 1, 2, \ldots, N$) and $\phi_j$ ($j = 1, 2, \ldots, N, N+1, \ldots, N + \partial N$) form bases of $V_0^h$ and $V_g^h$, respectively, where $V_0^h$ and $V_g^h$ represent the finite dimensional test space and solution space.

Considering the minimization problem (2.2), we introduce the Lagrangian function

$J(u, f, \lambda) = \frac{1}{2}u^T M u - u^T b + c + \beta f^T M f + \lambda^T(Ku - Mf - d),$

where $\lambda$ is the Lagrange multiplier. It is then well known that the optimal solution is obtained by finding $u$, $f$, and $\lambda$ such that

$\nabla_f J = 2\beta M f - M\lambda = 0, \qquad \nabla_u J = Mu - b + K^T\lambda = 0, \qquad \nabla_{\lambda} J = Ku - Mf - d = 0.$

This yields the linear system

(2.3)   $\begin{bmatrix} 2\beta M & 0 & -M \\ 0 & M & K^T \\ -M & K & 0 \end{bmatrix} \begin{bmatrix} f \\ u \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ b \\ d \end{bmatrix}.$

The system (2.3) is of saddle-point type [4], i.e., the system matrix $\mathcal{A}$ is symmetric and indefinite. It has the following structure:

(2.4)   $\mathcal{A} = \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix},$

where $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite and $B \in \mathbb{R}^{m \times n}$ has full rank. For (2.3), $A = \operatorname{blkdiag}(2\beta M, M)$ and $B = [\,-M \;\; K\,]$.
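To make the block structure of (2.3)-(2.4) concrete, the following minimal sketch assembles the saddle-point matrix with SciPy sparse blocks. It is an illustration only, with a toy 1D stiffness matrix and a lumped mass matrix standing in for the finite element matrices of this paper; the function name assemble_kkt is ours.

import numpy as np
import scipy.sparse as sp

def assemble_kkt(K, M, beta):
    # saddle-point matrix of (2.3): unknowns ordered as (f, u, lambda)
    N = M.shape[0]
    Z = sp.csr_matrix((N, N))
    return sp.bmat([[2 * beta * M, Z, -M],
                    [Z,            M, K.T],
                    [-M,           K, Z]], format='csr')

# toy data: 1D Laplacian stiffness and lumped mass on a uniform grid
N = 8
h = 1.0 / (N + 1)
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)) / h
M = h * sp.identity(N)
A = assemble_kkt(K, M, beta=1e-2)
print(A.shape, np.allclose(A.toarray(), A.toarray().T))   # (24, 24) True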

The system matrix of the saddle-point system (2.3) is large and sparse. Preconditioned Krylov subspace methods, such as MINRES [27] and IDR(s) [38], are quite efficient for solving such systems.

2.2. Preconditioning of saddle-point systems. The performance of iterative solution methods highly depends on the choice of the preconditioners [33]. For numerical methods to solve saddle-point system (2.3) and the construction of preconditioners, we refer to [4, 26] for an extensive survey. In this paper, we study two types of preconditioners. The first exploits the MSSS structure of the blocks of the saddle-point system, whereas the second type exploits the global MSSS structure of the saddle-point system.

2.2.1. Block preconditioners. Recall from (2.4) that if $A$ is nonsingular, then $\mathcal{A}$ admits the following $LDL^T$ factorization:

$\mathcal{A} = \begin{bmatrix} I & 0 \\ BA^{-1} & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & S \end{bmatrix} \begin{bmatrix} I & A^{-1}B^T \\ 0 & I \end{bmatrix},$

where $S = -\big(\frac{1}{2\beta} M + KM^{-1}K^T\big)$ is the Schur complement.

The most difficult part of this factorization is computing the Schur complement $S$, because computing the inverse of a large sparse matrix is expensive in both time and memory. Solving the system $Sx = b$ is also expensive, since $S$ is a large and full matrix. Note that all the blocks of (2.3) have a structure called multilevel sequentially semiseparable (MSSS), which will be introduced in a later section. The Schur complement $S$ then also has the MSSS structure, but with a larger semiseparable order. If we exploit the MSSS structure of (2.3), we can compute $S$ in linear computational complexity.

In this paper, we first study the block-diagonal preconditioner $P_1$ for the saddle-point system (2.3), where

(2.5)   $P_1 = \begin{bmatrix} 2\beta\hat{M} & 0 & 0 \\ 0 & \hat{M} & 0 \\ 0 & 0 & -\hat{S} \end{bmatrix},$

and where $\hat{M}$ is an approximation of the mass matrix $M$ and $\hat{S}$ is an approximation of the Schur complement $S$. Without approximation, i.e., $\hat{M} = M$ and $\hat{S} = S$, the preconditioned system $P_1^{-1}\mathcal{A}$ has three distinct eigenvalues, and GMRES computes the solution of the preconditioned system in at most three iterations.

To approximate $S = -\big(\frac{1}{2\beta}M + KM^{-1}K^T\big)$, $\hat{S} = -KM^{-1}K^T$ can be used for a moderate to large range of $\beta$, while $\hat{S} = -\frac{1}{2\beta}M$ can be chosen for small $\beta$ [32]. The block lower-triangular preconditioner $P_2$, which has the form

(2.6)   $P_2 = \begin{bmatrix} 2\beta\hat{M} & 0 & 0 \\ 0 & \hat{M} & 0 \\ -M & K & \hat{S} \end{bmatrix},$

is studied in the technical report [30], where its performance is also compared with that of the block-diagonal preconditioner $P_1$.
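As an illustration of how $P_1$ from (2.5) is applied inside a Krylov method, the sketch below wraps the three block solves in a SciPy LinearOperator, using $\hat{S} = -KM^{-1}K^T$. Sparse direct solves stand in for the MSSS factorizations of this paper, and the function name block_diag_prec is ours; this is a minimal sketch under those assumptions, not the implementation used in the experiments.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_diag_prec(K, M, beta):
    # apply P1^{-1} with P1 = blkdiag(2*beta*M, M, -S_hat), S_hat = -K M^{-1} K^T
    N = M.shape[0]
    S_hat = -(K @ spla.spsolve(M.tocsc(), sp.csc_matrix(K.T)))
    lu_M = spla.splu(M.tocsc())
    lu_S = spla.splu((-S_hat).tocsc())      # -S_hat = K M^{-1} K^T is SPD
    def apply(r):
        f, u, lam = r[:N], r[N:2 * N], r[2 * N:]
        return np.concatenate([lu_M.solve(f) / (2.0 * beta),
                               lu_M.solve(u),
                               lu_S.solve(lam)])
    return spla.LinearOperator((3 * N, 3 * N), matvec=apply)

# usage with the matrices K, M and saddle-point matrix A of the previous sketch:
#   P1 = block_diag_prec(K, M, beta=1e-2)
#   x, info = spla.minres(A, np.random.rand(3 * N), M=P1)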

2.2.2. Global preconditioners. Since all the blocks of the saddle-point system (2.3) have the MSSS structure, there exists a permutation matrix $\Psi$ that permutes the saddle-point matrix with MSSS blocks into a single MSSS matrix. This gives

$\Psi\mathcal{A}\Psi^T(\Psi x) = \Psi b, \quad \text{i.e.,} \quad \tilde{\mathcal{A}}\tilde{x} = \tilde{b},$

where $\tilde{\mathcal{A}}$, $\tilde{x}$, and $\tilde{b}$ are permutations of $\mathcal{A}$, $[\,f^T\; u^T\; \lambda^T\,]^T$, and $[\,0^T\; b^T\; d^T\,]^T$ in (2.3), respectively. This permutation will be introduced in the next section. After this permutation, the system matrix $\tilde{\mathcal{A}}$ is an MSSS matrix. We can compute an inexact LU factorization of $\tilde{\mathcal{A}}$ in linear computational complexity using MSSS matrix computations. This gives

(2.7)   $\tilde{\mathcal{A}} \approx \tilde{L}\tilde{U},$

which can be used as a preconditioner. We call the factorization (2.7) the global preconditioner. Since no information about $\beta$ is neglected during the permutation and factorization, the global preconditioner gives $\beta$-independent convergence, while this property does not hold for the standard block preconditioners $P_1$ in (2.5) and $P_2$ in (2.6). This is a big advantage of the global preconditioner over the standard block preconditioners, as illustrated by the numerical experiments in Section 4.

3. Preconditioning using multilevel sequentially semiseparable matrix computations.

Matrices in this paper are always real, and when sizes are not mentioned, their dimensions are assumed to be compatible for the matrix-matrix and matrix-vector operations involved.

3.1. Multilevel sequentially semiseparable matrices. The generator representation of sequentially semiseparable matrices is given in Definition 3.1.

DEFINITION 3.1 ([13]). Let $A$ be an $N \times N$ matrix with SSS matrix structure and let the $n$ positive integers $m_1, m_2, \ldots, m_n$ satisfy $N = m_1 + m_2 + \cdots + m_n$, such that $A$ can be written in the following block-partitioned form:

$A_{i,j} = \begin{cases} U_i W_{i+1} \cdots W_{j-1} V_j^T, & \text{if } i < j, \\ D_i, & \text{if } i = j, \\ P_i R_{i-1} \cdots R_{j+1} Q_j^T, & \text{if } i > j, \end{cases}$

where the superscript 'T' denotes the transpose of a matrix.

The sequences $\{P_i\}_{i=2}^{n}$, $\{R_i\}_{i=2}^{n-1}$, $\{Q_i\}_{i=1}^{n-1}$, $\{D_i\}_{i=1}^{n}$, $\{U_i\}_{i=1}^{n-1}$, $\{W_i\}_{i=2}^{n-1}$, $\{V_i\}_{i=2}^{n}$ are matrices whose sizes are listed in Table 3.1, and they are called the generators of the SSS matrix $A$. With the generator representation defined in Definition 3.1, $A$ can be denoted by

$A = \mathrm{SSS}(P_s, R_s, Q_s, D_s, U_s, W_s, V_s).$

REMARK 3.2. The generators of an SSS matrix are not unique. There exists a sequence of nonsingular transformations between two different sets of generators of the same SSS matrix $A$.

REMARK 3.3. For an SSS matrix, only its generators are stored. If the generator sizes $l_i$ and $k_i$ (cf. Table 3.1) are bounded by a small constant, then the memory consumption for storing such a matrix is linear in the matrix size. This property is also discussed in [13].

Take n = 4, for example. The SSS matrix A is given by

(3.1)   $A = \begin{bmatrix} D_1 & U_1 V_2^T & U_1 W_2 V_3^T & U_1 W_2 W_3 V_4^T \\ P_2 Q_1^T & D_2 & U_2 V_3^T & U_2 W_3 V_4^T \\ P_3 R_2 Q_1^T & P_3 Q_2^T & D_3 & U_3 V_4^T \\ P_4 R_3 R_2 Q_1^T & P_4 R_3 Q_2^T & P_4 Q_3^T & D_4 \end{bmatrix}.$
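For testing purposes, the generator representation of Definition 3.1 can be materialized as a dense matrix directly from the case distinction above. The following minimal sketch does this in Python with NumPy; it is our own illustration (the function name sss_to_dense is not from the paper), and it deliberately forms the dense matrix, which is only sensible for small examples.

import numpy as np

def sss_to_dense(P, R, Q, D, U, W, V):
    # generators are dicts indexed as in (3.1); unused indices are absent
    n = len(D)
    blocks = [[None] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i == j:
                blocks[i - 1][j - 1] = D[i]
            elif i < j:
                T = U[i]
                for k in range(i + 1, j):       # T = U_i W_{i+1} ... W_{j-1}
                    T = T @ W[k]
                blocks[i - 1][j - 1] = T @ V[j].T
            else:
                T = P[i]
                for k in range(i - 1, j, -1):   # T = P_i R_{i-1} ... R_{j+1}
                    T = T @ R[k]
                blocks[i - 1][j - 1] = T @ Q[j].T
    return np.block(blocks)

# a random 4-block SSS matrix with 3 x 3 blocks and semiseparable order 2
rng = np.random.default_rng(0)
g = lambda r, c: rng.standard_normal((r, c))
D = {i: g(3, 3) for i in range(1, 5)}
U = {i: g(3, 2) for i in range(1, 4)}; W = {i: g(2, 2) for i in (2, 3)}
V = {i: g(3, 2) for i in range(2, 5)}
P = {i: g(3, 2) for i in range(2, 5)}; R = {i: g(2, 2) for i in (2, 3)}
Q = {i: g(3, 2) for i in range(1, 4)}
A_dense = sss_to_dense(P, R, Q, D, U, W, V)    # 12 x 12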

With the generator representation of SSS matrices, basic operations such as addition, multiplication and inversion are closed under the SSS matrix structure and can be performed in linear computational complexity. Moreover, decompositions/factorizations such as the QR factorization [18, 20], the LU decomposition [22, 42], and the ULV decomposition [40] can also be computed in linear computational complexity and in a structure preserving way.
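The linear complexity of these operations comes from recursions over the generators rather than over matrix entries. As one concrete instance, the sketch below (our own illustration, not code from the paper) applies an SSS matrix to a block vector in $O(N)$ time with one causal and one anti-causal state recursion; the same two sweeps reappear in the LTV system view of Section 3.2.2.

import numpy as np

def sss_matvec(P, R, Q, D, U, W, V, x):
    # x is a dict of blocks {1: x_1, ..., n: x_n}; cost is linear in n
    n = len(D)
    y = {i: D[i] @ x[i] for i in range(1, n + 1)}
    g = Q[1].T @ x[1]                     # causal sweep (strictly lower part)
    for i in range(2, n + 1):
        y[i] = y[i] + P[i] @ g
        if i < n:
            g = R[i] @ g + Q[i].T @ x[i]
    h = V[n].T @ x[n]                     # anti-causal sweep (strictly upper part)
    for i in range(n - 1, 0, -1):
        y[i] = y[i] + U[i] @ h
        if i > 1:
            h = W[i] @ h + V[i].T @ x[i]
    return y

# consistency check against the dense construction above:
#   x = {i: rng.standard_normal(3) for i in range(1, 5)}
#   y = sss_matvec(P, R, Q, D, U, W, V, x)
#   np.allclose(np.concatenate([y[i] for i in range(1, 5)]),
#               A_dense @ np.concatenate([x[i] for i in range(1, 5)]))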

Similar to Definition 3.1 for SSS matrices, the generator representation for MSSS matrices, specifically the k-level SSS matrices, is defined as follows.

DEFINITION 3.4. The matrix A is said to be a k-level SSS matrix if all its generators are (k-1)-level SSS matrices. The 1-level SSS matrix is the SSS matrix that satisfies Definition 3.1.

Most operations for SSS matrices can be extended to MSSS matrices, again with linear computational complexity. MSSS matrices have many applications, one of which is discretized partial differential equations (PDEs) [28].

Note that for a saddle-point system arising from the PDE-constrained optimization problem, all blocks are MSSS matrices. This enables us to compute an LU factorization of all its blocks with MSSS matrix computations in linear computational complexity. However, the saddle-point matrix is not itself an MSSS matrix; it merely has MSSS blocks. Therefore, an approximate LU factorization of the saddle-point system matrix cannot be computed directly with MSSS matrix computations.

Lemma 3.5 explains how to permute a matrix with SSS blocks into a single SSS matrix. This property can be extended to matrices with MSSS blocks, which enables us to compute an LU factorization of the global saddle-point matrix using MSSS matrix computations in linear computational complexity.

Lemma 3.5 ([34]). Let $A$, $B$, $C$, and $D$ be SSS matrices with the following generator representations:

$A = \mathrm{SSS}(P_a, R_a, Q_a, D_a, U_a, W_a, V_a), \qquad B = \mathrm{SSS}(P_b, R_b, Q_b, D_b, U_b, W_b, V_b),$
$C = \mathrm{SSS}(P_c, R_c, Q_c, D_c, U_c, W_c, V_c), \qquad D = \mathrm{SSS}(P_d, R_d, Q_d, D_d, U_d, W_d, V_d).$

Then there exists a permutation matrix $\Psi$ with $\Psi\Psi^T = \Psi^T\Psi = I$ such that

$T = \Psi \begin{bmatrix} A & B \\ C & D \end{bmatrix} \Psi^T,$

and the matrix T is an SSS matrix. Its generator representation is

$T = \mathrm{SSS}(P_t, R_t, Q_t, D_t, U_t, W_t, V_t),$

where

$D_{t,i} = \begin{bmatrix} D_{a,i} & D_{b,i} \\ D_{c,i} & D_{d,i} \end{bmatrix}, \qquad P_{t,i} = \begin{bmatrix} P_{a,i} & P_{b,i} & 0 & 0 \\ 0 & 0 & P_{c,i} & P_{d,i} \end{bmatrix}, \qquad Q_{t,i} = \begin{bmatrix} Q_{a,i} & 0 & Q_{c,i} & 0 \\ 0 & Q_{b,i} & 0 & Q_{d,i} \end{bmatrix},$
$U_{t,i} = \begin{bmatrix} U_{a,i} & U_{b,i} & 0 & 0 \\ 0 & 0 & U_{c,i} & U_{d,i} \end{bmatrix}, \qquad V_{t,i} = \begin{bmatrix} V_{a,i} & 0 & V_{c,i} & 0 \\ 0 & V_{b,i} & 0 & V_{d,i} \end{bmatrix},$
$R_{t,i} = \operatorname{blkdiag}(R_{a,i}, R_{b,i}, R_{c,i}, R_{d,i}), \qquad W_{t,i} = \operatorname{blkdiag}(W_{a,i}, W_{b,i}, W_{c,i}, W_{d,i}).$

Proof. For the case that all the diagonal blocks of $A$ have the same size and all the diagonal blocks of $D$ also have the same size, i.e., $m_i^a = m^a$ and $m_i^d = m^d$, the permutation matrix $\Psi$ has the following representation:

(3.2)   $\Psi = \begin{bmatrix} I_n \otimes \begin{bmatrix} I_{m^a} \\ 0 \end{bmatrix} & I_n \otimes \begin{bmatrix} 0 \\ I_{m^d} \end{bmatrix} \end{bmatrix},$

where $\otimes$ denotes the Kronecker product and $I$ is the identity matrix of appropriate size.
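The sketch below builds this $\Psi$ with Kronecker products and checks that it is orthogonal and that it interleaves the block rows of the two operands; it is a small self-contained illustration of (3.2) as reconstructed above, with names of our own choosing.

import numpy as np

def interleaving_permutation(n, ma, md):
    # Psi of (3.2): Psi @ [x_a; x_d] = (x_a1, x_d1, x_a2, x_d2, ...)
    Ea = np.vstack([np.eye(ma), np.zeros((md, ma))])   # puts an A-block on top
    Ed = np.vstack([np.zeros((ma, md)), np.eye(md)])   # puts a D-block below it
    return np.hstack([np.kron(np.eye(n), Ea), np.kron(np.eye(n), Ed)])

Psi = interleaving_permutation(n=3, ma=2, md=1)
assert np.allclose(Psi @ Psi.T, np.eye(9))             # Psi is orthogonal
xa = np.repeat([1.0, 2.0, 3.0], 2)                     # three blocks of size 2
xd = np.array([10.0, 20.0, 30.0])                      # three blocks of size 1
print(Psi @ np.concatenate([xa, xd]))                  # [1 1 10 2 2 20 3 3 30]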

With the permutation matrix $\Psi$ given by (3.2), the permuted matrix is

(3.3)   $T = \Psi \begin{bmatrix} A & B \\ C & D \end{bmatrix} \Psi^T, \quad \text{with block entries} \quad T_{i,j} = \begin{bmatrix} A_{i,j} & B_{i,j} \\ C_{i,j} & D_{i,j} \end{bmatrix}.$

It is not difficult to verify that the matrix T is an SSS matrix and its generators are given in Lemma 3.5.

For the case that the sizes of the diagonal blocks of $A$ and $D$ vary, let $\{m_i^a\}_{i=1}^{n}$ and $\{m_i^d\}_{i=1}^{n}$ represent the diagonal block sizes of $A$ and $D$, respectively. The permutation matrix $\Psi$ is

(3.4)   $\Psi = \begin{bmatrix} \operatorname{blkdiag}(E_1^a, \ldots, E_n^a) & \operatorname{blkdiag}(E_1^d, \ldots, E_n^d) \end{bmatrix}, \qquad E_i^a = \begin{bmatrix} I_{m_i^a} \\ 0 \end{bmatrix}, \qquad E_i^d = \begin{bmatrix} 0 \\ I_{m_i^d} \end{bmatrix},$

where $\operatorname{blkdiag}(E_1, \ldots, E_n)$ represents the block-diagonal matrix with diagonal blocks $E_1, \ldots, E_n$, and the zero blocks in $E_i^a$ and $E_i^d$ have $m_i^d$ and $m_i^a$ rows, respectively.

With the permutation matrix $\Psi$ in (3.4), it is not difficult to show that the permuted matrix $T$ given by (3.3) is an SSS matrix whose generators are given in Lemma 3.5.

REMARK 3.6. Given a matrix with SSS blocks, one can apply Lemma 3.5 to permute it into a single SSS matrix via a permutation matrix $\Psi$. However, this permutation matrix is not explicitly multiplied on both sides of the matrix to be permuted: the generators of the permuted matrix are recombinations of the generators of its SSS blocks, as the generator representation in Lemma 3.5 shows. Such permutations are cheap to compute because no matrix-matrix multiplication is involved.

REMARK 3.7. Lemma 3.5 is for a 2 x 2 block matrix. It can be generalized to the case of matrices with a larger number of blocks.

REMARK 3.8. Extending Lemma 3.5 to the k-level SSS matrix case is also possible. If $A$, $B$, $C$, and $D$ are k-level SSS matrices, then their generators are (k-1)-level SSS matrices. For the permuted k-level SSS matrix $T$, the generators, which are matrices with (k-1)-level SSS blocks, are in turn permuted into single (k-1)-level SSS matrices by applying Lemma 3.5 recursively.

The saddle-point system (2.3) derived from the optimal control of the convection-diffusion equation in 2D, discretized by the $Q_1$ finite element method, has MSSS (2-level SSS) matrix blocks. The structure of the saddle-point matrix before and after permutation for mesh size $h = 2^{-3}$ is shown in Figure 3.1.

3.2. Multilevel sequentially semiseparable preconditioners. The most important part of the PDE-constrained optimization problem is to solve a linear system of the saddle-point type. In the following, we first introduce the LU factorization of MSSS matrices and then give a new model order reduction algorithm for SSS matrices, which is necessary for computing the LU factorization in linear computational complexity. For comparison, the conventional model order reduction algorithm [12] is also discussed.

3.2.1. LU factorization of multilevel sequentially semiseparable matrices. The semiseparable order, to be defined in Definition 3.9, plays an important role in MSSS matrix computations. Note that this type of structured matrix was studied independently by Dewilde et al. [13] and by Eidelman et al. [17], who called them sequentially semiseparable matrices and quasiseparable matrices, respectively. In this paper, we use MATLAB-style notation for matrices, i.e., for a matrix $A$, $A(i:j, s:t)$ selects the rows of blocks $i$ to $j$ and the columns of blocks $s$ to $t$ of $A$.

DEFINITION 3.9 ([19]). Let

$\operatorname{rank} A(k+1:n,\, 1:k) = l_k, \qquad \operatorname{rank} A(1:k,\, k+1:n) = u_k, \qquad k = 1, 2, \ldots, n-1,$

where $l_k$ and $u_k$, $k = 1, 2, \ldots, n-1$, are called the lower order numbers and upper order numbers of the matrix $A$, respectively. Let $r^l = \max_k l_k$ and $r^u = \max_k u_k$, and call $r^l$ and $r^u$ the lower quasiseparable order and the upper quasiseparable order of $A$, respectively.

DEFINITION 3.10 ([34]). The SSS matrix $A$ with lower and upper semiseparable order $r^l$ and $r^u$ is called block $(r^l, r^u)$ semiseparable.

The semiseparable order for 1-level SSS matrices defined in Definition 3.9 can be directly extended to the multilevel cases, which leads to Definition 3.11.

DEFINITION 3.11. Let $A$ be an $N \times N$ block k-level SSS matrix whose generators are $M \times M$ block (k-1)-level SSS matrices. Let

$\operatorname{rank} A(k+1:N,\, 1:k) = l_k, \qquad \operatorname{rank} A(1:k,\, k+1:N) = u_k, \qquad k = 1, 2, \ldots, N-1,$

where $l_k$ and $u_k$, $k = 1, 2, \ldots, N-1$, are called the k-level lower order numbers and the k-level upper order numbers of the k-level SSS matrix $A$, respectively. Let $r^l = \max_k l_k$ and $r^u = \max_k u_k$, and call $r^l$ and $r^u$ the k-level lower semiseparable order and the k-level upper semiseparable order of the k-level SSS matrix $A$, respectively.

DEFINITION 3.12. The k-level SSS matrix $A$ with k-level lower and upper semiseparable order $r^l$ and $r^u$ is called k-level block $(r^l, r^u)$ semiseparable.

By using these definitions, we can apply the following lemma to compute the LU factorization of a k-level SSS matrix.

LEMMA 3.13 ([22, 42]). Let $A$ be a strongly regular $N \times N$ block k-level sequentially semiseparable matrix that is k-level block $(r^l, r^u)$ semiseparable, denoted by its generator representation $A = \mathrm{MSSS}(P_s, R_s, Q_s, D_s, U_s, W_s, V_s)$. Here we say that a matrix is strongly regular if all its leading principal minors are nonzero. Let $A = LU$ be its block LU factorization. Then:

1. The block lower-triangular factor $L$ is a k-level sequentially semiseparable matrix that is k-level block $(r^L, 0)$ semiseparable, and the block upper-triangular factor $U$ is a k-level sequentially semiseparable matrix that is k-level block $(0, r^U)$ semiseparable. Moreover, $r^L = r^l$ and $r^U = r^u$.

2. The factors L and U can be denoted by the generator representation

$L = \mathrm{MSSS}(P_s, R_s, \hat{Q}_s, D_s^L, 0, 0, 0),$

$U = \mathrm{MSSS}(0, 0, 0, D_s^U, \hat{U}_s, W_s, V_s),$

where $\hat{Q}_s$, $D_s^L$, $D_s^U$, and $\hat{U}_s$ are (k-1)-level sequentially semiseparable matrices. They are computed by the following algorithm:

Algorithm 1 LU factorization of a k-level SSS matrix A.

Input: generators $P_s$, $R_s$, $Q_s$, $D_s$, $U_s$, $W_s$, $V_s$ of $A$

 1: $D_1 = D_1^L D_1^U$ (LU factorization of a (k-1)-level SSS matrix)
 2: Let $\hat{U}_1 = (D_1^L)^{-1} U_1$ and $\hat{Q}_1 = (D_1^U)^{-T} Q_1$
 3: for i = 2 : N - 1 do
 4:   if i == 2 then
 5:     $M_i = \hat{Q}_{i-1}^T \hat{U}_{i-1}$
 6:   else
 7:     $M_i = \hat{Q}_{i-1}^T \hat{U}_{i-1} + R_{i-1} M_{i-1} W_{i-1}$
 8:   end if
 9:   $(D_i - P_i M_i V_i^T) = D_i^L D_i^U$ (LU factorization of a (k-1)-level SSS matrix)
10:   Let $\hat{U}_i = (D_i^L)^{-1}(U_i - P_i M_i W_i)$ and $\hat{Q}_i = (D_i^U)^{-T}(Q_i - V_i M_i^T R_i^T)$
11: end for
12: $M_N = \hat{Q}_{N-1}^T \hat{U}_{N-1} + R_{N-1} M_{N-1} W_{N-1}$
13: $(D_N - P_N M_N V_N^T) = D_N^L D_N^U$ (LU factorization of a (k-1)-level SSS matrix)

Output: $\{D_s^L\}_{s=1}^{N}$, $\{D_s^U\}_{s=1}^{N}$, $\{\hat{U}_s\}_{s=1}^{N-1}$, $\{\hat{Q}_s\}_{s=1}^{N-1}$


Proof. For the proof of this lemma, we refer to [22, 42].

REMARK 3.14. In Algorithm 1, the LU factorization of a 0-level SSS matrix is just the LU factorization of an ordinary matrix without SSS structure.
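For the 1-level case, the recursion of Algorithm 1 can be written out directly, since the generators are plain matrices and the factorization is exact. The following Python sketch is our own rendering of Algorithm 1 for a 1-level SSS matrix; the pivoted LU of SciPy, with the permutation merged into the left factor, stands in for the unpivoted factorization that strong regularity guarantees.

import numpy as np
import scipy.linalg as sla

def sss_lu(P, R, Q, D, U, W, V):
    # Algorithm 1 for k = 1: returns generators of the factors
    # L = SSS(P, R, Qh, DL, 0, 0, 0) and U = SSS(0, 0, 0, DU, Uh, W, V)
    n = len(D)
    DL, DU, Qh, Uh = {}, {}, {}, {}
    DL[1], DU[1] = sla.lu(D[1], permute_l=True)               # line 1
    Uh[1] = np.linalg.solve(DL[1], U[1])                      # line 2
    Qh[1] = np.linalg.solve(DU[1].T, Q[1])
    M = None
    for i in range(2, n + 1):
        if i == 2:                                            # lines 5, 7, 12
            M = Qh[1].T @ Uh[1]
        else:
            M = Qh[i - 1].T @ Uh[i - 1] + R[i - 1] @ M @ W[i - 1]
        DL[i], DU[i] = sla.lu(D[i] - P[i] @ M @ V[i].T, permute_l=True)
        if i < n:                                             # line 10
            Uh[i] = np.linalg.solve(DL[i], U[i] - P[i] @ M @ W[i])
            Qh[i] = np.linalg.solve(DU[i].T, Q[i] - V[i] @ M.T @ R[i].T)
    return DL, DU, Uh, Qh

# check: materializing L and U with sss_to_dense (zero generators on the
# unused triangle) and multiplying reproduces the original dense matrix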

In Algorithm 1, to compute the LU factorization of a k-level SSS matrix, the matrix-matrix operations are performed on its (k-1)-level SSS generators, such as the recurrence for $M_i$ in line 7 of Algorithm 1. Such operations lead to growth of the (k-1)-level semiseparable order, which increases the computational complexity. This can be verified from the matrix-matrix operations introduced in [13, 17]. Take a 1-level SSS matrix $A$, for example: the number of flops needed to compute $A^2$ is $O(n^3 N)$, where $n$ is the semiseparable order and $N$ is the number of blocks of $A$ [13]. To be specific, we introduce the following lemma.

LEMMA 3.15 ([17]). Let $A_1$, $A_2$ be SSS matrices of lower semiseparable order $m_1$ and $n_1$, respectively. Then the product $A_1 A_2$ has lower semiseparable order at most $m_1 + n_1$. Let $A_1$, $A_2$ be SSS matrices of upper semiseparable order $m_2$ and $n_2$, respectively. Then the product $A_1 A_2$ has upper semiseparable order at most $m_2 + n_2$.

REMARK 3.16. For a k-level SSS matrix, since the semiseparable order varies from level to level, the results of Lemma 3.15 also hold for the k-level semiseparable order. However, we do not know the exact upper bound of the (k-1)-level semiseparable order; we only know that it will increase.

Lemma 3.15 states that the semiseparable order grows when multiplying two SSS matrices; the same holds when adding two SSS matrices, and similar results hold for multilevel SSS matrix multiplication and addition. Model order reduction is therefore necessary to compute an LU factorization of a k-level SSS matrix $A$ using Algorithm 1. The aim of model order reduction for a k-level SSS matrix $A$ with generator representation $A = \mathrm{MSSS}(P_s, R_s, Q_s, D_s, U_s, W_s, V_s)$ is to find (k-1)-level SSS matrices $\hat{P}_s$, $\hat{R}_s$, $\hat{Q}_s$, $\hat{U}_s$, $\hat{W}_s$, $\hat{V}_s$ of smaller order compared with $P_s$, $R_s$, $Q_s$, $U_s$, $W_s$, $V_s$, respectively, such that $\hat{A} = \mathrm{MSSS}(\hat{P}_s, \hat{R}_s, \hat{Q}_s, D_s, \hat{U}_s, \hat{W}_s, \hat{V}_s)$ has k-level semiseparable order smaller than or equal to the minimal k-level semiseparable order of $A$. Meanwhile, $\hat{A}$ is an approximation of $A$ up to a small tolerance $\varepsilon$, i.e., $\|\hat{A} - A\| < \varepsilon$.

REMARK 3.17. Since the LU factorization of a k-level SSS matrix needs model order reduction for (k-1)-level SSS matrices, the LU factorization in Lemma 3.13 is an exact factorization for SSS matrices, because no model order reduction is needed for ordinary matrices (0-level SSS matrices). It is an inexact factorization for k-level SSS matrices with $k \geq 2$.

For discretized one-dimensional PDEs on a regular grid, the system matrix has an SSS structure, and the LU factorization introduced in Lemma 3.13 can be used as a direct solver. For discretized higher dimensional PDEs on regular grids, this LU factorization can be used as an efficient preconditioner.

3.2.2. Approximate balanced truncation for SSS matrices. As explained in the previous subsection, model order reduction plays a key role in the LU factorization of an MSSS matrix. In this subsection, we design a new model order reduction algorithm for SSS matrices. This new method exploits the correspondence between SSS matrices and linear time-varying (LTV) systems.

SSS matrices have a realization as linear time-varying systems, which was studied by Dewilde et al. in [16]. Consider a mixed-causal system described by the following state-space model:

$x_{k+1}^c = R_k x_k^c + Q_k u_k, \qquad x_{k-1}^a = W_k x_k^a + V_k u_k, \qquad y_k = P_k x_k^c + U_k x_k^a + D_k u_k,$

where $x_k^c$ is the causal system state, $x_k^a$ represents the anti-causal system state, $u_k$ is the system input, and $y_k$ is the system output. With zero initial system states and all inputs and outputs stacked as $\bar{u} = (u_1^T, u_2^T, \ldots, u_N^T)^T$ and $\bar{y} = (y_1^T, y_2^T, \ldots, y_N^T)^T$, the matrix $H$ that describes the input-output behavior of this mixed-causal system, i.e., $\bar{y} = H\bar{u}$, has an SSS structure (identify $Q_j$ and $V_j$ here with $Q_j^T$ and $V_j^T$ in Definition 3.1). Take $N = 4$, for example; the matrix $H$ is

$H = \begin{bmatrix} D_1 & U_1 V_2 & U_1 W_2 V_3 & U_1 W_2 W_3 V_4 \\ P_2 Q_1 & D_2 & U_2 V_3 & U_2 W_3 V_4 \\ P_3 R_2 Q_1 & P_3 Q_2 & D_3 & U_3 V_4 \\ P_4 R_3 R_2 Q_1 & P_4 R_3 Q_2 & P_4 Q_3 & D_4 \end{bmatrix}.$

Using the LTV system realization of SSS matrices, the following lemma gives a direct link between the LTV system order and the semiseparable order.

LEMMA 3.18 ([35]). The lower and upper semiseparable orders of an SSS matrix with a minimal LTV system realization are $\max\{l_i\}_{i=2}^{N}$ and $\max\{k_i\}_{i=1}^{N-1}$, respectively. Here $\{l_i\}_{i=2}^{N}$ and $\{k_i\}_{i=1}^{N-1}$ are defined in Table 3.1.

We state the lemma in [35] more precisely by restricting the realization of the SSS matrix in Lemma 3.18 to be minimal. It is not difficult to construct an example of an SSS matrix with small semiseparable order whose LTV system realization has larger order. Lemma 3.18 states that the order of the causal LTV system equals the lower semiseparable order of the SSS matrix, while the order of the anti-causal LTV system equals the upper semiseparable order. Thus, reducing the semiseparable order of an SSS matrix is equivalent to reducing the order of its realization by mixed-causal LTV systems.

Model order reduction for LTV systems is studied in [10, 36]. In [36], a linear matrix inequality (LMI) approach was introduced to solve the Lyapunov inequalities to compute the controllability and observability Gramians. In [10], the low-rank Smith method was presented to approximate the square-root of the controllability and observability Gramians of LTV systems.

Since the causal LTV system and the anti-causal LTV system have similar structures, corresponding to the strictly lower-triangular and strictly upper-triangular parts of the matrix $H$, respectively, we only consider the causal LTV system described by the following state-space model,

(3.5)   $x_{k+1} = R_k x_k + Q_k u_k, \qquad y_k = P_k x_k,$

over the time interval $[k_0, k_f]$ with zero initial state. The controllability Gramian $G_c(k)$ and observability Gramian $G_o(k)$ are computed by the following Stein recurrence formulas:

(3.6)   $G_c(k+1) = R_k G_c(k) R_k^T + Q_k Q_k^T, \qquad G_o(k) = R_k^T G_o(k+1) R_k + P_k^T P_k,$

with initial conditions $G_c(k_0) = 0$ and $G_o(k_f + 1) = 0$.

Note that the controllability Gramian $G_c(k)$ and observability Gramian $G_o(k)$ are positive definite if the system is completely controllable and observable, and positive semi-definite if the system is only partly controllable and observable. Their eigenvalues are thus non-negative and often exhibit a large jump at an early stage [2, 3]. This suggests approximating these two Gramians at each step by a low-rank approximation. In this paper, we only consider the case that the LTV systems are uniformly completely controllable and observable over the time interval, which means that both $G_c$ and $G_o$ are positive definite. This is reasonable because the SSS matrices considered in this paper correspond to uniformly completely controllable and observable LTV systems.
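A direct implementation of the recurrences (3.6) is straightforward and serves as a reference for the low-rank variant of Algorithm 2 below. The sketch is ours; the system matrices follow the convention of (3.5), i.e., $Q_k$ is the input matrix of the realization, and the index ranges match Table 3.1.

import numpy as np

def stein_gramians(P, R, Q, n):
    # forward recurrence: Gc(k+1) = R_k Gc(k) R_k^T + Q_k Q_k^T, Gc(1) = 0
    Gc = {2: Q[1] @ Q[1].T}
    for k in range(2, n):
        Gc[k + 1] = R[k] @ Gc[k] @ R[k].T + Q[k] @ Q[k].T
    # backward recurrence: Go(k) = R_k^T Go(k+1) R_k + P_k^T P_k, Go(n+1) = 0
    Go = {n: P[n].T @ P[n]}
    for k in range(n - 1, 1, -1):
        Go[k] = R[k].T @ Go[k + 1] @ R[k] + P[k].T @ P[k]
    return Gc, Go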

Since the controllability Gramian $G_c(k)$ and the observability Gramian $G_o(k)$ have similar structure, we only use the controllability Gramian $G_c(k)$ to introduce the basic idea. The key point of the low-rank approximation is to substitute the factorization of the controllability Gramian $G_c(k)$,

$G_c(k) = L_c(k) L_c^T(k),$

where $L_c(k) \in \mathbb{R}^{M \times M}$, by its low-rank factorization

(3.7)   $G_c(k) \approx \hat{L}_c(k) \hat{L}_c^T(k)$

at each step $k$, where $\hat{L}_c(k) \in \mathbb{R}^{M \times m_k}$ with $m_k \ll M$. Typically, $m_k$ is set to be constant, i.e., $m_k = m$ at each step. It can also be chosen adaptively by setting a threshold $\varepsilon$ for the truncated singular values. If $G_c(k)$ is of low numerical rank, it is reasonable to use the rank-$m_k$ approximation (3.7) to approximate $G_c(k)$.

The low-rank approximation of the controllability and observability Gramians for LTV systems is introduced in Algorithm 2.

Algorithm 2 Low-rank approximation of the Gramians for LTV systems.

Input: LTV system $\{P_k\}_{k=2}^{N}$, $\{R_k\}_{k=2}^{N-1}$, $\{Q_k\}_{k=1}^{N-1}$, reduced LTV system order $m$

 1: for k = 2 : N do
 2:   if k == 2 then
 3:     $[\,Q_{k-1}\,] = U_c \Sigma_c V_c^T$ (singular value decomposition)
 4:   else
 5:     $[\,Q_{k-1} \;\; R_{k-1}\hat{L}_c(k-1)\,] = U_c \Sigma_c V_c^T$ (singular value decomposition)
 6:   end if
 7:   Partition $U_c = [\,U_{c1} \;\; U_{c2}\,]$ and $\Sigma_c = \operatorname{blkdiag}(\Sigma_{c1}, \Sigma_{c2})$ with $\Sigma_{c1} \in \mathbb{R}^{m \times m}$
 8:   Let $\hat{L}_c(k) = U_{c1}\Sigma_{c1} \in \mathbb{R}^{M \times m}$
 9: end for
10: for k = N : 2 do
11:   if k == N then
12:     $[\,P_k^T\,] = U_o \Sigma_o V_o^T$ (singular value decomposition)
13:   else
14:     $[\,P_k^T \;\; R_k^T \hat{L}_o(k+1)\,] = U_o \Sigma_o V_o^T$ (singular value decomposition)
15:   end if
16:   Partition $U_o = [\,U_{o1} \;\; U_{o2}\,]$ and $\Sigma_o = \operatorname{blkdiag}(\Sigma_{o1}, \Sigma_{o2})$ with $\Sigma_{o1} \in \mathbb{R}^{m \times m}$
17:   Let $\hat{L}_o(k) = U_{o1}\Sigma_{o1} \in \mathbb{R}^{M \times m}$
18: end for

Output: approximated factors $\{\hat{L}_c(k)\}_{k=2}^{N}$, $\{\hat{L}_o(k)\}_{k=2}^{N}$
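In code, the forward half of Algorithm 2 is a loop of truncated SVDs. The sketch below is our own Python rendering of that forward recursion (the backward recursion for the observability factors is the mirror image with $P_k^T$ and $R_k^T$); a fixed truncation rank m is used, as in the algorithm.

import numpy as np

def lowrank_gramian_factors(R, Q, n, m):
    # forward half of Algorithm 2: Lc(k) with Gc(k) ~ Lc(k) Lc(k)^T, rank <= m
    Lc = {}
    for k in range(2, n + 1):
        if k == 2:
            Z = Q[1]
        else:
            Z = np.hstack([Q[k - 1], R[k - 1] @ Lc[k - 1]])
        Uc, sc, _ = np.linalg.svd(Z, full_matrices=False)
        r = min(m, sc.size)
        Lc[k] = Uc[:, :r] * sc[:r]        # Lc(k) = U_c1 Sigma_c1 (line 8)
    return Lc

# note: Z Z^T = Q_{k-1} Q_{k-1}^T + R_{k-1} Lc(k-1) Lc(k-1)^T R_{k-1}^T, so the
# truncated SVD of Z yields the best rank-m approximation within recurrence (3.6)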


In [9], the recursive low-rank Gramian method was used to approximate the Gramians of linear time-invariant (LTI) systems. Such methods can also be applied to approximate the Gramians of LTV systems, which is studied by the same author in the earlier reference [11]. In this manuscript, we study the connections between LTV systems and SSS matrices and extend the model order reduction algorithm for LTV systems to a model order reduction algorithm for SSS matrices. The low-rank approximation method in [9, 11] is used to approximate the Gramians of the LTV systems to which the SSS matrix corresponds, and the approximate balanced truncation method is applied for the model order reduction. Even though the low-rank approximation method in this manuscript and the one in [11] are quite similar, the novelty is that this algorithm has not previously been applied to reduce the rank of the off-diagonal blocks of structured matrices.

The balanced truncation approximates the LTV system in the following way. The Hankel map, which maps the inputs from the past to the outputs in the future, has the following definition for LTV systems:

(3.8)   $H_k = O_k C_k,$

where $O_k$ and $C_k$ are the state-to-output map and the input-to-state map at time instant $k$, respectively. Meanwhile, the following relations hold:

(3.9)   $G_c(k) = C_k C_k^T, \qquad G_o(k) = O_k^T O_k,$

where $G_c(k)$ and $G_o(k)$ are the controllability Gramian and observability Gramian defined in (3.6), respectively.

The Hankel singular values $\sigma_H$ are the singular values of the Hankel map; in finite dimensional spaces they are computed via

$\sigma_H = \sqrt{\lambda\big(G_c(k)\, G_o(k)\big)},$ with $\lambda(\cdot)$ denoting the eigenvalues.

It was shown in [43] that for any two symmetric positive definite matrices there always exists a so-called contragredient transformation such that

$\Lambda_k = T_k^{-1} G_c(k) T_k^{-T} = T_k^T G_o(k) T_k,$

where $\Lambda_k = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_M)$ is a diagonal matrix. With this contragredient transformation, we have

$T_k^{-1} G_c(k) G_o(k) T_k = \Lambda_k^2.$

This states that $\lambda_1, \lambda_2, \ldots, \lambda_M$ are the Hankel singular values at time instant $k$. Such a contragredient transformation brings the system into a "balanced" form, which means that the controllability Gramian and observability Gramian of the system are equal and diagonal. For the LTV system (3.5), after such a transformation, the balanced LTV system is

(3.10)   $\bar{x}_{k+1} = T_{k+1}^{-1} R_k T_k \bar{x}_k + T_{k+1}^{-1} Q_k u_k, \qquad y_k = P_k T_k \bar{x}_k, \qquad \text{with } \bar{x}_k = T_k^{-1} x_k.$

Since the controllability and observability Gramians of system (3.10) are balanced, truncation can be performed to discard the Hankel singular values that are below a set threshold. This can be done with the left and right multipliers $L_l$ and $L_r$ defined by

$L_l = [\, I_m \;\; 0 \,], \qquad L_r = \begin{bmatrix} I_m \\ 0 \end{bmatrix},$

where $I_m$ is an $m \times m$ identity matrix and $m$ is the reduced system dimension. Then the reduced LTV system is

(3.11)   $\hat{x}_{k+1} = \pi_l(k+1)\, R_k\, \pi_r(k)\, \hat{x}_k + \pi_l(k+1)\, Q_k u_k, \qquad \hat{y}_k = P_k\, \pi_r(k)\, \hat{x}_k,$

where $\pi_l(k) = L_l T_k^{-1}$ and $\pi_r(k) = T_k L_r$.

The reduced LTV system (3.11) is computed via a projection method with the projector defined by $\pi(k) = \pi_r(k)\pi_l(k)$. This is because

$\pi_l(k)\pi_r(k) = L_l T_k^{-1} T_k L_r = I_m$

and

$\pi(k)^2 = \pi_r(k)\pi_l(k)\pi_r(k)\pi_l(k) = \pi(k).$

For the approximated Gramians $\hat{G}_c(k) = \hat{L}_c(k)\hat{L}_c^T(k)$ and $\hat{G}_o(k) = \hat{L}_o(k)\hat{L}_o^T(k)$, which are positive semi-definite, we have the following lemma, which states that there also exists a contragredient transformation such that

$\Lambda_k' = \hat{T}_k^{-1} \hat{G}_c(k) \hat{T}_k^{-T} = \hat{T}_k^T \hat{G}_o(k) \hat{T}_k,$

where $\Lambda_k'$ is a diagonal matrix and

(3.12)   $\Lambda_k' = \operatorname{diag}(\sigma_1, \ldots, \sigma_m, 0, \ldots, 0).$

Lemma 3.19 ([43], Theorem 3). Let $Q, P \in \mathbb{R}^{M \times M}$ be symmetric positive semidefinite and satisfy

$\operatorname{rank} Q = \operatorname{rank} P = \operatorname{rank} QP = m,$

where $m \leq M$. Then there exists a nonsingular matrix $W \in \mathbb{R}^{M \times M}$ (contragredient transformation) and a positive definite diagonal matrix $\Sigma \in \mathbb{R}^{m \times m}$ such that

$W^{-1} Q W^{-T} = W^T P W = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}.$

Proof. For the proof of this lemma, we refer to [43].

We have already explained that the diagonal entries of the matrix $\Lambda_k'$ in (3.12) are the Hankel singular values of the approximate Hankel map in (3.8). If the controllability Gramian $G_c(k)$ and the observability Gramian $G_o(k)$ are well approximated by $\hat{G}_c(k)$ and $\hat{G}_o(k)$, respectively, then $\hat{G}_c(k)\hat{G}_o(k)$ also approximates $G_c(k)G_o(k)$ well. This means that the approximate Hankel singular values are close to the original Hankel singular values. In Algorithm 3, we show how to use the approximated Gramian factors $\hat{L}_c(k)$ and $\hat{L}_o(k)$ to compute the reduced system. Because approximated Gramians are used, this method is called the approximate balanced truncation.

Algorithm 3 Approximate balanced truncation for LTV systems.

Input: LTV system $\{P_k\}_{k=2}^{N}$, $\{R_k\}_{k=2}^{N-1}$, $\{Q_k\}_{k=1}^{N-1}$, reduced system order $m$

 1: Apply Algorithm 2 to compute the approximated Gramian factors $\{\hat{L}_c(k)\}_{k=2}^{N}$ and $\{\hat{L}_o(k)\}_{k=2}^{N}$
 2: for k = 2 : N do
 3:   Compute the singular value decomposition $\hat{L}_c^T(k)\hat{L}_o(k) = U_k \Sigma_k V_k^T$
 4:   Let $\Pi_l(k) = \Sigma_k^{-1/2} V_k^T \hat{L}_o^T(k)$ and $\Pi_r(k) = \hat{L}_c(k) U_k \Sigma_k^{-1/2}$
 5: end for
 6: for k = 1 : N do
 7:   if k == 1 then
 8:     $\hat{Q}_k = \Pi_l(k+1) Q_k$
 9:   else if k == N then
10:     $\hat{P}_k = P_k \Pi_r(k)$
11:   else
12:     $\hat{Q}_k = \Pi_l(k+1) Q_k$, $\hat{R}_k = \Pi_l(k+1) R_k \Pi_r(k)$, $\hat{P}_k = P_k \Pi_r(k)$
13:   end if
14: end for

Output: reduced LTV system $\{\hat{P}_k\}_{k=2}^{N}$, $\{\hat{R}_k\}_{k=2}^{N-1}$, $\{\hat{Q}_k\}_{k=1}^{N-1}$
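One step of the loop in lines 2-5 of Algorithm 3 is the classical square-root construction. The sketch below, our own illustration with names of our own choosing, forms $\Pi_l(k)$ and $\Pi_r(k)$ from the factors $\hat{L}_c(k)$ and $\hat{L}_o(k)$ and checks that $\Pi_l(k)\Pi_r(k) = I_m$.

import numpy as np

def projection_multipliers(Lc, Lo):
    # line 3: SVD of Lc(k)^T Lo(k); line 4: Pi_l(k) and Pi_r(k)
    Uk, s, VTk = np.linalg.svd(Lc.T @ Lo, full_matrices=False)
    s_inv_half = 1.0 / np.sqrt(s)
    Pi_l = (s_inv_half[:, None] * VTk) @ Lo.T    # Sigma^{-1/2} V^T Lo^T
    Pi_r = Lc @ (Uk * s_inv_half)                # Lc U Sigma^{-1/2}
    return Pi_l, Pi_r

rng = np.random.default_rng(1)
Lc, Lo = rng.standard_normal((20, 4)), rng.standard_normal((20, 4))
Pi_l, Pi_r = projection_multipliers(Lc, Lo)
print(np.allclose(Pi_l @ Pi_r, np.eye(4)))       # True: an oblique projector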


We can prove the following lemma.

LEMMA 3.20. Algorithm 3 is an approximate balanced truncation method for the LTV system (3.5).

Proof. According to Lemma 3.19, there exists a contragredient transformation $\hat{T}_k \in \mathbb{R}^{M \times M}$ such that

$\Lambda_k' = \hat{T}_k^{-1} \hat{L}_c(k)\hat{L}_c^T(k)\, \hat{T}_k^{-T} = \hat{T}_k^T \hat{L}_o(k)\hat{L}_o^T(k)\, \hat{T}_k,$

where $\Lambda_k'$ is a diagonal matrix and

$\Lambda_k' = \begin{bmatrix} \Sigma_k & 0 \\ 0 & 0 \end{bmatrix}.$

Here

$\Sigma_k = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m),$

and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_m$ are the singular values of $\hat{L}_c^T(k)\hat{L}_o(k)$, i.e.,

$\hat{L}_c^T(k)\hat{L}_o(k) = U_k \Sigma_k V_k^T.$

According to the proof of Lemma 3.19, the contragredient transformation $\hat{T}_k$ is computed via

$\hat{T}_k = \big[\, \hat{L}_c(k)\, U_k \Sigma_k^{-1/2} \;\; N_o(k)\, V_k' \Sigma_k'^{-1/2} \,\big],$

and the inverse of this transformation is computed by

$\hat{T}_k^{-1} = \begin{bmatrix} \Sigma_k^{-1/2} V_k^T \hat{L}_o^T(k) \\ \Sigma_k'^{-1/2} U_k'^T N_c^T(k) \end{bmatrix},$

where the columns of $N_o(k) \in \mathbb{R}^{M \times (M-m)}$ span the null space of $\hat{G}_o(k)$, the columns of $N_c(k) \in \mathbb{R}^{M \times (M-m)}$ span the null space of $\hat{G}_c(k)$, and the following singular value decomposition holds:

$N_c^T(k)\, N_o(k) = U_k' \Sigma_k' V_k'^T.$

With this contragredient transformation $\hat{T}_k$, the left and right multipliers are computed via

$\Pi_l(k) = L_l \hat{T}_k^{-1} = \Sigma_k^{-1/2} V_k^T \hat{L}_o^T(k), \qquad \Pi_r(k) = \hat{T}_k L_r = \hat{L}_c(k)\, U_k \Sigma_k^{-1/2},$

and the projector is defined via

$\Pi_k = \Pi_r(k)\, \Pi_l(k),$

since $\Pi_l(k)\Pi_r(k) = I_m$ and $\Pi_k^2 = \Pi_k$.

Note that low-rank approximation together with approximate balanced truncation for linear time-invariant (LTI) systems is studied in [8, 9], while only balanced truncation for linear time-varying (LTV) systems is studied in [11].

For an SSS matrix $A = \mathrm{SSS}(P_s, R_s, Q_s, D_s, U_s, W_s, V_s)$ with lower semiseparable order $M$, we have already explained its LTV system realization. Thus, Algorithm 2 and Algorithm 3 can be performed to reduce the order of the causal LTV system (3.5), which corresponds to reducing the lower semiseparable order. This yields the approximated SSS matrix $\hat{A} = \mathrm{SSS}(\hat{P}_s, \hat{R}_s, \hat{Q}_s, D_s, U_s, W_s, V_s)$. For the strictly upper-triangular part of $A$, we first transpose it to strictly lower-triangular form and then perform Algorithm 2 and Algorithm 3. After this reduction, we transpose the reduced strictly lower-triangular part back to strictly upper-triangular form.

3.2.3. Hankel blocks approximation. To compare with our model order reduction method for SSS matrices, we describe the standard model order reduction algorithm in this part. It is called the Hankel blocks approximation in [12, 13]. The Hankel blocks of an SSS matrix are defined as follows.

DEFINITION 3.21 ([12]). Hankel blocks denote the off-diagonal blocks that extend from the diagonal to the northeast corner (for the upper case) or to the southwest corner (for the lower case).

Take a $4 \times 4$ block SSS matrix $A$, for example. The Hankel blocks of the strictly lower-triangular part are denoted by $H_2$, $H_3$, and $H_4$ in Figure 3.2.

It is easy to verify that the Hankel blocks $H_i$, $i = 2, \ldots, N$, satisfy

(3.13)   $H_i = O_i C_i, \qquad i = 2, \ldots, N,$

where $O_i$ is the map from the current state to the current and future outputs, and $C_i$ is the map from the past inputs to the current state, for system (3.5). Moreover, the following relations hold for $O_i$ and $C_i$:

(3.14)   $O_i = \begin{bmatrix} P_i \\ P_{i+1} R_i \\ P_{i+2} R_{i+1} R_i \\ \vdots \end{bmatrix}, \quad \text{for } i = 2, \ldots, N-1, \quad \text{and} \quad O_N = P_N,$

(3.15)   $C_{i+1} = [\, R_i C_i \;\; Q_i \,], \quad \text{for } i = 2, \ldots, N-1, \quad \text{and} \quad C_2 = Q_1.$

The two maps $C_i$ and $O_i$ also satisfy

$G_c(i) = C_i C_i^T, \qquad G_o(i) = O_i^T O_i,$

where $G_c(i)$ and $G_o(i)$ are the controllability Gramian and observability Gramian that satisfy (3.6).

The rank of the Hankel map $H_i$ at time step $i$, i.e., the rank of the $i$-th Hankel block, is the order of the state $x_i$ of system (3.5); see [40]. The standard way to reduce the semiseparable order is given in [12, 13]. This standard approach is based on the realization theory for a given Hankel map of LTV systems introduced in [16, 40]: given the Hankel maps $\{H_i\}_{i=2}^{N}$, find a triple $\{P_k, R_k, Q_k\}$ that satisfies (3.13)-(3.15). Using realization theory, it is also possible to obtain a reduced triple $\{\hat{P}_k, \hat{R}_k, \hat{Q}_k\}$ that approximates the Hankel maps $H_i$ in (3.13).

To perform this approximation, we first need to transform the map $O_i$ to a form with orthonormal columns and the map $C_i$ to a form with orthonormal rows. These two forms are called the left proper form and the right proper form [12, 13], respectively. We use the transformation of $C_i$ to introduce the basic idea. The first step is a singular value decomposition (SVD) of the starting point $C_2$, which gives

$C_2 = U_2 \Sigma_2 V_2^T,$

and we let $\bar{C}_2 = V_2^T$. At this step, the map $C_2$ is transformed to the form $\bar{C}_2$, which has orthonormal rows. Due to the change of $C_2$, to keep the Hankel map $H_2 = O_2 C_2$ unchanged, the map $O_2$ is changed to

$\bar{O}_2 = O_2 U_2 \Sigma_2.$

Then the Hankel map satisfies

$\bar{H}_2 = \bar{O}_2 \bar{C}_2 = O_2 U_2 \Sigma_2 V_2^T = O_2 C_2 = H_2.$

Since all these transformations have to be carried out on the triple $\{P_k, R_k, Q_k\}$, not on the maps themselves, we have

$\bar{Q}_1 = \bar{C}_2 = V_2^T,$

and

$\bar{O}_2 = O_2 U_2 \Sigma_2 = \begin{bmatrix} P_2 \\ P_3 R_2 \\ P_4 R_3 R_2 \\ \vdots \end{bmatrix} U_2 \Sigma_2 = \begin{bmatrix} P_2 U_2 \Sigma_2 \\ P_3 R_2 U_2 \Sigma_2 \\ \vdots \end{bmatrix},$

which gives $\bar{P}_2 = P_2 U_2 \Sigma_2$ and $\bar{R}_2 = R_2 U_2 \Sigma_2$.

Now, suppose that at step $i$ the map $\bar{C}_i$ already has orthonormal rows; then for $C_{i+1}$ we have

(3.16)   $C_{i+1} = [\, \bar{R}_i \bar{C}_i \;\; Q_i \,] = [\, \bar{R}_i \;\; Q_i \,] \begin{bmatrix} \bar{C}_i & 0 \\ 0 & I \end{bmatrix}.$

By performing a singular value decomposition of $[\, \bar{R}_i \;\; Q_i \,]$, we have

(3.17)   $[\, \bar{R}_i \;\; Q_i \,] = U_i \Sigma_i V_i^T.$

Let $[\, \tilde{R}_i \;\; \bar{Q}_i \,] = V_i^T$, where $V_i = [\, V_{i,1} \;\; V_{i,2} \,]$ is partitioned such that the size of $\bar{Q}_i$ matches the size of $Q_i$. Then let

$\tilde{R}_i = V_{i,1}^T, \qquad \bar{Q}_i = V_{i,2}^T.$

To keep the use of notation consistent, we reuse $\bar{R}_i$ to denote the transformed generator $\tilde{R}_i$, i.e., $\bar{R}_i = V_{i,1}^T$. By doing this, we have the transformed map

(3.18)   $\bar{C}_{i+1} = [\, \bar{R}_i \bar{C}_i \;\; \bar{Q}_i \,] = V_i^T \begin{bmatrix} \bar{C}_i & 0 \\ 0 & I \end{bmatrix},$

which also has orthonormal rows. This is due to

$\bar{C}_{i+1}\bar{C}_{i+1}^T = V_i^T \begin{bmatrix} \bar{C}_i \bar{C}_i^T & 0 \\ 0 & I \end{bmatrix} V_i = V_i^T V_i = I,$

since $\bar{C}_i$ also has orthonormal rows. Then the Hankel map at time step $i+1$ before and after this transformation satisfies the relation

(3.19)   $C_{i+1} = U_i \Sigma_i \bar{C}_{i+1},$

which can be checked by combining (3.16) and (3.17) with (3.18).

To keep the Hankel map at time step i + 1 unchanged, the following relation needs to hold,

(3.20)   $\bar{O}_{i+1} = O_{i+1} U_i \Sigma_i.$

Since $O_{i+1} = \big[\, P_{i+1}^T \;\; (P_{i+2}R_{i+1})^T \;\; \cdots \,\big]^T$, letting $\bar{P}_{i+1} = P_{i+1} U_i \Sigma_i$ and $\bar{R}_{i+1} = R_{i+1} U_i \Sigma_i$ gives the transformed map

(3.21)   $\bar{O}_{i+1} = \begin{bmatrix} \bar{P}_{i+1} \\ P_{i+2} \bar{R}_{i+1} \\ \vdots \end{bmatrix} = O_{i+1} U_i \Sigma_i.$

By checking (3.19)-(3.21), it is easy to see that the Hankel map at time step $i+1$ is unchanged. A similar procedure can be applied to transform the map $O_i$ to a form with orthonormal columns. The procedure for transforming $C_i$ and $O_i$ is shown in Algorithm 4.

After transforming the maps $O_i$ and $C_i$ into the forms with orthonormal column and row bases, respectively, we can truncate the unimportant column spaces and row spaces of $O_i$ and $C_i$, which gives the approximated Hankel maps $\hat{H}_i = \hat{O}_i \hat{C}_i$.
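The forward sweep of this procedure translates directly into code. The sketch below is our own rendering of the right-proper-form recursion (3.16)-(3.21) for the triple $\{P_k, R_k, Q_k\}$; truncating each SVD to rank m instead of keeping it exact turns the sweep into the Hankel blocks approximation.

import numpy as np

def right_proper_form(P, R, Q, n):
    # after the sweep every C_i has orthonormal rows; Hankel maps unchanged
    U, s, VT = np.linalg.svd(Q[1], full_matrices=False)   # C_2 = Q_1
    Q[1] = VT                                             # Qbar_1 = V_2^T
    P[2] = P[2] @ (U * s)                                 # P_2 <- P_2 U_2 Sigma_2
    for i in range(2, n):
        R[i] = R[i] @ (U * s)                             # Rbar_i = R_i U_i Sigma_i
        U, s, VT = np.linalg.svd(np.hstack([R[i], Q[i]]), full_matrices=False)
        r = R[i].shape[1]
        R[i], Q[i] = VT[:, :r], VT[:, r:]                 # new generators from V^T
        P[i + 1] = P[i + 1] @ (U * s)                     # keeps H_{i+1} unchanged
    return P, R, Q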

3.3. Operations count for the two model order reduction methods. Given an SSS matrix $A = \mathrm{SSS}(P_s, R_s, Q_s, D_s, U_s, W_s, V_s)$, to compare the operations count of the approximate balanced truncation described by Algorithms 2-3 with that of the Hankel blocks approximation introduced in Algorithm 4, we assume that the generator sizes in Table 3.1 are uniform, i.e., $m_i = n$ and $k_i = l_i = M$. Here $N$ is the number of SSS blocks (LTV system time steps), $M$ is the unreduced LTV system order, and $N \gg M \gg n$. The reduced SSS matrix is $\hat{A} = \mathrm{SSS}(\hat{P}_s, \hat{R}_s, \hat{Q}_s, D_s, \hat{U}_s, \hat{W}_s, \hat{V}_s)$, where $m$ is the reduced semiseparable order and $m \ll M$.

In this paper, we measure the operations count in floating-point operations (flops). To count the operations of the approximate balanced truncation, we first count the operations of the low-rank approximation in Algorithm 2. In the forward recursion, the singular value decompositions use

(3.22)   $m^2 M + (m+n)^2 M (N-2)$

flops. In this recursion, two matrix-matrix products are computed in each iteration. This consumes

(3.23)   $m M^2 (N-2) + m^2 M (N-1)$

flops. Adding (3.22) and (3.23) gives the total flop count for the forward low-rank approximation,

(3.24)   $m^2 M + (m+n)^2 M (N-2) + m M^2 (N-2) + m^2 M (N-1).$

Since the forward low-rank approximation and the backward low-rank approximation are computationally symmetric, the flop count for the backward low-rank approximation equals (3.24). The flop count $F_l$ for the low-rank approximation in Algorithm 2 is then

(3.25)   $F_l = 2mM^2 N + 4m^2 MN + 4mnMN + 2n^2 MN - 4(m+n)^2 M - 4mM^2.$

Algorithm 4 Hankel blocks approximation.

Input: LTV system $\{P_k\}_{k=2}^{N}$, $\{R_k\}_{k=2}^{N-1}$, $\{Q_k\}_{k=1}^{N-1}$, reduced system order $m$

 1: for i = 2 : N do
 2:   if i == 2 then
 3:     $Q_{i-1} = U_i \Sigma_i V_i^T$ (singular value decomposition)
 4:     Let $Q_{i-1} = V_i^T$, $P_i = P_i U_i \Sigma_i$, and $R_i = R_i U_i \Sigma_i$
 5:   else if i == N then
 6:     $[\, R_{i-1} \;\; Q_{i-1} \,] = U_i \Sigma_i V_i^T$ (singular value decomposition)
 7:     Partition $V_i^T = [\, V_{i,1}^T \;\; V_{i,2}^T \,]$ such that the sizes of $V_{i,2}^T$ and $Q_{i-1}$ match
 8:     Let $R_{i-1} = V_{i,1}^T$, $Q_{i-1} = V_{i,2}^T$, and $P_i = P_i U_i \Sigma_i$
 9:   else
10:     $[\, R_{i-1} \;\; Q_{i-1} \,] = U_i \Sigma_i V_i^T$ (singular value decomposition)
11:     Partition $V_i^T = [\, V_{i,1}^T \;\; V_{i,2}^T \,]$ such that the sizes of $V_{i,2}^T$ and $Q_{i-1}$ match
12:     Let $R_{i-1} = V_{i,1}^T$, $Q_{i-1} = V_{i,2}^T$, $P_i = P_i U_i \Sigma_i$, and $R_i = R_i U_i \Sigma_i$
13:   end if
14: end for
15: for i = N : 2 do
16:   if i == N then
17:     $P_i = U_i \Sigma_i V_i^T$ (singular value decomposition)
18:     Let $P_i = U_i$, $R_{i-1} = \Sigma_i V_i^T R_{i-1}$, and $Q_{i-1} = \Sigma_i V_i^T Q_{i-1}$
19:   else if i == 2 then
20:     $\begin{bmatrix} P_i \\ R_i \end{bmatrix} = U_i \Sigma_i V_i^T$ (singular value decomposition)
21:     Partition $U_i = \begin{bmatrix} U_{i,1} \\ U_{i,2} \end{bmatrix}$ such that the sizes of $U_{i,2}$ and $R_i$ match
22:     Let $P_i = U_{i,1}$, $R_i = U_{i,2}$, and $Q_{i-1} = \Sigma_i V_i^T Q_{i-1}$
23:   else
24:     $\begin{bmatrix} P_i \\ R_i \end{bmatrix} = U_i \Sigma_i V_i^T$ (singular value decomposition)
25:     Partition $U_i = \begin{bmatrix} U_{i,1} \\ U_{i,2} \end{bmatrix}$ such that the sizes of $U_{i,2}$ and $R_i$ match
26:     Let $P_i = U_{i,1}$, $R_i = U_{i,2}$, $R_{i-1} = \Sigma_i V_i^T R_{i-1}$, and $Q_{i-1} = \Sigma_i V_i^T Q_{i-1}$
27:   end if
28: end for
29: for i = 1 : N do
30:   if i == 1 then
31:     Partition $Q_i = \begin{bmatrix} \hat{Q}_i \\ \ast \end{bmatrix}$, keeping the first $m$ rows
32:   else if i == N then
33:     Partition $P_i = [\, \hat{P}_i \;\; \ast \,]$, keeping the first $m$ columns
34:   else
35:     Partition $P_i = [\, \hat{P}_i \;\; \ast \,]$, keeping the first $m$ columns
36:     Partition $R_i = \begin{bmatrix} \hat{R}_i & \ast \\ \ast & \ast \end{bmatrix}$, keeping the leading $m \times m$ block
37:     Partition $Q_i = \begin{bmatrix} \hat{Q}_i \\ \ast \end{bmatrix}$, keeping the first $m$ rows
38:   end if
39: end for

Output: reduced LTV system $\{\hat{P}_k\}_{k=2}^{N}$, $\{\hat{R}_k\}_{k=2}^{N-1}$, $\{\hat{Q}_k\}_{k=1}^{N-1}$


Next we count the operations of the approximate balanced truncation in Algorithm 3. First, computing $\Pi_l(k)$ and $\Pi_r(k)$ costs

(3.26)   $\big(m^2 M + m^3 + 2(m^2 + m^2 M)\big)(N-1),$

and the flop count to compute the reduced LTV system is

(3.27)   $2mnM(N-1) + (mM^2 + m^2 M)(N-2).$

Thus the total flop count $F_a$ for the approximate balanced truncation is the sum of (3.26) and (3.27), which gives

(3.28)   $F_a = (M^2 + mM + 2nM)N - 2mn(M + m + n).$

Then we have the total flop count $F_{la}$ of the approximate balanced truncation by adding (3.25) and (3.28). Since $N \gg M \gg m, n$, we use the $O(\cdot)$ notation to denote the total flop count. Thus, we have

(3.29) [F.sub.la] = O ((2m + 1)[M.sup.2]N).

Similarly, the operation count $F_h$ for the Hankel blocks approximation in Algorithm 4 is

(3.30) $F_h = 4M^3 N + 6nM^2 N + 2(n^2 + 2n)MN - 8M^3 - 12nM^2 + 2(n^2 - 3n)M + 2n^2$,

which can be written in $O(\cdot)$ notation as

$F_h = O(4M^3 N)$.

Since $N \gg M \gg m$, comparing the flop count $F_{la}$ for the approximate balanced truncation in (3.29) with the flop count $F_h$ for the Hankel blocks approximation in (3.30) shows that the approximate balanced truncation algorithm is computationally cheaper than the Hankel blocks approximation for the model order reduction of SSS matrices.

REMARK 3.22. From $F_{la}$ in (3.29) and $F_h$ in (3.30) we see that the flop count is linear in $N$ for both methods, where $N$ denotes the number of blocks of an SSS matrix. Moreover, the size of the SSS matrix equals $nN$ with $n \ll N$. Thus, both methods have computational complexity that is linear in the matrix size.
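As a concrete illustration of this comparison, the leading-order counts (3.29) and (3.30) can be evaluated directly; the parameter values in the following Python sketch are arbitrary choices satisfying $N \gg M \gg m$.

def flops_abt(N, M, m):
    # leading-order flop count of approximate balanced truncation, cf. (3.29)
    return (2 * m + 1) * M ** 2 * N

def flops_hankel(N, M):
    # leading-order flop count of the Hankel blocks approximation, cf. (3.30)
    return 4 * M ** 3 * N

N, M, m = 1024, 32, 4                 # illustrative values with N >> M >> m
print(flops_abt(N, M, m) / flops_hankel(N, M))   # = (2m+1)/(4M), about 0.07

The ratio $(2m+1)/(4M)$ is independent of $N$, so the advantage of the approximate balanced truncation persists for all problem sizes.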

3.4. Flowchart of preconditioning by using MSSS matrix computations. We have described the MSSS matrix computations and how to compute a preconditioner with them. The flowchart in Figure 3.3 illustrates how to compute a preconditioner for the PDE-constrained optimization problem (2.1).

4. Numerical experiments. In this section, we study the problem of optimal control of the convection-diffusion equation introduced in Example 4.1. First, we compare the performance of our model order reduction algorithm with that of the conventional model order reduction algorithm. Next, we test the global MSSS preconditioner and the block-diagonal MSSS preconditioner. Numerical experiments in [30] also show the advantage of the global MSSS preconditioner over the block lower-triangular MSSS preconditioner for PDE-constrained optimization problems. The superiority of the global MSSS preconditioner to block preconditioners computed by multigrid methods for computational fluid dynamics (CFD) problems is illustrated in [31].

EXAMPLE 4.1 ([32]). Let $\Omega = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$ and consider the problem

$\min_{u,f}\ \frac{1}{2}\|u - \hat u\|^2 + \beta\|f\|^2$ subject to $-\epsilon\nabla^2 u + \vec w \cdot \nabla u = f$ in $\Omega$, $u = u_D$ on $\Gamma_D$,

where $\Gamma_D = \partial\Omega$ and the Dirichlet data $u_D$ is given by

[expression not reproducible]

Here $\epsilon$ is a positive scalar, $\vec w = (\cos\theta, \sin\theta)^T$ is the unit direction vector, and the prescribed state is $\hat u = 0$.

The numerical experiments are performed on a laptop with an Intel Core 2 Duo P8700 CPU at 2.53 GHz and 8 GB of memory in Matlab R2010b. The iterative solver is stopped either when the 2-norm of the residual is reduced by a factor of $10^{-6}$ or when the maximum number of iterations, set to 100 in this manuscript, is reached. Note that there are three unknowns on each grid point. The problem sizes 3.07e+03, 1.23e+04, 4.92e+04, and 1.97e+05 correspond to the mesh sizes $h = 2^{-5}$, $2^{-6}$, $2^{-7}$, and $2^{-8}$, respectively. In the following tables, the maximum semiseparable order for the model order reduction is given in parentheses after the problem size. The "preconditioning" columns report the time to compute the preconditioner; the "MINRES" or "IDR(s)" columns give the time to solve the saddle-point problem with the respective Krylov solver; the "total" columns show their sum. All times are measured in seconds.
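The reported problem sizes can be reproduced from the mesh sizes. The following sketch assumes a $2^k \times 2^k$ grid of points for $h = 2^{-k}$; the identification of the three unknowns per grid point as state, control, and Lagrange multiplier is our reading and is not stated in the tables.

# Sanity check of the reported problem sizes: three unknowns per grid point
# (assumed: state, control, Lagrange multiplier) on a 2^k x 2^k grid.
for k in [5, 6, 7, 8]:
    print(f"h = 2^-{k}: {3 * (2 ** k) ** 2:.2e} unknowns")
# -> 3.07e+03, 1.23e+04, 4.92e+04, 1.97e+05, matching the tables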

4.1. Comparison of two model order reduction algorithms. Here we test the performance of the two model order reduction algorithms. Consider the preconditioning of the optimal control of the convection-diffusion equation described in Example 4.1. For the block-diagonal preconditioner $P_1$ computed by the approximate balanced truncation algorithm and by the Hankel blocks approximation method, the results for different $\epsilon$ and $\beta$ are shown in Tables 4.1-4.8, while $\theta$ is set to $\pi/5$ for all experiments.

The results in Tables 4.1-4.8 show that the time to compute the preconditioner and to iteratively solve the saddle-point system is linear in the problem size, which illustrates that the MSSS preconditioning technique has linear computational complexity. They also show that for the same pair of $\epsilon$ and $\beta$, the block MSSS preconditioners computed by the approximate balanced truncation and by the Hankel blocks approximation give mesh size independent convergence. Moreover, the number of iterations for the block MSSS preconditioners computed by the two model order reduction algorithms is the same.

REMARK 4.1. As shown by (3.29) and (3.30), the approximate balanced truncation is computationally cheaper than the Hankel blocks approximation, and both algorithms have linear computational complexity. This is illustrated by the time to compute the preconditioners by these two methods for the same pair of $\beta$ and $\epsilon$ in Tables 4.1-4.8.

The optimal system states and input for $\beta = 10^{-2}$, $\epsilon = 10^{-1}$, and $h = 2^{-5}$ are shown in Figures 4.1(a) and 4.1(b).

The block-diagonal and block lower-triangular MSSS preconditioners computed by these two model order reduction algorithms are also tested on the problems of optimal control of the Poisson equation and optimal control of the convection-diffusion equation in [30]. These results in [30] are consistent with the conclusions we draw here on the performance of these two model order reduction algorithms and the performance of such block MSSS preconditioners.

4.2. Comparison of preconditioners. In this subsection, we test the performance of the block-diagonal MSSS preconditioner and of the global MSSS preconditioner. For the block-diagonal MSSS preconditioner, Tables 4.1-4.8 show that as $\beta$ decreases, the number of iterations increases slightly for the same problem size and $\epsilon$. This is due to the $\frac{1}{2\beta}M$ term, which plays an increasingly important role as $\beta$ decreases, while this term is often neglected in the preconditioner $P_1$ in (2.5) for large and medium values of $\beta$ [32]. For smaller $\beta$, the computational results for the block-diagonal MSSS preconditioner are shown in Tables 4.9-4.10. For the preconditioners tested here, the Hankel blocks approximation is chosen as the model order reduction algorithm. Results for the preconditioners computed by the approximate balanced truncation can be found in [30].

As shown in Tables 4.9-4.10, as $\beta$ decreases from $10^{-3}$ to $10^{-4}$, the number of iterations at least doubles. Clearly, if $\beta$ is small, the block-diagonal MSSS preconditioner $P_1$ cannot give satisfactory results. Alternatively, for small $\beta$, we can choose the block-diagonal MSSS preconditioner as follows:

(4.1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The computational results of this preconditioner for $\beta = 10^{-4}$ are given in Table 4.11. The maximum number of iterations is set to 100.

The results in Tables 4.10-4.11 show that the block-diagonal MSSS preconditioner does not give satisfactory performance when $\beta$ becomes small. In these tables, "no convergence" means that the 2-norm of the residual is not reduced to the desired accuracy within 100 iterations. This is due to the fact that the Schur complement is difficult to approximate for small $\beta$.

Recall from Section 2 that we can permute the saddle-point system with MSSS blocks into a global MSSS system. Due to the indefiniteness of the global MSSS preconditioner, MINRES is not suitable for iteratively solving the preconditioned saddle-point system. Instead, the induced dimension reduction method IDR(s) [41] is chosen as the Krylov solver. To compare with the results for the block-diagonal MSSS preconditioner in Tables 4.9-4.11, we apply the global MSSS preconditioner to the same test cases. The results are given in Tables 4.12-4.13.
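To make the permutation concrete, the following Python toy sketch interleaves the unknowns of a 3x3-block matrix point by point. The exact permutation of Lemma 3.5 is not reproduced here, so the grouping below (the three unknowns belonging to the same grid point are placed together) is only an assumption about its effect.

import numpy as np

# Toy interleaving permutation: from field-by-field ordering [u; f; lambda]
# to point-by-point ordering, so that the 3x3-block saddle-point matrix
# becomes one matrix whose blocks inherit the structure of the sub-blocks.
N = 4                                          # number of grid points (toy)
perm = np.arange(3 * N).reshape(3, N).T.ravel()
A = np.arange((3 * N) ** 2, dtype=float).reshape(3 * N, 3 * N)
A_bar = A[np.ix_(perm, perm)]                  # permuted "global" matrix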

Even though it takes slightly more time to compute the global MSSS preconditioner than the block-diagonal MSSS preconditioner, much less time is needed for the IDR(s) method to solve the system preconditioned by the global MSSS preconditioner. Moreover, the total time in Tables 4.12-4.13 scales linearly with the problem size.

REMARK 4.2. Comparing the computational results of the global MSSS preconditioner with those of the block-diagonal MSSS preconditioner, we find for the same numerical test with the same pair of $\beta$ and $\epsilon$ that the number of iterations is reduced significantly by the global MSSS preconditioner. Meanwhile, the global MSSS preconditioner gives both mesh size and $\beta$ independent convergence. This makes the global MSSS preconditioner superior to the block preconditioners.

5. Preconditioning for optimal control of 3D problems. As analyzed in Section 3.2.1, the LU factorization of a $k$-level SSS matrix requires model order reduction for $(k-1)$-level SSS matrices. Therefore, to compute a preconditioner for 3D problems using MSSS matrix computations, model order reduction for 2-level SSS matrices is needed. Since model order reduction for two- and higher-level SSS matrices is still an open problem, only preliminary results are given in this section for solving the following problem of optimal control of the 3D Poisson equation:

(5.1) $\min_{u,f}\ \frac{1}{2}\|u - \hat u\|^2 + \beta\|f\|^2$ subject to $-\nabla^2 u = f$ in $\Omega$, with $u$ prescribed on $\partial\Omega$,

where $\Omega = \{(x, y, z) \mid 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1\}$ and

[expression not reproducible]

The discretized analog of problem (5.1) is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

(5.2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Here the matrices $D$ and $L$ in $K$ are 2-level SSS matrices, $M$ is the 3D mass matrix, which has the same structure as $K$, and $d$ is a vector corresponding to the given boundary conditions. To compute the optimal solution of (5.1), a system of the form (2.3) needs to be solved. Here we again study two types of preconditioners: the block-diagonal MSSS preconditioner and the global MSSS preconditioner.
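As a hedged illustration of this nested structure, the Kronecker-product form of the 3D Laplacian (a standard identity, used here only as a stand-in for the stiffness matrix in (5.2)) exposes the three levels:

import numpy as np

# The 3D Laplacian has the nested block structure of (5.2); with T the 1D
# three-point stencil, K = kron(T, I, I) + kron(I, T, I) + kron(I, I, T).
n = 8
I = np.eye(n)
T = 2 * I - np.eye(n, k=1) - np.eye(n, k=-1)
K = (np.kron(np.kron(T, I), I)
     + np.kron(np.kron(I, T), I)
     + np.kron(np.kron(I, I), T))
# At the outer (third) level, K is block tridiagonal with diagonal blocks D
# and off-diagonal blocks L; these blocks are themselves 2-level structured.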

5.1. Block-diagonal preconditioner. In this subsection, we test the block-diagonal preconditioner for large and medium $\beta$. The block-diagonal preconditioner $P_1$ of (2.5) is used. Here the Schur complement is approximated by $\hat K M^{-1} \hat K^T$, where $\hat K$ is an approximation of $K$ obtained by MSSS matrix computations.

To approximate the symmetric positive definite matrix $K$, we compute its approximate Cholesky factorization with MSSS matrix computations. At the $k$-th step of the Cholesky factorization, the Schur complement is computed via the recursion

(5.3) $S_1 = D_1, \qquad S_k = D_k - L_{k-1} S_{k-1}^{-1} L_{k-1}^T \quad (k \ge 2).$

Since $D$ and $L$ are 2-level SSS matrices, $S_k$ is also a 2-level SSS matrix. In the recursion (5.3), both the 2-level and the 1-level semiseparable order of $S_k$ grow as $k$ increases, so model order reduction for 2-level and 1-level SSS matrices is necessary; the model order reduction for 2-level SSS matrices is still an open problem. Here we use an alternative method to approximate the Schur complement with a lower 2-level semiseparable order.

As pointed out in [14], for a symmetric positive definite matrix obtained from the discretization of a PDE with constant coefficients, all subsequent Schur complements are also symmetric positive definite and converge quickly to a fixed-point matrix $S_\infty$. In [15], Dewilde et al. used a hierarchical partitioning of the 3D matrix $K$ in (5.2) and performed computations on 2D matrices using 1-level SSS matrix computations to precondition the 3D Poisson equation on an $8 \times 8 \times 8$ regular grid. Because 1-level SSS matrix computations were performed on 2D matrices, the linear computational complexity is lost. Note that [15] contains no numerical experiments studying the performance of this preconditioning technique for any Krylov solver.
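The fast convergence of the Schur complements can be checked on a small model problem. The sketch below uses the block tridiagonal 2D Laplacian with dense blocks as a stand-in; the specific matrices are our choice and are not taken from [14].

import numpy as np

# Model check of the fixed-point property: for the 2D Laplacian (diagonal
# blocks D, off-diagonal blocks -I), the Schur complement recursion
# S_k = D - S_{k-1}^{-1} converges geometrically to S_infinity.
n = 16
D = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
S = D.copy()
for k in range(1, 9):
    S_next = D - np.linalg.inv(S)
    print(k, np.linalg.norm(S_next - S))   # differences decay rapidly
    S = S_next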

In this manuscript, we extend the method in [15] to the optimal control of the 3D Poisson equation in the following ways. Instead of using a hierarchical partitioning of the 3D matrix, we use the 3-level SSS matrix formulation. This avoids the cutting into "layers" that is introduced in [15] to bound the 2-level semiseparable order. Instead, we exploit the fast convergence of the Schur complements of symmetric positive definite matrices to bound the 2-level semiseparable order. As analyzed in Section 3.1, both the 1-level and the 2-level semiseparable order grow while computing the Schur complements in (5.3). To reduce the 1-level semiseparable order, we can apply the approximate balanced truncation or the Hankel blocks approximation introduced in Section 3.2. To bound the 2-level semiseparable order, we use a different approach. We compute the Schur complements of the first $k_r$ steps using MSSS matrix computations, where $k_r$ is a small constant that can be chosen freely, and use the Schur complement at step $k_r$ in place of all subsequent Schur complements, i.e., we have the following recursion for the Schur complements:

$S_k = D_k - L_{k-1} S_{k-1}^{-1} L_{k-1}^T$ for $k \le k_r$, and $S_k = S_{k_r}$ for $k > k_r$.

Since the Schur complements are only computed in the first $k_r$ steps, the 2-level semiseparable order is bounded, which in turn bounds the computational complexity. Due to the fast convergence property, the Schur complement at step $k_r$ gives an efficient approximation of the subsequent Schur complements. We also extend the fast convergence property of the Schur complements from the symmetric positive definite case to the symmetric indefinite case. This extension enables us to compute a good approximation of the 3D global saddle-point matrix, which gives an efficient global MSSS preconditioner.
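A minimal dense-block sketch of this truncated recursion follows; the dense blocks stand in for the 2-level SSS blocks, and the list-of-arrays interface is an assumption for illustration only.

import numpy as np

def truncated_schur(D, L, k_r):
    # D[k]: diagonal blocks, L[k]: sub-diagonal blocks of a block
    # tridiagonal SPD matrix (dense stand-ins for the 2-level SSS blocks).
    # Only the first k_r Schur complements are computed; S[k_r] is reused
    # for all later steps (the fixed-point approximation in the text).
    S = [np.asarray(D[0])]
    for k in range(1, len(D)):
        if k <= k_r:
            S.append(D[k] - L[k-1] @ np.linalg.solve(S[k-1], L[k-1].T))
        else:
            S.append(S[k_r])
    return S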

Here we apply the block-diagonal MSSS preconditioner (2.5) and the MINRES method to iteratively solve the saddle-point system. The computational results are reported in Tables 5.1-5.3. Note that if the mesh size $h$ is halved, the problem size grows by a factor of 8. Moreover, there are three unknowns per grid point. The 1-level semiseparable order is set to 6 for all numerical experiments in this part. The iterative solver is stopped when the 2-norm of the residual is reduced by a factor of $10^{-6}$. The Hankel blocks approximation is chosen as the model order reduction method. The "preconditioning", "MINRES", and "total" columns have the same meaning as in the previous tables.

We can see from Tables 5.1-5.3 that for fixed $h$ and $\beta$, the number of iterations decreases as $k_r$ increases. A small $k_r$ suffices to compute the preconditioner efficiently; as $k_r$ increases, so does the time to compute the preconditioner. Since only the Schur complements of the first $k_r$ steps are computed, the time to compute the preconditioner increases less than linearly when the mesh size $h$ is halved, as illustrated by the "preconditioning" columns in Tables 5.1-5.3. Moreover, for properly chosen $k_r$, the block MSSS preconditioner gives virtually mesh size independent and nearly regularization parameter independent convergence, while the computational complexity is less than linear, as can also be observed from Tables 5.1-5.3.

5.2. Global preconditioners. In the previous subsection, we studied the performance of the block MSSS preconditioner for the optimal control of the 3D Poisson equation. Here we extend the technique of bounding the 2-level semiseparable order of the Schur complements to the symmetric indefinite case. While the analysis in [14] holds only for symmetric positive definite matrices, our numerical experiments illustrate that the fast convergence property of the Schur complements also holds in the symmetric indefinite case.

The saddle-point system obtained by using the discretize-then-optimize approach has the following form,

(5.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where $K$ is the stiffness matrix given in (5.2) and $M$ is the mass matrix. Since all these matrices arise from the discretization of 3D PDEs, $K$ and $M$ have the same 3-level SSS structure as shown in (5.2). Here we can apply Lemma 3.5 again to permute the global saddle-point matrix $A$ in (5.4) into a global MSSS matrix $\bar A$. The permuted global saddle-point matrix $\bar A$ has the same MSSS structure as the sub-blocks of $A$, i.e.,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where $\bar D$ and $\bar L$ are obtained via Lemma 3.5. Since $\bar A$ is a symmetric indefinite matrix, the Schur complements in its LU factorization are also indefinite. To compute an LU factorization, the Schur complements are computed via the recursion

$\bar S_1 = \bar D_1, \qquad \bar S_k = \bar D_k - \bar L_{k-1} \bar S_{k-1}^{-1} \bar L_{k-1}^T \quad (k \ge 2).$

Due to the indefiniteness of $\bar D$, the Schur complements are also indefinite. We apply the same method as in the previous subsection: we compute the Schur complements of the first $k_r$ steps and use the Schur complement at step $k_r$ to approximate all subsequent ones. This gives

$\bar S_k = \bar D_k - \bar L_{k-1} \bar S_{k-1}^{-1} \bar L_{k-1}^T$ for $k \le k_r$, and $\bar S_k = \bar S_{k_r}$ for $k > k_r$.

By using this approximation for the permuted global system, we can compute the global MSSS preconditioner and apply it to iteratively solve the saddle-point system. The computational results are given in Tables 5.4-5.6.

Since only the first few Schur complements are computed, the computational complexity of constructing the global MSSS preconditioner is less than linear, as can be observed from the "preconditioning" columns for the same $k_r$ in Tables 5.4-5.6. The number of iterations decreases as $k_r$ increases for the same $\beta$ and $h$. Already for small $k_r$, the global MSSS preconditioner reduces the number of iterations significantly compared with the block-diagonal MSSS preconditioner. Moreover, the global MSSS preconditioner gives virtually both mesh size $h$ and regularization parameter $\beta$ independent convergence for properly chosen $k_r$.

REMARK 5.1. Compared with the results for the block-diagonal MSSS preconditioner in Tables 5.1-5.3, the global MSSS preconditioner reduces the number of iterations significantly. Even though more time is spent on computing the global MSSS preconditioner for the same experiment, the time to iteratively solve the preconditioned system is much smaller because fewer iterations are needed. Moreover, the total computation time for the global MSSS preconditioner is less than that for the block MSSS preconditioner.

REMARK 5.2. Since there is no efficient model order reduction to reduce the 2-level semiseparable order, the 2-level semiseparable order keeps growing until step $k_r$ is reached. Tables 5.4-5.6 show that when $k_r$ changes from 3 to 4 for $h = 2^{-6}$, the time to compute the global MSSS preconditioner increases dramatically. This is due to the fact that for $k_r = 4$ the 2-level semiseparable order is no longer bounded by a small number but by a moderate constant. However, the computational complexity increases only slightly more than linearly when $h$ changes from $2^{-5}$ to $2^{-6}$ for $k_r = 4$. Meanwhile, the global MSSS preconditioner already gives satisfactory performance with $k_r = 3$ for $h = 2^{-6}$.

6. Conclusions. In this paper, we have studied multilevel sequentially semiseparable (MSSS) preconditioners for saddle-point systems that arise from PDE-constrained optimization problems. By exploiting the MSSS structure of the blocks of the saddle-point system, we can construct the preconditioners and solve the preconditioned system in linear computational complexity for 2D problems and almost linear computational complexity for 3D problems. To reduce the cost of computing the preconditioners, we have proposed a new model order reduction algorithm based on the approximate balanced truncation for SSS matrices. We evaluated its performance by comparing it with the standard model order reduction algorithm (the Hankel blocks approximation). Numerical experiments illustrate that our model order reduction algorithm is computationally cheaper than the standard method. Moreover, for the optimal control of 2D PDEs, the global preconditioner reduces the number of iterations significantly compared with the block preconditioners. Both preconditioners give mesh size independent convergence and have linear computational complexity. In addition, the global MSSS preconditioner yields regularization parameter independent convergence, while the block MSSS preconditioner does not have this property.

For PDE-constrained optimization problems in 3D, since efficient model order reduction for 2- or higher-level SSS matrices is still an open problem, we apply an alternative approach to bound the 2-level semiseparable order. Numerical experiments illustrate that the global MSSS preconditioner gives virtually both mesh size and regularization parameter independent convergence, while the block MSSS preconditioner only yields nearly mesh size independent convergence.

The next step of this research will focus on applying this preconditioning technique to the optimal control of the Navier-Stokes equation, which has a wide range of applications, such as controlling a wind farm to optimize its power output.

Acknowledgments. The authors would like to thank the anonymous referees who helped to improve the quality of this paper.

REFERENCES

[1] G. S. ABDOULAEV, K. REN, AND A. H. HIELSCHER, Optical tomography as a PDE-constrained optimization problem, Inverse Problems, 21 (2005), pp. 1507-1530.

[2] A. C. ANTOULAS, D. C. SORENSEN, AND Y. ZHOU, On the decay rate of Hankel singular values and related issues, Systems Control Lett., 46 (2002), pp. 323-342.

[3] P. BENNER, Solving large-scale control problems, IEEE Control Syst. Mag., 24 (2004), pp. 44-59.

[4] M. BENZI, G. H. GOLUB, AND J. LIESEN, Numerical solution of saddle point problems, Acta Numer., 14 (2005), pp. 1-137.

[5] L. T. BIEGLER, O. GHATTAS, M. HEINKENSCHLOSS, D. KEYES, AND B. VAN BLOEMEN WAANDERS, eds., Real-Time PDE-Constrained Optimization, SIAM, Philadelphia, 2007.

[6] G. BIROS AND O. GHATTAS, Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization, part I: the Krylov-Schur solver, SIAM J. Sci. Comput., 27 (2005), pp. 687-713.

[7]--, Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization, part II: the Lagrange-Newton solver and its application to optimal control of steady viscous flows, SIAM J. Sci. Comput., 27 (2005), pp. 714-739.

[8] Y. CHAHLAOUI, Low-rank Approximation and Model Reduction, PhD. Thesis, École Polytechnique de Louvain, Université catholique de Louvain, Louvain-la-Neuve, 2003.

[9]--, Two efficient SVD/Krylov algorithms for model order reduction of large scale systems, Electron. Trans. Numer. Anal., 38 (2011), pp. 113-145. http://etna.mcs.kent.edu/volumes/2011-2020/vol38/abstract.php?vol=38&pages=113-145

[10] Y. CHAHLAOUI AND P. VAN DOOREN, Estimating Gramians of large-scale time-varying systems, in Proceedings of the 15th IFAC World Congress, 2002, L. Basanez and J. A. de la Puente, eds., IFAC, New York, 2002, pp. 540-545.

[11]--, Model reduction of time-varying systems, in Dimension Reduction of Large-Scale Systems, P. Benner, D. Sorensen, and V. Mehrmann, eds., vol. 45 of Lect. Notes Comput. Sci. Eng., Springer, Berlin, 2005, pp. 131-148.

[12] S. CHANDRASEKARAN, P. DEWILDE, M. GU, T. PALS, X. SUN, A.-J. VAN DER VEEN, AND D. WHITE, Some fast algorithms for sequentially semiseparable representations, SIAM J. Matrix Anal. Appl., 27 (2005), pp. 341-364.

[13] S. CHANDRASEKARAN, P. DEWILDE, M. GU, T. PALS, AND A. VAN DER VEEN, Fast stable solvers for sequentially semi-separable linear systems of equations, Tech. Report, Lawrence Livermore National Laboratory, Livermore, 2003.

[14] S. CHANDRASEKARAN, P. DEWILDE, M. GU, AND N. SOMASUNDERAM, On the numerical rank of the off-diagonal blocks of Schur complements of discretized elliptic PDEs, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2261-2290.

[15] P. DEWILDE, H. JIAO, AND S. CHANDRASEKARAN, Model reduction in symbolically semi-separable systems with application to preconditioners for 3D sparse systems of equations, in Characteristic Functions, Scattering Functions and Transfer Functions, D. Alpay and V. Vinnikov, eds., vol. 197 of Operator Theory: Advances and Applications, Birkhäuser, Basel, 2010, pp. 99-132.

[16] P. DEWILDE AND A.-J. VAN DER VEEN, Time-Varying Systems and Computations, Kluwer, Boston, 1998.

[17] Y. EIDELMAN AND I. GOHBERG, On a new class of structured matrices, Integral Equations Operator Theory, 34 (1999), pp. 293-324.

[18]--, A modification of the Dewilde-van der Veen method for inversion of finite structured matrices, Linear Algebra Appl., 343/344 (2002), pp. 419-450.

[19]--, On generators of quasiseparable finite block matrices, Calcolo, 42 (2005), pp. 187-214.

[20] Y. EIDELMAN, I. GOHBERG, AND V. OLSHEVSKY, The QR iteration method for Hermitian quasiseparable matrices of an arbitrary order, Linear Algebra Appl., 404 (2005), pp. 305-324.

[21] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, 1996.

[22] J. GONDZIO AND P. ZHLOBICH, Multilevel quasiseparable matrices in PDE-constrained optimization. Preprint on arXiv, http://arxiv.org/abs/1112.6018, 2011.

[23] R. A. GONZALES, J. EISERT, I. KOLTRACHT, M. NEUMANN, AND G. RAWITSCHER, Integral equation method for the continuous spectrum radial Schrödinger equation, J. Comput. Phys., 134 (1997), pp. 134-149.

[24] L. GREENGARD AND V. ROKHLIN, On the numerical solution of two-point boundary value problems, Comm. Pure Appl. Math., 44 (1991), pp. 419-452.

[25] A. KAVCIC AND J. M. F. MOURA, Matrices with banded inverses: inversion algorithms and factorization of Gauss-Markov processes, IEEE Trans. Inform. Theory, 46 (2000), pp. 1495-1509.

[26] Y. NOTAY, A new analysis of block preconditioners for saddle point problems, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 143-173.

[27] C. C. PAIGE AND M. A. SAUNDERS, Solutions of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617-629.

[28] Y. QIU, M. B. VAN GIJZEN, J.-W. VAN WINGERDEN, AND M. VERHAEGEN, A class of efficient preconditioners with multilevel sequentially semiseparable matrix structure, in Proceedings of the ICNAAM 2013, T. Simos, G. Psihoyios, and Ch. Tsitouras, eds., AIP Conf. Proc., 1558, AIP Publishing, Melville, 2013, pp. 2253-2256.

[29]--, On the application of a novel model order reduction algorithm for sequentially semi-separable matrices to the identification of one-dimensional distributed systems, in Proceedings of the 13th European Control Conference, 2014, pp. 2750-2755. Available through IEEE Xplore digital library at http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6851788

[30] Y. QIU, M. B. VAN GIJZEN, J.-W. VAN WINGERDEN, M. VERHAEGEN, AND C. VUIK, Efficient preconditioners for PDE-constrained optimization problems with a multilevel sequentially semiseparable matrix structure, Tech. Report 13-04, Delft Institute of Applied Mathematics, Delft University of Technology, Delft, 2013.

[31]--, Evaluation of multilevel sequentially semiseparable preconditioners on CFD benchmark problems using incompressible flow and iterative solver software, Math. Methods Appl. Sci., in press, 2015, doi: 10.1002/mma.3416.

[32] T. REES, Preconditioning Iterative Methods for PDE-Constrained Optimization, PhD. Thesis, New College, University of Oxford, Oxford, 2010.

[33] T. REES, H. S. DOLLAR, AND A. J. WATHEN, Optimal solvers for PDE-constrained optimization, SIAM J. Sci. Comput., 32 (2010), pp. 271-298.

[34] J. RICE, Efficient Algorithms for Distributed Control: a Structured Matrix Approach, PhD. Thesis, Delft University of Technology, Delft, 2010.

[35] J. K. RICE AND M. VERHAEGEN, Distributed control: a sequentially semi-separable approach for spatially heterogeneous linear systems, IEEE Trans. Automat. Control, 54 (2009), pp. 1270-1283.

[36] H. SANDBERG AND A. RANTZER, Balanced truncation of linear time-varying systems, IEEE Trans. Automat. Control, 49 (2004), pp. 217-229.

[37] D. SILVESTER, H. ELMAN, AND A. RAMAGE, Incompressible Flow and Iterative Solver Software (IFISS) version 3.2, May 2012. http://www.maths.manchester.ac.uk/~djs/ifiss/

[38] P. SONNEVELD AND M. B. VAN GIJZEN, IDR(s): A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations, SIAM J. Sci. Comput., 31 (2008), pp. 1035-1062.

[39] M. VAN BAREL, D. FASINO, L. GEMIGNANI, AND N. MASTRONARDI, Orthogonal rational functions and diagonal-plus-semiseparable matrices, in International Symposium on Optical Science and Technology, F. Luk, ed., Proceedings of SPIE 4791, SPIE, Bellingham, 2002, pp. 162-170.

[40] A.-J. VAN DER VEEN, Time-Varying System Theory and Computational Modeling: Realization, Approximation, and Factorization, PhD. Thesis, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, 1993.

[41] M. B. VAN GIJZEN AND P. SONNEVELD, Algorithm 913: An elegant IDR(s) variant that efficiently exploits biorthogonality properties, ACM Trans. Math. Software, 38 (2011), Art. 5 (19 pages).

[42] R. VANDEBRIL, M. VAN BAREL, AND N. MASTRONARDI, Matrix Computations and Semiseparable Matrices: Linear Systems, Johns Hopkins University Press, Baltimore, 2007.

[43] D. ZIGIC, L. T. WATSON, AND C. BEATTIE, Contragredient transformations applied to the optimal projection equations, Linear Algebra Appl., 188/189 (1993), pp. 665-676.

YUE QIU ([dagger]), MARTIN B. VAN GIJZEN ([double dagger]), JAN-WILLEM VAN WINGERDEN ([dagger]), MICHEL VERHAEGEN ([dagger]), AND CORNELIS VUIK ([double dagger])

* Received March 28, 2014. Accepted March 18, 2015. Published online on June 25, 2015. Recommended by Raf Vandebril. This research is supported by the NWO Veni Grant # 11930 "Reconfigurable Floating Wind Farms".

([dagger]) Delft Center for Systems and Control, Delft University of Technology, 2628 CD Delft, the Netherlands (Y.Qiu@gmx.us, {j.w.vanwingerden, m.verhaegen}@tudelft.nl).

([double dagger]) Delft Institute of Applied Mathematics, Delft University of Technology, 2628 CD Delft, the Netherlands ({M.B.vanGijzen, c.vuik}@tudelft.nl).

TABLE 3.1
Generator sizes for the SSS matrix A in Definition 3.1.

Generators   U_i             W_i               V_i               D_i
Sizes        m_i x k_i       k_{i-1} x k_i     m_i x k_{i-1}     m_{i-1} x m_i

Generators   P_i             R_i               Q_i
Sizes        m_i x l_{i-1}   l_{i-1} x l_i     m_i x l_{i+1}

Table 4.1
Results for approximate balanced truncation for $\beta = 10^{-1}$, $\epsilon = 10^{-1}$.

problem size     iterations    preconditioning    MINRES     total
                                    (sec.)        (sec.)    (sec.)

3.07e+03 (4)         10              0.43          0.88      1.31
1.23e+04 (6)         10              1.79          2.07      3.86
4.92e+04 (6)         10              4.11          5.95      10.06
1.97e+05 (7)         10             17.05          22.09     39.14

Table 4.2
Results for Hankel blocks approximation for $\beta = 10^{-1}$, $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning     MINRES     total
                                   (sec.)         (sec.)    (sec.)

3.07e+03 (4)        10              0.69           1.32      2.01
1.23e+04 (6)        10              2.59           2.38      4.97
4.92e+04 (6)        10              6.14           5.94      12.08
1.97e+05 (7)        10              26.11          21.59     47.70

Table 4.3
Results for approximate balanced truncation for $\beta = 10^{-1}$, $\epsilon = 10^{-2}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        16              0.29          1.46      1.75
1.23e+04 (4)        14              0.96          3.01      3.97
4.92e+04 (4)        14              2.49          8.17      10.66
1.97e+05 (5)        14              9.43          29.57     39.00

Table 4.4
Results for Hankel blocks approximation for $\beta = 10^{-1}$, $\epsilon = 10^{-2}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        16              0.46          1.48      1.94
1.23e+04 (4)        14              1.40          2.98      4.38
4.92e+04 (4)        14              4.85          7.99      12.84
1.97e+05 (5)        14             20.48          28.24     48.72

Table 4.5
Results for approximate balanced truncation for $\beta = 10^{-2}$, $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        18              0.28          1.59      1.87
1.23e+04 (3)        18              0.85          4.02      4.87
4.92e+04 (3)        18              2.26          10.79     13.05
1.97e+05 (5)        18              9.67          35.32     44.99

Table 4.6
Results for Hankel blocks approximation for $\beta = 10^{-2}$, $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        18              0.47          1.65      2.12
1.23e+04 (3)        18              1.28          3.95      5.23
4.92e+04 (3)        18              4.41          10.38     14.79
1.97e+05 (5)        18             21.14          35.12     56.26

Table 4.7
Results for approximate balanced truncation for $\beta = 10^{-2}$, $\epsilon = 10^{-2}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        30              0.32          2.54      2.86
1.23e+04 (3)        30              0.81          6.04      6.85
4.92e+04 (3)        30              2.28          17.79     20.07
1.97e+05 (5)        30              9.42          58.01     67.43

Table 4.8
Results for Hankel blocks approximation for $\beta = 10^{-2}$, $\epsilon = 10^{-2}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        30              0.49          2.62      3.11
1.23e+04 (3)        30              1.42          6.08      7.50
4.92e+04 (3)        30              4.46          17.43     21.89
1.97e+05 (5)        30             20.39          57.32     77.71

Table 4.9
Results for the block-diagonal MSSS preconditioner (2.5) for $\beta = 10^{-3}$, $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        34              0.43          2.91      3.34
1.23e+04 (3)        34              1.31          7.61      8.92
4.92e+04 (3)        34              4.26         19.83     24.09
1.97e+05 (5)        34             17.39         61.82     79.21

Table 4.10
Results for the block-diagonal MSSS preconditioner (2.5) for $\beta = 10^{-4}$, $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    MINRES     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (3)        82              0.45          4.91      5.36
1.23e+04 (3)        82              1.31         11.91     13.22
4.92e+04 (3)        80              4.34         34.83     39.17
1.97e+05 (5)        80             17.89        133.28    141.17

Table 4.11
Results for the block-diagonal MSSS preconditioner (4.1) for $\beta = 10^{-4}$, $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    MINRES     convergence
                                   (sec.)        (sec.)

3.07e+03 (5)       100              0.35          6.73      no convergence
1.23e+04 (5)       100              1.17         17.97      no convergence
4.92e+04 (5)       100              4.19         44.93      no convergence
1.97e+05 (5)       100             15.72        156.89      no convergence

Table 4.12
Results for the global MSSS preconditioner for $\beta = 10^{-3}$ and $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    IDR(4)     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (4)         2              0.38          0.13      0.51
1.23e+04 (6)         2              1.16          0.24      1.40
4.92e+04 (8)         2              4.46          0.66      5.12
1.97e+05 (10)        2             18.29          2.21     20.50

Table 4.13
Results for the global MSSS preconditioner for $\beta = 10^{-4}$ and $\epsilon = 10^{-1}$.

problem size    iterations    preconditioning    IDR(4)     total
                                   (sec.)        (sec.)    (sec.)

3.07e+03 (4)         2              0.38          0.13      0.51
1.23e+04 (6)         2              1.15          0.24      1.39
4.92e+04 (7)         2              4.23          0.64      4.87
1.97e+05 (9)         2             17.87          2.18     20.05

Table 5.1
Block MSSS preconditioner for optimal control of the 3D Poisson equation with $\beta = 10^{-1}$.

problem size       h       k_r   preconditioning   iterations   MINRES    total
                                      (sec.)                    (sec.)   (sec.)

1.54e+03        2^-3        1          1.59            16        17.41    19.01
                            2          2.76            10        11.09    13.85
                            3          4.20             6         7.08    11.28
                            4          5.68             6         7.15    12.82

1.23e+04        2^-4        1          3.35            30       139.81   143.16
                            2          6.47            18        86.77    93.24
                            3          9.88            12        59.30    69.18
                            4         13.36            10        50.42    63.77

9.83e+04        2^-5        2         14.47            38       761.27   775.75
                            3         22.95            24       503.24   526.18
                            4         33.51            18       397.82   431.33
                            5         42.83            14       321.34   364.17

7.86e+05        2^-6        7        215.42            20      2156.24  2371.66
                            8        315.62            18      2024.42  2340.04

Table 5.2
Block MSSS preconditioner for optimal control of the 3D Poisson equation with $\beta = 10^{-2}$.

problem size       h       k_r   preconditioning   iterations   MINRES    total
                                      (sec.)                    (sec.)   (sec.)

1.54e+03        2^-3        1          1.48            14        15.49    16.97
                            2          2.93             8         9.31    12.24
                            3          4.29             8         9.22    13.51
                            4          6.07             6         7.17    13.24

1.23e+04        2^-4        1          3.56            30       141.86   145.42
                            2          7.26            16        86.04    93.30
                            3         10.59            12        59.85    70.44
                            4         13.36             8        42.63    56.82

9.83e+04        2^-5        2         15.86            36       726.65   742.51
                            3         27.34            24       504.29   531.63
                            4         35.72            18       408.10   443.82
                            5         50.33            14       356.48   406.80

7.86e+05        2^-6        7        216.87            20      2154.61  2371.48
                            8        314.44            18      2050.43  2364.87

Table 5.3
Block MSSS preconditioner for optimal control of the 3D Poisson equation with $\beta = 10^{-3}$.

problem size       h       k_r   preconditioning   iterations   MINRES    total
                                      (sec.)                    (sec.)   (sec.)

1.54e+03        2^-3        2          2.90            14        15.36    18.26
                            3          4.44            14        15.60    20.04
                            4          6.13            12        13.61    19.74
                            5          7.68            12        13.39    21.07

1.23e+04        2^-4        2          6.80            14        70.27    77.07
                            3         13.04            10        53.51    66.55
                            4         20.34            10        53.79    74.13
                            5         17.22            10        52.10    69.32

9.83e+04        2^-5        2         14.52            32       647.86   662.38
                            3         22.43            22       459.30   481.73
                            4         30.73            16       347.96   378.69
                            5         40.11            12       273.07   313.18

7.86e+05        2^-6        6        183.13            22      2880.90  3064.03
                            7        214.31            20      2419.73  2634.04
                            8        315.58            16      1843.61  2159.19

Table 5.4
Global MSSS preconditioner for the optimal control of the 3D Poisson equation for $\beta = 10^{-1}$.

problem size       h       k_r   iterations   preconditioning   IDR(4)    total
                                                   (sec.)       (sec.)   (sec.)

1.54e+03        2^-3        1        3              3.84          1.96     5.79
                            2        3              4.95          1.59     6.53
                            3        2              7.70          1.08     8.78

1.23e+04        2^-4        2        6             13.37         15.19    28.56
                            3        4             22.39         10.79    33.17
                            4        3             33.67          8.75    42.42

9.83e+04        2^-5        2        8             41.04        106.14   147.18
                            3        7             78.87        109.18   188.05
                            4        6            143.04        109.26   252.31

7.86e+05        2^-6        2       14            153.60       1174.12  1327.72
                            3        9            245.24       1101.88  1347.12
                            4        8           1152.20       1841.57  2993.78

Table 5.5
Global MSSS preconditioner for the optimal control of the 3D Poisson equation for $\beta = 10^{-2}$.

problem size       h       k_r   iterations   preconditioning   IDR(4)    total
                                                   (sec.)       (sec.)   (sec.)

1.54e+03        2^-3        1        4              3.39          2.69     6.09
                            2        3              4.92          1.61     6.53
                            3        2              8.13          1.09     9.22

1.23e+04        2^-4        2        7             13.41         17.98    31.40
                            3        4             22.39         10.78    33.17
                            4        3             34.16          8.80    42.95

9.83e+04        2^-5        2        8             38.71        103.94   142.65
                            3        6             77.30        111.30   188.61
                            4        4            155.59        103.77   259.36

7.86e+05        2^-6        2       14            209.47       1362.70  1572.17
                            3        9            290.69       1132.86  1423.55
                            4        8           1181.81       2277.18  3458.99

Table 5.6
Global MSSS preconditioner for the optimal control of the 3D Poisson equation for $\beta = 10^{-3}$.

problem size       h       k_r   iterations   preconditioning   IDR(4)    total
                                                   (sec.)       (sec.)   (sec.)

1.54e+03        2^-3        1        9              2.63          5.26     7.89
                            2        4              5.30          2.72     8.03
                            3        3              6.32          1.64     7.96

1.23e+04        2^-4        2        6             10.54         15.25    25.79
                            3        4             19.41         14.26    33.68
                            4        4             31.65         17.67    49.32

9.83e+04        2^-5        2        8             35.08        104.76   139.84
                            3        7             78.38        108.77   187.15
                            4        4            134.06         93.27   227.44

7.86e+05        2^-6        2       16            162.84       1594.91  1757.75
                            3        9            322.00       1328.26  1650.26
                            4        8           1503.76       2218.80  3722.56