# Sparse Signal Inversion with Impulsive Noise by Dual Spectral Projected Gradient Method.

1. Introduction

In the present manuscript we are concerned with ill-posed linear operator equation:

Ax = y, (1)

where x is sparse with respect to an orthonormal basis and A : D(A) [subset] X [right arrow] Y is a bounded linear operator. In practice, exact data y are not known precisely, but only an approximation [y.sup.[delta]] with

[parallel]y - [y.sup.[delta]][parallel] [less than or equal to] [delta] (2)

is available. We call [y.sup.[delta]] the noisy data and [delta] the noise level. It is well known that the conventional method for solving (1) is sparsity regularization, which provides an efficient way to extract the essential features of sparse solutions compared with oversmoothed classical Tikhonov regularization.

In the past ten years, sparsity regularization has certainly become an important concept in inverse problems. The theory of sparse recovery has largely been driven by the needs of applications in compressed sensing [1, 2], bioluminescence tomography [3], seismic tomography [4], parameter identification [5], and so forth. For accounts of the regularizing properties and computational techniques in sparsity regularization we refer the reader to [5-7] and the references given there. In general, sparsity regularization is given by

[mathematical expression not reproducible], (3)

where [[parallel]x[parallel].sup.p.sub.w,p] = [[summation].sub.[gamma]] [[omega].sub.[gamma]] [[absolute value of {<[[phi].sub.[gamma]], x>].sup.p] (1 [less than or equal to] p [less than or equal to] 2), [alpha] is the regularization parameter balancing the fidelity [mathematical expression not reproducible] and regularization term [[parallel]x[parallel].sup.p.sub.w,p]. The functional in (3) is not convex if p < 1, it is challenging to investigate the regularizing properties and numerical computing method of minimizers. Limited work has been done for p < 1; we refer the reader to [8-11] for a recent account of the theory. In this paper, we will focus our main attention on the situation of p = 1.

The aim of this paper is to consider a regularization functional of the form

[mathematical expression not reproducible]. (4)

We call (4) [l.sup.1] + [l.sup.1] problem. A main motivation to investigate the [l.sup.1] + [l.sup.1] problem is that noisy data [y.sup.[delta]] often contain impulsive noise. For Gaussian noise, [l.sup.2] fidelity is a natural choice. However, a typical nondifferentiable fidelity used in application involving impulsive noise is the [l.sup.1] fidelity, which is more robust than [l.sup.2] fidelity [12].

[l.sup.1] fidelity technique was motivated by [13] for signal processing and later had attracted considerable attention in image processing, especially in image denoising. In image processing, [l.sup.1] fidelity typically combines with TV regularization term. For details of [l.sup.1] + TV problems, we refer the reader to [14-17]. Nowadays [l.sup.1] fidelity has received growing interest in the inverse problems where solutions are sparse with respect to an orthonormal basis. Minimizers of cost functions involving [l.sup.1] fidelity combined with sparsity regularization have been studied. We refer the reader to [18-23] and the references given there.

For [l.sup.1] fidelity, regularizing properties must be handled specifically. Burger and Osher [24] proved convergence rate when the regularization parameter is chosen arbitrarily fixed but sufficiently small; the authors call this phenomenon exact penalization. The most curious situation is that, in [25], the optimal convergence rate requires that regularization parameter must be equal to a fixed value. In [26], Grasmair et al. proved that source condition together with finite basis injectivity (FBI) condition is weaker than the restricted isometry property and obtained linear convergence 0([delta]). Flemming and Hegland discussed convergence rates in [l.sup.1]-regularization when the basis is not smooth enough [27]. Konig et al. obtained high order polynomial rates of convergence in the special corrupted domain, even though the underlying problem is exponentially ill-posed in the classical sense [28].

Though [l.sup.1] fidelity is robust, more researchers prefer to use [l.sup.2] fidelity because of its differentiability. Hence a key issue for the [l.sup.1] fidelity is the numerical computing methods. In the past few years, numerous algorithms have been systematically proposed for the [l.sup.1] + TV problems. On the other hand, in spite of growing interests in the [l.sup.1] fidelity, we can indicate limited work has been done for numerical methods of [l.sup.1] + [l.sup.1] problems. For sparsity regularization, the popular algorithms, for example, homotopy (LARS) method [29], iteratively reweighted least squares (IRLS) method [30], and iterative thresholding algorithm [31, 32], cannot be directly applied to [l.sup.1] + [l.sup.1] problem due to the fact that both fidelity and regularization term lack differentiability. There are only a few papers, in which numerical algorithms for [l.sup.1] + [l.sup.1] problems have been discussed systematically. References [20, 22] proposed the new model by statistics method. Reference [21] discussed the prior sparse representation and the data-fidelity term and proposed the two-phase approach. In [23], authors propose a robust bisparsity model (RBSM) to effectively exploit the prior knowledge about the similarities and the distinctions of signals. However, the researchers often devoted them to compressive sensing problem, where random matrices are well-conditioned. For ill-conditioned problems, these methods are often unstable [33] [Chap. 5]. Moreover, the researchers assume that the solution is sparse itself, which is different from the general assumption that the solution is sparse with respect to an orthonormal basis. In [19], Yang and Zhang proposed a Primal Dual-Interior Point Methods (PD-IPM) for EIT problem, which is efficient at dealing with the nondifferentiability. However, they did not give the convergence proof. Yang and Zhang reformulated the [l.sup.1] + [l.sup.1] problem into the basis pursuit model which can be solved effectively by ADM method [18]. It is a competitive method compared with other algorithms for compressive sensing. In [34], Xiao et al. applied ADM method to [l.sup.1] + [l.sup.1] problem directly. Numerical results illustrated that the proposed algorithm performs better than Yall1 [18].

In this paper, we investigate regularizing properties of [l.sup.1] + [l.sup.1] problems. The convergence rates of [mathematical expression not reproducible] can be shown to be 0([[delta].sup.1-[epsilon]]) and 0([delta]) by a priori and a posteriori choice with the additional source condition and FBI condition. As mentioned above, dual is a conventional technique to solve the Tikhonov regularization with [l.sup.1] fidelity. However, there are some limitations to this approach to [l.sup.1] + [l.sup.1] problems due to the fact that it is difficult to obtain the dual formulation of [l.sup.1] + [l.sup.1] problems. Inspired by [14] and multiregularization theory [35-37], a smooth [l.sup.2] term is added to original functional of regularization. The dual problem of this new cost function is reduced to a constraint smooth functional. Moreover, the smooth [l.sup.2] regularization term can improve the stability. The conventional method to solve the constraint smooth functional is projected gradient algorithm. We use spectral projected gradient (SPG) method to seek for the minimizers of the constraint functional and prove the convergence.

An outline of this paper is as follows. We devote Section 2 to a discussion of regularizing properties, including well-posedness and convergence rate. The convergence rate for a priori and a posteriori choice is established. In Section 3, inspired by the theory of multiregularization, we construct a new functional which is equal to a constraint smooth functional according to Frechel duality. In Section 4, the spectral projected gradient method is applied to compute the minimizers. Section 5 provides a detailed exposition of multiparameter choice rule based on the balancing principle. Numerical experiments involving compressed sensing and image inpainting are presented in Section 6, showing that our proposed approaches are robust and efficient.

2. Regularization Properties

2.1. Notation and Assumptions. For the approximate solutions of Ax = y, we consider the minimization of the regularization functional

[mathematical expression not reproducible], (5)

where R(x) := [[summation].sub.y] [[omega].sub.y] [absolute value of ([[phi].sub.y], x)] and the subdifferential of R(x) at x is denoted by [partial derivative]R(x) [subset] X. All along this paper, X denotes a Hilbert space which is a subspace of [l.sup.2] space and <*, *> denotes the inner product; Y denotes a Banach space which is a subspace of [l.sup.1] space. A : dom(A) [??] X [right arrow] Y is a bounded linear operator and dom(A) [intersection] dom(R) [not equal to] [??]. [([[phi].sub.[gamma]]).sub.[gamma][member of][LAMBDA]] [subset] X is an orthonormal basis where [LAMBDA] is some countable index set. From now, we denote

[mathematical expression not reproducible] (6)

In order to prove convergence rate results we denote by [x.sup.[delta].sub.[alpha]] the minimizer of the regularization functional [J.sub.[alpha],[delta]] (x) for every [alpha] > 0 and use the following definition of R(x)-minimum norm solution.

Definition 1. An element [x.sup.[dagger]] is called a P(x)-minimum norm solution of linear problem Ax = y if

A[x.sup.[dagger]] = y, R ([x.sup.[dagger]]) = min {R (x) | Kx = y}. (7)

We define the sparsity as follows.

Definition 2. x [member of] X is sparse with respect to [{[[phi].sub.[gamma]]}.sub.[gamma][member of][LAMBDA]] in the sense that supp(x) == {[gamma] [member of] [LAMBDA]: <[[phi].sub.y], x> [not equal to] 0} is finite. If [[parallel]supp(x)[parallel].sub.0] = s for some s [member of] N, the x [member of] X is called s-sparse.

For a subset of indices J [subset] A, we denote by

J = {[gamma] [member of] [LAMBDA]: [absolute value of ([[phi].sub.[gamma]], [xi])] [greater than or equal to] [[omega].sub.min] [greater than or equal to] 0}, W = span {[[phi].sub.[gamma]] : [gamma] [member of] J}, (8)

where [xi] [member of] [partial derivative]R([x.sup.[dagger]]). We denote by [J.sup.c] the complement of J. In addition, let [mathematical expression not reproducible].

Source condition or variational inequality are necessary for analysis of convergence rate. With these conditions one can obtain convergence rate o([[delta].sup.1/2]), even saturation convergence rate o([[delta].sup.2/3]). But for sparsity regularization, in order to obtain convergence rate o([8.sup.1-[epsilon]]) (0 < [epsilon] < 1), even linear convergence o([delta]), one needs the following assumptions.

Assumption 3. Assume that the following hold:

(i) Source condition: there exists [eta] [member of] Y* and some [xi] [member of] [partial derivative]R([x.sup.[dagger]]) satisfying [xi] = A * [eta].

(ii) Finite basis injectivity (FBI) condition: for every finite set [LAMBDA] [??] N, the restriction of A to {[[phi].sub.[gamma]] : [gamma] [member of] [LAMBDA]} is injective.

The FBI condition was originally introduced in [8]. In [26], Grasmair et al. used a slightly weaker condition based on FBI condition and showed that Assumption 3 is in some sense the weakest possible condition that guarantees the linear convergence rate. Actually, if [x.sup.[dagger]] is the unique R(x)- minimizing solution, Assumption 3 is satisfied [26]. For 0 [less than or equal to] p < 1, [x.sup.[dagger]] being the unique R(x)-minimizing solution can not guarantee that Assumption 3 is satisfied. Even if Assumption 3 is satisfied for 0 [less than or equal to] p < 1, we do not know if linear convergence can be obtained due to fact that traditional proof for linear convergence needs the convexity of R(X).

2.2. Well-Posedness and Convergence Rate. In this subsection, well-posedness and convergence rate of the regularization method are given.

Proposition 4 (well-posedness). For every [alpha] > 0 and [y.sup.[delta]] [member of] Y, minimizing [J.sub.[alpha],[delta]](x) is well-defined, stable, and convergent.

Proof. In [J.sub.[alpha],[delta]] S(x), A is a linear bounded operator, X is Hilbert space, and R(x) is sequentially lower semicontinuous. It is easy to verify Assumption 3.13 [38]; then proof is along the lines of the proofs of Theorem 3.22 (existence), 3.23 (stablity), and 3.26 (convergence) in [38].

We now turn to convergence rate. For [l.sup.2] + [l.sup.1] problem, Lorenz proved convergence rate 0([[delta].sup.1/2]) with source condition and FBI condition [7]. Linear convergence rate 0([delta]) was improved by Grasmair et al. for nonlinear equation with additional two nonlinear conditions [8]. In [26], Grasmair et al. proved that source condition together with FBI condition is weaker than the restricted isometry property and obtained linear convergence 0([delta]). Flemming discussed convergence rates in [l.sup.1]-regularization when the basis is not smooth enough [27].

Remark 5. We note that Bregman distance cannot be used as an error measure in this section due to the fact that R(x) := [[summation].sub.[gamma]] [[omega].sub.[gamma]] [absolute value of <[[phi].sub.[gamma]], x>] fails to be strictly convex. We refer reader to [8, 39] for details. In this section we use [l.sup.2] norm as error measure.

The next results show a convergence rate for a priori parameter choice rule.

Lemma 6. Assume that Assumption 3 holds. Then there exists a constant C > 0 such that

[mathematical expression not reproducible] (9)

for all x [member of] X.

Proof. Since [[xi].sub.[gamma]] = <[[phi].sub.[gamma]],[xi]> [member of] X, we see that

[mathematical expression not reproducible]. (10)

Together with the definition of J, (10) implies that J is a finite set. From Assumption 3, the restriction A to every x [member of] W is injective. Then for some constant C > 0

[mathematical expression not reproducible] (11)

for all x [member of] W. By the definition of J and [x.sup.[dagger]] [member of] X we have that ([[phi].sub.[gamma]], [x.sup.[dagger]]) = 0 for every [gamma] [??] J and [x.sup.[dagger]] [member of] W. From (11), it is easy to show that

[mathematical expression not reproducible]. (12)

Then we conclude that

[mathematical expression not reproducible], (13)

and the proof is completed.

Lemma 7. Assume there exist [[beta].sub.1], [[beta].sub.2] > 0 such that

[mathematical expression not reproducible] (14)

for all x [member of] dom(A). If [alpha][[beta].sub.2] < 1, then

[mathematical expression not reproducible]. (15)

Proof. Since [x.sup.[delta].sub.[alpha]] minimize [J.sub.[alpha],[delta]](x), then

[mathematical expression not reproducible]. (16)

Together with condition (14) this implies the inequality

[mathematical expression not reproducible], (17)

which proves Lemma 7.

Theorem 8. Assume that Assumption 3 holds. Then for the choice [alpha] = 0([[delta].sup.[member of]]) (0 < [member of] < 1), we deduce that

[mathematical expression not reproducible]. (18)

Proof. Let m := max {[absolute value of [[phi].sub.[gamma]], [xi]>]| [gamma] [??] J}; we can estimate

[mathematical expression not reproducible]. (19)

From Lemma 6, we obtain

[mathematical expression not reproducible]; (20)

that is,

[mathematical expression not reproducible]. (21)

Let

[mathematical expression not reproducible]. (22)

According to Lemma 7, it follows that

[mathematical expression not reproducible]. (23)

The assertion follows from [alpha] = 0([[delta].sup.[member of]]).

Next we turn our attention to a posteriori parameter choice which is so-called discrepancy principle due to Morozov [40]. The regularization parameter defined via the discrepancy principle is

[mathematical expression not reproducible] (24)

for some [tau] [greater than or equal to] 1. Morozov discrepancy principle is a widely used and easy numerical implementation rule. Furthermore, for [tau] = 1, (4) is equivalent to a constrained minimization problem [26]

[mathematical expression not reproducible], (25)

which is widely used in compressive sensing. The next results show a convergence rate by the Morozov discrepancy principle.

Theorem 9. Assume that Assumption 3 holds and that the regularization parameter is determined by (24). Then

[mathematical expression not reproducible]. (26)

Proof. Since [x.sup.[delta].sub.[alpha]] is a minimizer of [J.sub.[alpha],[delta]](x), the inequality

[mathematical expression not reproducible] (27)

holds. This together with [mathematical expression not reproducible] implies that

R ([x.sup.[delta].sub.[alpha]]) [less than or equal to] R([x.sup.[dagger]]). (28)

Proceeding as in the proof of Theorem 8, we obtain that

[mathematical expression not reproducible]. (29)

From Lemma 6, it follows that

[mathematical expression not reproducible]. (30)

The proof is completed.

Remark 10. Linear convergence rate O(d) can also be obtained when the a priori choice and the Morozov discrepancy principle are applied to [mathematical expression not reproducible] fidelity. However, one must let [alpha] = 0([delta]) in the a priori parameter choice rule.

3. Dual Problem

In this section, let [[omega].sub.[gamma]] in (4) take the same value; that is, [[omega].sub.[gamma]] = [mu] > 0 for all [gamma] [member of] [GAMMA]. It is reasonable because convergence can be obtained when [delta]/[alpha] [right arrow] 0 [6]. Let [alpha] := [alpha][mu] = [alpha][[omega].sub.[gamma]]; then (4) is equivalent to

[mathematical expression not reproducible]. (31)

Let u = ([x.sub.1], [x.sub.2], ..., [x.sub.y], ...) [member of] [l.sup.2], where [x.sub.[gamma]] = <[[phi].sub.[gamma]], x>. In addition, we denote by D :[l.sup.2] [right arrow] [l.sup.2] a dictionary satisfied with u = Dx and [u.sub.[dagger]] = D[x.sup.[dagger]]. For example, in the field of wavelet transform, D is a wavelet decomposition operator and [D.sup.T] is a wavelet reconstruction operator [41, 42]. Let K = A * [D.sup.T]; then (4) is equivalent to

[mathematical expression not reproducible]. (32)

Dual is a conventional method for solving Tikhonov regularization with [l.sup.1] fidelity. However, there are some limitations to this approach to solve (32). The main difficulty is that both the [l.sup.1] fidelity and the [l.sup.1] regularization term are nondifferentiable. Moreover, for ill-conditioned problems, sparsity regularization is often unstable. We add the smooth penalty [mathematical expression not reproducible] to (32) to construct the following functional:

[mathematical expression not reproducible]. (33)

The advantage of problem (33) in place of (32) is that the dual problem of (33) is a constraint smooth functional and projected gradient algorithm can be used to compute minimizers. Moreover, the regularization effect of [l.sup.1] penalty is weak and [l.sup.2] penalty can improve the stability of (32).

Next we will investigate the convergence of the minimizers to the functional [P.sub.[beta]] as [beta] tending to zero.

Theorem 11. Let [alpha] be fixed and [absolute value of [[beta].sub.n]] be a sequence tending to zero. Then the minimizers [mathematical expression not reproducible] to the functional [mathematical expression not reproducible] have a subsequence converging to [u.sup.*] being a minimizer of the functional [J.sub.[alpha]](u).

Proof. By the definition of [mathematical expression not reproducible], we have

[mathematical expression not reproducible], (34)

and hence

[mathematical expression not reproducible]. (35)

Since [alpha] is a fixed value and [[beta].sub.n] [right arrow] 0, there exists a constant C > 0 such that

[mathematical expression not reproducible]. (36)

It follows that [mathematical expression not reproducible] is uniformly bounded. Therefore, a subsequence [mathematical expression not reproducible] of [mathematical expression not reproducible] and u* exist such that

[mathematical expression not reproducible]. (37)

By the weak lower semi continuity of norm, we obtain that

[mathematical expression not reproducible], (38)

and hence that

[mathematical expression not reproducible] (39)

for all u [member of] D(K). Therefore, [u.sup.*] is a minimizer of the functional [J.sub.[alpha]](u) and the proof is completed.

Next we consider the dual problem of [P.sub.[beta]]. We will show that the constraint smooth minimization problems [P.sup.*.sub.[beta]]

[mathematical expression not reproducible], (40)

are the dual problem of [P.sub.[beta]].

Theorem 12. [P.sup.*.sub.[beta]] is the dual problem of [P.sub.[beta]]. Resolutions [u.sub.[beta].sub.[alpha]] of [P.sub.[beta]] and [p.sup.[beta].sub.[alpha]] of [[[P.sup.*.sub.[beta]].sup.*] have the following relation:

[([u.sub.[beta]].sub.[alpha]).sub.i] = [S.sub.[alpha]/[beta]] ([([K.sup.*] [p.sup.[beta].sub.[alpha]).sub.i]/[beta]), i = 1, 2, ..., <K[u.sup.[beta].sub.[alpha]] - [y.sup.[delta]], p - [p.sup.[beta].sub.[alpha]]> [greater than or equal to] 0 (41)

for all p [member of] [l.sup.2] with [mathematical expression not reproducible], where

[mathematical expression not reproducible] (42)

is the soft threshold function.

Proof. Let

[mathematical expression not reproducible], (43)

then problem [P.sub.[beta]] can be rewritten as

[mathematical expression not reproducible]. (44)

Let us denote by [F.sup.*] and [F.sup.*] the conjugate function of F and F. By the Fenchel duality [43] [Chap 3, Chap 10], it follows that

[mathematical expression not reproducible]. (45)

In [J.sub.[alpha],[delta]](x), the functionals F and F are proper, convex lower semicontinuous. By the Fenchel duality theorem [43](Chap 3, Prop. 2.4, Prop. 4.1, Rem. 4.2), it is easy to show that

[mathematical expression not reproducible]; (46)

that is,

[mathematical expression not reproducible]. (47)

Since [u.sub.[beta].sub.[alpha]] and [p.sub.[beta].sub.[alpha]] are minimizers of [P.sub.[beta]] and [P.sup.*.sub.[beta]], we have that

F (K[u.sub.[beta].sub.[alpha]]) + R ([u.sub.[beta].sub.[alpha]]) + [F.sup.*] (- [p.sub.[beta].sub.[alpha]]) + [R.sup.*] ([K.sup.*] [p.sub.[beta].sub.[alpha]]) = 0. (48)

Moreover, the extremality condition (48) is equivalent to the Kuhn-Tucker conditions

-[p.sub.[beta].sub.[alpha]] [member of] [partial derivative]F (K[u.sub.[beta].sub.[alpha]]), K*[p.sub.[beta].sub.[alpha]] [member of] [partial derivative]R ([u.sub.[beta].sub.[alpha]]). (49)

By the definition of the subgradient and -[p.sub.[beta].sub.[alpha]] [member of] [partial derivative]F(K[u.sub.[beta].sub.[alpha]]) in (49), it follows that

[mathematical expression not reproducible]. (50)
```Algorithm 1: Spectral projected gradient method for [P*.sub.[beta]].

(1) Set [gamma], [member of], [[sigma].sub.1], [[sigma].sub.2],
[[alpha].sub.min], [[alpha].sub.max] > 0, M [greater than or equal
to] 0, k = 1; choose [p.sub.1] [member of] [l.sup.2], [[alpha].sub.1]
(2) [d.sub.k] = [P.sub.c]([p.sub.k] - [a.sub.k][nabla]f([p.sub.k]))
- [p.sub.k]
(3) Select [[lambda].sub.k] using line research Algorithm 2
(4) compute [p.sub.k+1] = [p.sub.k] + [[lambda].sub.k][d.sub.k],
[s.sub.k] = [p.sub.k+1] - [p.sub.k], [q.sub.k] = [nabla]f
([p.sub.k+1]) - [nabla]f([p.sub.k]), [beta].sub.k] =
<[s.sub.k], [q.sub.k]>
(5)
if [[beta].sub.k] [less than or equal to] 0 then
Set [[alpha].sub.k+1] = [[alpha].sub.max]
Else
Set [[alpha].sub.k+1] = min {[[alpha].sub.max], max
{[[alpha].sub.min], <[s.sub.k], [s.sub.k]>/[[beta].sub.k]}}
end if
(6) k = k + 1
Until [[parallel][P.sub.c]([x.sub.k] + [nabla]f([x.sub.k]))
- [x.sub.k][parallel].sub.2] [less than or equal to] [epsilon]
or k = [k.sub.max]
```

Then we obtain

<K[u.sub.[beta].sub.[alpha]] - [y.sup.[delta]], p - [p.sub.[beta].sub.[alpha]]) [greater than or equal to] 0. (51)

By the definition of the subgradient and K* [p.sub.[beta].sub.[alpha]] [member of] [partial derivative]R([u.sub.[beta].sub.[alpha]]) in (49), we conclude that

[([K.sup.*][p.sub.[beta].sub.[alpha]]).sub.i] = [beta][([u.sub.[beta].sub.[alpha]]).sub.i] + [alpha] sign ([([u.sub.[beta].sub.[alpha]]).sub.i]), i = 1, 2, .... (52)

For [([u.sub.[beta].sub.[alpha]]).sub.i] > 0, this leads to [([u.sub.[beta].sub.[alpha]]).sub.i] = [([K.sup.*]([p.sub.[beta].sub.[alpha]])).sub.i]/[beta] - [alpha]/[beta]; we must impose in this case that [([K.sup.*] ([p.sub.[beta].sub.[alpha]])).sub.i]/[beta] > 0, this leads to [([u.sub.[beta].sub.[alpha]]).sub.i] = [([K.sup.*] ([p.sub.[beta].sub.[alpha]])).sub.i]/[beta] + [alpha]/[beta], valid only when [([K.sup.*] ([p.sub.[beta].sub.[alpha]])).sub.i]/[beta] < -[alpha]/[beta]. When [absolute value of [([K.sup.*] ([p.sub.[beta].sub.[alpha]])).sub.i]/[beta]] < [alpha]/[beta], we put [([u.sub.[beta].sub.[alpha]]).sub.i] = 0- Summarizing

([u.sub.[beta].sub.[alpha]]) = [S.sub.[alpha],[beta]] (([K.sup.*] [([p.sub.[beta].sub.[alpha]]).sub.i]/[beta]), (53)

where [S.sub.[alpha]/[beta]] is soft threshold function which is defined by

[mathematical expression not reproducible], (54)

the proof is completed.

Remark 13. In [39] (chap 1), numerical experiments showed that the choice of p slightly larger than 1 gives even better results than p = 1. It is necessary to discuss the functional

[mathematical expression not reproducible], (55)

where 1 < p < 2; we call (55) [l.sup.1] + [l.sup.p] problem. We do not have to add additional [l.sup.2] penalty to (55); its dual problem can be obtained by Fenchel dual theory directly.

4. Computation of Minimizers

As mentioned above, the functional equation (33) can be transformed using Fenchel duality into a smooth functional with a box constraint, which can be solved effectively using projected gradient-type methods. The spectral projected gradient (SPG) method [44] has been proved to be effective for the minimization of differentiable functional with box constraints. The function f(p) is defined by

[mathematical expression not reproducible]; (56)

then

[nabla]f(p) = 1/[beta]KK[K.sup.*] p - [y.sup.[delta]]. (57)

By setting

[mathematical expression not reproducible], (58)

where c = min {[alpha], 1},

we denote by [P.sub.B(c)](z) the orthogonal projection on the ball B(c)

[P.sub.B(c)] (x) (i, j) = c x(i, j)/max (c[absolute value of x(i, j)]). (59)

Given initial value [x.sub.0] and a constraint on step length [[alpha].sub.k], that is, 0 < [[alpha].sub.min] [less than or equal to] [[alpha].sub.k] [less than or equal to] [[alpha].sub.max], let parameter [gamma] [member of] (0,1), and 0 < [[sigma].sub.1] < [[sigma].sub.2] < 1, M > 1. We use the convergence criteria given by

[mathematical expression not reproducible] (60)

(see Algorithm 1). We have the following convergence results.

Proposition 14. For [gamma] [member of] (0,1) and 0 < [[sigma].sub.1] < [[sigma].sub.2] < 1, algorithm SPG is well-defined; the sequence [p.sub.k] generated by algorithm SPG converges to a stationary point of the functional [p.sub.[beta].sub.[alpha]].

Proof. Since f(p) is convex and has continuous derivatives, the convergence follows from standard results (refer to Theorem 2.1 in [44]).
```Algorithm 2: Line research.

(1) Set [lambda] =1
(2) Compute [bar.p] = [p.sub.k] + [lambda][d.sub.k]
(3)
if
[mathematical expression not reproducible]
then
Let [[lambda].sub.k] = A
Else
Let [lambda] [member of] [[[sigma].sub.1] [lambda],
[[sigma].sub.2] [lambda]], and go to (2)
end if

Algorithm 3: Fixed point algorithm for [alpha] and [beta].

(1) Choose [[alpha].sup.0] and [[beta].sup.0], and let k = 0
(2) [mathematical expression not reproducible]
(3) [mathematical expression not reproducible]
Until a stopping criterion satisfied.
```

5. Choice of Parameter [alpha] and [beta]

The solution [p.sub.[beta].sub.[alpha]] of [P.sub.[beta]] converges theoretically to the solution [p.sub.[delta].sub.[alpha]] of P as [beta] [right arrow] 0. Obviously smaller [beta] is better. However, [mathematical expression not reproducible] tends to infinite as [beta] [right arrow] 0. Moreover, the smaller weakens the regularization effect of the [l.sup.2] penalty, which leads to instability. If the solution is sparse, we can say that the nonzero coefficients are impulsive parts and the zero coefficients are smooth parts. Multiparameter regularization

[mathematical expression not reproducible], (61)

is a conventional method for ill-posed problems if the solutions have a number of different structures. In [35, 37], numerical experiments show that multiparameter regularization can effectively recover the different part of solutions. If the solutions contain only a single structure, multiparameter regularization also has better performance. Hence it does not necessarily require the parameter [beta] tending to zero when we choose the parameters [alpha] and [beta]. A conventional method for the choice of parameters [alpha] and [beta] is multiparameter regularization choice principle, for example, discrepancy principle [35] and balance principle [37]. In this paper, we use multiparameter balance principle for the choice of regularization parameters [alpha] and [beta].

Balance principle is to compute minimizers of the function

[mathematical expression not reproducible], (62)

where [c.sub.[gamma]] is a fixed constant, and (62) is equivalent to [37]

[mathematical expression not reproducible]. (63)

A fixed point algorithm for [alpha] and [beta] according to (63) is as in Algorithm 3.

6. Numerical Simulations

In this section, we present some numerical experiments to illustrate the efficiency of the proposed method. In Section 6.1, numerical experiments involve compressive sensing. We aim to demonstrate that [l.sup.1] fidelity is more stable than the [l.sup.2] fidelity and is capable of handing both impulsive and Gaussian noises. In Section 6.2, we first compare the performance of the DSPG method with the alternating direction method (ADM) and TNIP method by well-conditioned compressive sensing problems. In the second example, we discuss an ill-posed problem where the condition number of linear operator A is 255; we aim to demonstrate that the proposed method is stable even with large condition numbers. In Section 6.3, we discuss the image inpainting where images are sparse with respect to the Daubechies wavelets. For image inpainting, the linear operator A is moderate ill-condition and the condition number is around 4000. In order to compare the restoration results, the quality of the computed solution x is measured by relative error Rerr and PSNR which are, respectively, defined by

[mathematical expression not reproducible]. (64)

All experiments were performed under Windows 7 and Matlab R2010a on HP ProBook 4431s with Intel Core i5 2410M CPU 2.30 GHz 2.30 GHz and 4 GB of memory.

6.1. Comparison with [l.sup.1] and [l.sup.2] Fidelity. This example involves compressive sensing problem Ax = [y.sup.[delta]], where matrix [A.sub.80x200] is random Gaussian and [y.sup.[delta]] = [Ax.sup.[dagger]] + [delta] is the observed data containing white noise or impulsive noise. The true solution [x.sup.[dagger]] is 16-sparse with respect to natural basis of [l.sup.2] space which is defined by

[mathematical expression not reproducible]. (65)

White noise is generated such that data [y.sup.[delta]] attains a desired SNR, which is defined by

[mathematical expression not reproducible]. (66)

The impulsive noise is measured by relative error, which is defined by

Rerr ([delta]) = [parallel] y- [y.sup.[delta]] [parallel]/[parallel]y[parallel] x 100%. (67)

Figure 1 presents the restoration results by [l.sup.1] and [l.sup.2] fidelities. The left column describes the data with different white noise levels. The SNR of data are 30 dB, 20 dB, 10 dB, and 5 dB. The right column is restoration results; the regularization parameters ([alpha], [beta]) are (2.37e-3, 1.26e-3), (6.32e-2, 2.63e-2), (4.35e-1,1.68e-1), and (9.05e-1, 7.45e-1). As can be seen from Figure 1, the quality of restoration by [l.sup.1] and [l.sup.2] fidelities is similar if the data are contaminated with low noise levels. If data contain high noise levels, the performance of [l.sup.2] fidelity is better. However, we can not recover high-quality solutions with high noise levels. In Figure 2, the left column describes the data which are contaminated by different impulsive noise levels. Rerr([delta]) of noise level are 3%, 7%, 15%, and 22%. The value of impulsive noise is [+ or -]1 at random positions and 0 at other positions. The right column contains restoration results according to different noise levels. The regularization parameters ([alpha], [beta] [beta]) are (2.43e-2, 1.58e-2), (8.29e-2, 4.63e-2), (2.12e-1, 1.06e-1), and (4.25e-1, 2.46e-1). As can be seen from Figure 2, the [l.sup.1] fidelity is more stable for impulsive noise and always offers high-quality restoration even with poor data. In contrast to the [l.sup.2] fidelity, the quality of restoration results by the [l.sup.2] fidelity is always poor.

6.3. Image Inpainting. We consider 2D image inpainting problems. The image is Barbara (n = 512; cf. Figure 5). We randomly remove 40% pixels of Barbara to create an incomplete image. In this case, the image inpainting is a moderate ill-posed problem. The condition number of the matrix is around 4000. For our purpose, we make use of Daubechies 4 wavelet basis as a dictionary. We use four scales, for a total of 8192 x 512 wavelet and scaling coefficients (cf. Figure 6). As seen from Figure 6, the representation of the image with respect to Daubechies 4 basis is sparse. We add salt-and-pepper noise by Matlab code: imnoise(image, 'salt & pepper, d). In this example, d = 0.05. The restoration results are shown in Figure 5(d). Restoration results of four images with different noise levels are given in Table 2. Restoration results show that if images have a sparse representation with respect to an orthogonal basis, DSPG method can obtain satisfactory results even if the image inpainting are moderate ill-posed problems.

7. Conclusion

With source and FBI conditions, we have proved that [l.sup.1] + [l.sup.1] regularization method yields convergence rates of order [[delta].sup.1-e] and [delta]. For numerical solutions, we have proposed a novel DSPG approach for sparsity regularization with [l.sup.1] fidelity. Numerical results indicate that the proposed DSPG algorithm performs competitively with several state-of-art algorithms such as ADM method. On various classes of test problems with different condition numbers, the proposed DSPG method has exhibited the following advantages: (i) compared with other algorithms, the dual problem of our methods is more simple which is a box-constraint smooth functional and can be solved effectively by projected methods; (ii) for ill-conditioned problems, DSPG method is more stable with respect to noise level. We remark that for well-conditioned problems, for example, compressive sensing, optimal accuracy of ADM-[l.sup.1] is better than DSPG method. However, the optimal accuracy of ADM-[l.sup.1] method is strongly dependant on stopping tolerance values which can be difficult to estimate in practice. To the best of the author's knowledge, multiparameter regularization is first used to obtain the dual formulation of [l.sup.1] +11 problems. One can also try to use multiparameter regularization strategy to solve dual problem of the other nonsmooth functionals, for example, [l.sup.1] + TV problems.

https://doi.org/10.1155/2017/2736306

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was funded by the National Natural Science Foundation of China (Grants nos. 41304093 and 11301119) and the Fundamental Research Funds for the Central Universities (Grant no. 2572015CB19) and Heilongjiang Postdoctoral Research Develop-Mental Fund (no. LBH-Q16008). The authors are grateful to Professor Wen Song of Harbin Normal University for many useful discussions and suggestions.

References

[1] E. J. Candes, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, 2006.

[2] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.

[3] H. Gao and H. Zhao, "Multilevel bioluminescence tomography based on radiative transfer equation part 1: 11 regularization," Optics Express, vol. 18, no. 3, pp. 1854-1871, 2010.

[4] I. Loris, G. Nolet, I. Daubechies, and F. A. Dahlen, "Tomographic inversion using 11-norm regularization of wavelet coefficients," Geophysical Journal International, vol. 170, no. 1, pp. 359-370, 2007.

[5] B. Jin and P. Maass, "Sparsity regularization for parameter identification problems," Inverse Problems, vol. 28, no. 12, Article ID 123001, 2012.

[6] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413-1457, 2004.

[7] M. Fornasier, Theoretical Foundations and Numerical Methods for Sparse Recovery, De Gruyter, 2010.

[8] M. Grasmair, "Non-convex sparse regularisation," Journal of Mathematical Analysis and Applications, vol. 365, no. 1, pp. 19-28, 2010.

[9] M. Nikolova, "Description of the minimizers of least squares regularized with l(0)-norm. Uniqueness of the global minimizer," SIAM Journal on Imaging Sciences, vol. 6, no. 2, pp. 904-937, 2013.

[10] C. A. Zarzer, "On Tikhonov regularization with non-convex sparsity constraints," Inverse Problems, vol. 25, no. 2, Article ID 025006, 2009.

[11] W. Wang, S. Lu, H. Mao, and J. Cheng, "Multi-parameter Tikhonov regularization with the l(0) sparsity constraint," Inverse Problems, vol. 29, no. 6, Article ID 065018, 2013.

[12] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, New York, NY, USA, 1987.

[13] S. Alliney, "Digital filters as absolute norm regularizers," IEEE Transactions on Signal Processing, vol. 40, no. 6, pp. 1548-1562, 1992.

[14] C. Clason, B. Jin, and K. Kunisch, "A duality-based splitting method for [l.sup.1]-TV image restoration with automatic regularization parameter choice," SIAM Journal on Scientific Computing, vol. 32, no. 3, pp. 1484-1505, 2010.

[15] M. Nikolova, "Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers," SIAM Journal on Numerical Analysis, vol. 40, no. 3, pp. 965-994, 2002.

[16] Y. Dong, M. Hintermuller, and M. Neri, "An efficient primal-dual method for [l.sup.1]-TV image restoration," SIAM Journal on Imaging Sciences, vol. 2, no. 4, pp. 1168-1189, 2009.

[17] W. Yin, D. Goldfarb, and S. Osher, "The total variation regularized L1 model for multiscale decomposition," Multiscale Modeling and Simulation, vol. 6, no. 1, pp. 190-211, 2007.

[18] J. Yang and Y. Zhang, "Alternating direction algorithms for [l.sup.1]- problems in compressive sensing," SIAM Journal on Scientific Computing, vol. 33, no. 1, pp. 250-278, 2011.

[19] J. Yang and Y. Zhang, "A primal-dual interior-point framework for using the L1 or L2 norm on the data and regularization terms of inverse problems," SIAM Journal on Scientific Computing, vol. 33, no. 1, pp. 250-278, 2011.

[20] S. Stankovic, I. Orovic, and M. Amin, "L-statistics based modification of reconstruction algorithms for compressive sensing in the presence of impulse noise," Signal Processing, vol. 93, no. 11, pp. 2927-2931, 2013.

[21] L. Ma, J. Yu, and T. Zeng, "Sparse representation prior and total variation-based image deblurring under impulse noise," SIAM Journal on Imaging Sciences, vol. 6, no. 4, pp. 2258-2284, 2013.

[22] J. Shang, Z. Wang, and Q. Huang, "A robust algorithm for joint sparse recovery in presence of impulsive noise," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1166-1170, 2015.

[23] L. Chen, L. Liu, and C. L. Philip Chen, "A robust bi-sparsity model with non-local regularization for mixed noise reduction," Information Sciences, vol. 354, pp. 101-111, 2016.

[24] M. Burger and S. Osher, "Convergence rates of convex variational regularization," Inverse Problems, vol. 20, no. 5, pp. 1411-1421, 2004.

[25] A. Neubauer, T. Hein, B. Hofmann, S. Kindermann, and U. Tautenhahn, "Improved and extended results for enhanced convergence rates of Tikhonov regularization in Banach spaces," Applicable Analysis, vol. 89, no. 11, pp. 1729-1743, 2010.

[26] M. Grasmair, M. Haltmeier, and O. Scherzer, "Necessary and sufficient conditions for linear convergence of[l.sup.1]-regularization," Communications on Pure and Applied Mathematics, vol. 64, no. 2, pp. 161-182, 2011.

[27] J. Flemming and M. Hegland, "Convergence rates in [l.sup.1]-regularization when the basis is not smooth enough," Applicable Analysis, vol. 94, no. 3, pp. 464-476, 2015.

[28] C. Konig, F. Werner, and T. Hohage, "Convergence rates for exponentially ill-posed inverse problems with impulsive noise," SIAM Journal on Numerical Analysis, vol. 54, no. 1, pp. 341-360, 2016.

[29] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004.

[30] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Gunturk, "Iteratively reweighted least squares minimization for sparse recovery," Communications on Pure and Applied Mathematics, vol. 63, no. 1, pp. 1-38, 2010.

[31] T. Blumensath and M. E. Davies, "Iterative thresholding for sparse approximations," The Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 629-654, 2008.

[32] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp. 265-274, 2009.

[33] M. Cullen, M. A. Freitag, and S. Kindermann, Large Scale Inverse Problems: Computational Methods and Applications in the Earth Sciences, De Gruyter, Austria, 2013.

[34] Y. Xiao, H. Zhu, and S.-Y. Wu, "Primal and dual alternating direction algorithms for [l.sup.1]-[l.sup.1] norm minimization problems in compressive sensing," Computational Optimization and Applications, vol. 54, no. 2, pp. 441-459, 2013.

[35] S. Lu and S. V. Pereverzev, "Multi-parameter regularization and its numerical realization," Numerische Mathematik, vol. 118, no. 1, pp. 1-31, 2011.

[36] S. Lu, S. V. Pereverzev, Y. Shao, and U. Tautenhahn, "Discrepancy curves for multi-parameter regularization," Journal of Inverse and Ill-Posed Problems, vol. 18, no. 6, pp. 655-676, 2010.

[37] K. Ito, B. Jin, and T. Takeuchi, "Multi-parameter Tikhonov regularization," Methods & Applications of Analysis, vol. 18, no. 1, pp. 31-46, 2011.

[38] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, Springer, 2011.

[39] T. Schuster, B. Kaltenbacher, B. Hofmann, and K. S. Kazimierski, Regularization Methods in Banach Spaces, De Gruyter, 2012.

[40] V. A. Morozov, "On the solution of functional equations by the method of regularization," Soviet Mathematics. Doklady, vol. 7, pp. 510-512, 1966.

[41] S. G. Mallet, A Wavelet Tour of Signal Processing, Elsevier, Oxford, UK, 2010.

[42] I. Loris, G. Nolet, I. Daubechies, and F. A. Dahlen, "Tomographic inversion using [l.sup.1]-norm regularization of wavelet coefficients," Geophysical Journal International, vol. 170, no. 1, pp. 359-370, 2007.

[43] I. Ekeland and R. Man, Convex Analysis and Variational Problems, SIAM, Philadelphia, Pa, USA, 1999.

[44] E. G. Birgin, J. M. Martinez, and M. Raydan, "Nonmonotone spectral projected gradient methods on convex sets," SIAM Journal on Optimization, vol. 10, no. 4, pp. 1196-1211, 2000.

[45] S. J. Kim, K. Koh, and M. Lustig, "A method for large-scale l1-regularized least squares," IEEE Journal on Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606-617, 2007

Liang Ding, (1) Hui Zhao, (1) and Yixin Dou (2)

(1) Department of Mathematics, Northeast Forestry University, No. (26) Hexing Street, Xiangfang District, Harbin, China

(2) School of Economics and Finance, Harbin University of Commerce, No. 1 Xuehai Street, Songbei District, Harbin, China

Correspondence should be addressed to Liang Ding; dl@nefu.edu.cn

Received 20 April 2017; Accepted 24 July 2017; Published 14 September 2017

Caption: Figure 1: Restorations of [l.sup.1] and [l.sup.2] fidelities with different Gaussian noise levels.

Caption: Figure 2: Restorations of [l.sup.1] and [l.sup.2] fidelities with different impulsive noise levels.

Caption: Figure 5: Restoration of Barbara with salt-and-pepper noise, d = 0.2.

Caption: Figure 6: Wavelet and scale coefficients.
```Table 1: Comparisons of different algorithms.

Impulsive           TNIP          AMD-[l.sup.2]
noise
[delta]      Rerror     PSNR     Rerror     PSNR

0.1%         0.610%     54.49    0.399%     57.97
0.3%         1.412%     47.01    0.559%     55.04
0.5%         2.295%     42.78    1.078%     49.34
1.0%         3.187%     39.92    3.178%     39.93
3.0%         4.993%     36.03    4.956%     36.12
5.0%         8.608%     31.30    8.602%     31.30
10%          17.66%     25.06    17.66%     25.08

Impulsive     AMD-[l.sup.1]          DSPG
noise
[delta]      Rerror     PSNR     Rerror     PSNR

0.1%         0.907%     50.84    0.387%     57.28
0.3%         1.813%     44.83    0.548%     55.36
0.5%         2.719%     41.31    0.956%     50.14
1.0%         3.619%     38.82    2.305%     42.65
3.0%         5.542%     35.31    3.063%     40.15
5.0%         9.101%     30.87    5.217%     36.25
10%          18.11%     24.84    12.64%     29.21

Table 2: Restoration of four images with different
salt-and-pepper noise levels.

Noise level     Lena     Boat    Barbara    Goldhill    Cameraman

0.05           26.21    25.16     22.16       25.58       25.12
0.08           24.87    23.88     21.52       25.17       25.03
0.10           22.82    21.56     20.35       24.07       24.23
0.15           20.42    19.55     19.23       22.74       22.42

Noise level    Peppers     Mandrill     Pirate

0.05            26.25       26.68       25.09
0.08            26.11       26.16       25.01
0.10            25.01       25.32       24.58
0.15            23.16       23.85       22.75
```