# Variational image denoising while constraining the distribution of the residual.

Dedicated to Lothar Reichel on the occasion of his 60th birthday1. Introduction. A real captured image may be distorted by many expected or unexpected factors among which random noise is a typical and often unavoidable example. Hence, image denoising is a fundamental task in the field of image processing and a plethora of noise removal approaches have been proposed throughout the last few decades. Basically, there are three standard noise models in imaging systems. These are additive noise, multiplicative noise, and impulse noise. Typical image noise models are further characterized by the shape of their probability density function which in the discrete setting is represented by the noise histogram. In this paper, we focus on the restoration of images corrupted by additive noise, which we assume to be sampled from a known a-priori distribution. Note that previous work has been done to impose constraints on the histogram of the restored image itself. In this work, however, the focus is on the histogram of the residual. To our knowledge, this is the first attempt to impose such constraints.

Representing gray-scale two-dimensional images by real-valued functions defined on a rectangular domain [OMEGA] [subset] [R.sup.2], the available observed noise-contaminated image [u.sub.0] is related to the unknown noise-free image [bar.u] by the following degradation model

(1.1) [u.sub.0](x) = [bar.u](x) + [bar.n](x), x [member of] [OMEGA],

where [bar.n] is an unknown realization of the random noise process, which we assume to be identically distributed with known probability density function. The goal of a denoising algorithm is to obtain an estimate u of the unknown noise-free image u. This, from (1.1), allows us to define the residual as n = [u.sub.0] - u which represents an estimate of the unknown noise realization n. In principle, the greater the a priori information available on the noise-free image and the noise is, the better the chance for a successful denoising process will be. More general, in image restoration it is often beneficial to impose known properties of the noise-free image such as smoothness [20] and nonnegativity [15] during the solution process. On the other hand, most image restoration methods find the approximation u of [bar.u] by using the a priori information on the mean and the variance of the noise, that is, they mainly exploit only the norm of the residual. Among these methods, many including the discrepancy principle, the generalized cross validation, the Truncated Singular Value Decomposition [16], or the L-curve method [2, 5], exploit this limited amount of residual information in the regularization parameter choice.

One aspect generally missing from state-of-the-art image denoising algorithms is a full exploitation of all the available a-priori information on the noise. The main purpose of this work is to go beyond the use of the residual norm and to propose a new method that takes advantage of a higher amount of information present in the components of the residual vector since in many cases the noise distribution is known (1).

This concept is better illustrated in Figure 1.1. Three synthetic examples of additive image noise all with the same mean [mu] = 127.5 and standard deviation [sigma] = 27.5 are depicted in the first row, nevertheless the first noise example is completely different from the other two. In particular, the first is a simple step function, the second and the third are specific realizations of Gaussian and uniform noise distributions, respectively. In the second row, the associated histograms are shown which reflect the visual difference we can notice in the illustrations. Even if the second and the third illustrations appear visually similar, the associated histograms are significantly different from each other. Therefore, by exploiting the complete information on the noise distribution, we expect to obtain a residual which better fits the probabilistic model of the noise process, and, as a consequence, we expect an improvement in the restored image.

To the best of our knowledge, no previous attempt has been made to explicitly favor a target distribution for the residual during the denoising process. The histogram of the residual has been proposed for choosing the regularization parameter in [22] to make the residuals as close as possible to white noise. In [14] a novel fidelity functional was proposed in a variational framework in order to enforce whiteness of the residual. The histogram of the image itself has been recently proposed in [24] for image restoration, however it is based on the strong assumption that the histogram of the original image is known.

While our formulation is quite different due to our focus on the residual, it can nevertheless be related on a conceptual level with numerous methods that have been proposed for image contrast enhancement or segmentation problems which are based on the modification of the histogram of an input image toward a target histogram. The simplest such method is histogram linear stretching [17]. Sapiro and Caselles in [23] proposed histogram modification via image evolution equations and R. Chan et al. in [8] proposed a general variational framework for histogram modification.

Over the last two decades, a variety of PDE-based and variational methods have been developed to deal with the image denoising problem. A good review can be found in [4]. In these approaches, the use of variational methods [7, 10, 13, 18, 21, 26] and nonlinear partial differential equations (PDEs) [1, 6, 11, 19, 27] have significantly grown.

Our idea is to develop a variational denoising model which integrates soft or hard constraints to fit the distribution of the noise.

The paper is organized as follows. In Section 2 the variational approach to image denoising is illustrated. Section 3 describes how we defined the novel constraints based on the residual cumulative density function. In Section 5 the ADMM optimization technique is proposed to solve the constrained variational model. In Section 6 preliminary experimental results are presented and conclusions are drawn in Section 7.

2. The variational approach to image denoising. Variational approaches for image denoising commonly rely on the following energy functional minimization:

(2.1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The energy functional J minimized in (2.1) is thus the sum of a regularization term R and a so-called fidelity term F with the regularization parameter [lambda] > 0 controlling the trade-off between "regularity" of the solution u and fidelity of the solution to the observed data [u.sub.0]. In particular, the regularization functional in (2.1) encodes prior information on the smoothness of the unknown noise-free image u while the fidelity functional is based on the assumption on the residual n = [u.sub.0] - u, that is, on the additive noise n in (1.1).

A standard choice for the fidelity term is

(2.2) F([u.sub.0] - u) = [1/2] [([u.sub.0] - u).sup.2] dx,

which indirectly encodes the prior knowledge on the noise standard deviation a.

The type of the regularization functional in (2.1) is important for the success of the denoising process. A very popular choice for it is the Total Variation semi-norm since it has the desirable property to allow for sharp edges in the solution. The popular Rudin-Osher-Fatemi (ROF) denoising algorithm [21] considers the [TV-L.sub.2] functional,

(2.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

This model yields very satisfactory results for removing image noise while preserving edges and contours of objects.

The unconstrained [TV-L.sup.2] model in (2.3) can be equivalently reformulated as the following constrained optimization problem:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with the admissible set defined as

V = {u : [parallel][u.sub.0] - u [[parallel].sup.2.sub.2] [less than or equal to] [absolute value of ([OMEGA])] [[sigma].sup.2]},

where [absolute value of ([OMEGA])] is the area of the image domain.

Instead of the standard [L.sub.2] fidelity term in (2.3), in order to explicitly exploit the assumption that the noise distribution is known, we introduce a novel residual fidelity term that enforces the similarity between the residual distribution and the target noise distribution, thus obtaining the penalty formulation,

(2.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [bar.P] denotes the known noise distribution in (1.1) and [P.sub.n] is the distribution function of the residual n = [u.sub.0] - u.

For instance, if the noise is known to be zero-mean Gaussian with a certain standard deviation, [bar.P] will be the associated cumulative distribution function. Details on these distributions will be provided in Section 4.

In this paper, we propose a hard-constrained version of the variational model in (2.4) aimed at forcing explicitly the constraints on the noise distribution function of the residual. We chose the TV regularization simply for its popularity; any other regularizers could be substituted as well. The proposed model is

(2.5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the new admissible set H contains solutions u such that the corresponding residual [u.sub.0] - u has a distribution close to the theoretical noise distribution function. We will characterize the set H and, accordingly, we will present an efficient minimization algorithm based on the Alternating Directions Method of Multipliers (ADMM) optimization technique. The ADMM method and its variants are largely used to solve minimization problems in image processing. We refer the reader to [3] for a general dissertation on optimization techniques such as ADMM methods or others and their applications to image processing.

Let us remark how the proposed residual fidelity term in (2.4) and the related constraint u [member of] H can be seen as an infinite dimensional extension of the classical ones in (2.2). Instead of constraining the mean and variance of the residual only, we force the entire distribution [P.sub.n] (z) (i.e., for any real intensity z [member of] R) toward the known noise distribution [bar.P]. As such, the proposed residual distribution term is infinitely more constraining than the classical mean and variance penalties.

3. Continuous distribution constraints. In this section we present a few preliminary equations that will be exploited in the following sections when we develop models that constrain (either in the form of a penalty or in the form of a hard constraint) the distribution of the residual of the reconstruction. The basic idea, which will be explained in more detail in the following sections, is to force the residual to exhibit some of the known a-priori properties of the distribution of the additive noise embedded in the original observed image. Since our models are developed in the variational framework, we will need to address constraints on the distribution of the residual in this framework as well. Here we present the necessary preliminaries. Note that we will focus our attention on the cumulative distribution function rather than the standard probability distribution function as it exhibits an extra degree of regularity while still containing the same amount of information about the global distribution of the values of a given function.

Let p(z, t) denote the probability density function (pdf),

(3.1) p(z,t) = 1/[absolute value of ([OMEGA])] [[integral].sub.[OMEGA]] [delta](z - f (x,t)) dx,

where [delta] represents the standard Dirac measure. Then, with [P.sub.f] (z,t), we denote the time-varying cumulative distribution function (cdf) of a time-varying function f (x,t) of a 2D space variable x [member of] [OMEGA] as

(3.2) [P.sub.f](z-T) 1/[absolute value of ([OMEGA])] [[integral].sub.[OMEGA]] H(z - f (x,t)) dx

where [OMEGA] [subset] [R.sup.2] is a fixed, compact domain and H: R [right arrow] {0,1} represents the standard Heaviside function. The time derivative of [P.sub.f] is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [delta] represents the standard Dirac measure.

Now consider a weighted penalty term [F.sub.cdf] (f (t)) that measures the difference between [P.sub.f] (z, t) and a given target [bar.P] (z),

(3.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where the interval D C R contains the range of values of f and where the weighting parameter [lambda] > 0 is allowed to vary within this interval. We may interpret such a term as a weighted fidelity term between the target cdf and the actual cdf of the function f. Note that in the upcoming sections we will be utilizing such a term but applied to the residual rather than the reconstructed image, and therefore the function f presented here in these preliminary equations serves as a placeholder for the difference between the reconstructed image and the observed noisy image. As such, the interpretation of this as a fidelity term is to be understood in the sense of the residual having a similar distribution as the additive noise within the observed image. As the function f varies in time, the cdf penalty varies as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

As such, the gradient of [F.sub.cdf] is given by

(3.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Finally, it is worth noting that in the continuum (where our variational models are developed), standard a-priori noise distributions (such as the Gaussian) are often supported over unbounded intervals (including the case D = R) whereas in the discretized and quantized observed image, the range of the image values as well as the additive noise is truncated. As such, the residual will also be bounded. In this case, it makes sense to consider a target distribution supported over a finite interval D. We will present a Monte Carlo based approach to develop such bounded noise models in a later section.

We notice that the distribution penalty in (3.3) is the fidelity functional proposed in (2.4) with the generic function f replaced by the residual n = [u.sub.0] - u and with [bar.P] denoting the known cumulative distribution of noise.

The admissible set H which defines the hard constraint for the proposed variational problem (2.5) is defined as follows:

(3.5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

In fact, if a residual [u.sub.0] - u represents a realization of a noise with known distribution [bar.P] (z), its cumulative distribution function [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] will fit perfectly [bar.P] (z).

4. Discrete distribution constraints. In this section we illustrate how the distribution constraints on the residual n = [u.sub.0] - u can be applied in the case of discretized and quantized images. Without loss of generality, we will consider square d x d images.

In the following, we will define the discrete counterpart of the probability density function (3.1) and of the cumulative distribution function (3.2). To this purpose, first we need to quantize the range of possible image values z e D by introducing a partition of D into Q (non-overlapping) intervals, called bins, defined as [b.sub.i] := [[z.sub.i - 1], [z.sub.i]], [z.sub.i-1] < [z.sub.i]; i = 1, ..., Q, as illustrated in Figure 4.1.

The normalized histogram [h.sub.n] [member of] [R.sup.Q] of the residual n is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [n.sub.i,j] denotes the value of n at pixel (i, j) and [I.sub.S] (*) denotes the indicator function of the set S, i.e., the function having value 1 inside the set S and 0 outside.

Summing up, we get the normalized cumulative histogram,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

We can define theoretical histograms also in this discrete and quantized setting. Given the known noise probability density function p and cumulative distribution function P and a range partition, we define the theoretical normalized histogram h for the residual as

[[bar.h].sub.[q]] = [bar.P]([z.sub.q]) - [bar.P]([z.sub.q-1]), q = 1, 2, ..., Q,

and the theoretical normalized cumulative histogram H as

[[bar.H].sub.[q]] = [q.summation over (i=1)] [bar.h][i], q = 1,2, ..., Q.

The noise models we consider are the additive white Gaussian noise (AWGN) and the additive white uniform noise (AWUN), whose probability density functions are, respectively,

(41) [p.sub.G](z) = [1/2[pi][sigma]] (-[1/2][(z-u/[sigma]).sup.2])

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [mu] and [sigma] denote the mean and the standard deviation. To highlight the difference between the discrete and the continuous case, we show in Figure 4.2 four images (50 x 50 pixels) which represent different realizations from a standard (i.e., zero-mean and unit-variance) AWGN and AWUN (first row), the associated normalized histograms (second row), normalized cumulative histograms (third row), and theoretical histograms (fourth row). We notice that for both the AWGN and the AWUN, the two realizations yield quite different histograms and (though much less visible in the figures) cumulative histograms. Hence, in contrast with the continuous case, the distribution constraint can not be an equality constraint as in (3.5). Instead, we must constrain the histograms to reside within a band around the theoretical histograms.

We are ready to describe, in the discrete setting, the admissible set H which defines the hard constraint for the proposed variational problem (2.5) as

(4.2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [[bar.H].sub.L] and [[bar.H].sub.U] are the lower and upper limits, respectively, of the band around the theoretical histograms.

For the computation of the [[bar.H].sub.L] and [[bar.H].sub.U] limits, we propose the following simple Monte Carlo approach. Given a target probability density function for the noise and a selected image dimension d, we generate a large number of d x d different noise realizations by sampling from the target distribution. For each realization, we compute the normalized histogram and the cumulative histogram. Finally, we compute the minimum and the maximum values within all the realizations for all the histogram bins. In Figure 4.3 we report the results for the AWGN and AWUN distributions obtained by using four different image dimensions d = 5,10,20, 50. First, we notice that the bands for the cumulative histograms are narrower than those for the histograms. This means that the random fluctuations of the cumulative histogram values over different realizations are smaller than those for the histogram values. For this reason, we choose to constrain the distribution of the residual by means of the cumulative histogram. Second, the size of the bands gets smaller as the image dimension d increases. In other words, the bigger the image dimension d is, the narrower the band of the distribution constraint will be. The histograms converge to the theoretical ones as d tends to infinity.

5. ADMM for the constrained minimization problem. To solve the proposed constrained minimization problem in (2.5) with the admissible set H defined in (4.2), we adapt the well known ADMM optimization technique [9] to our framework. We first introduce two auxiliary variables t and r to reformulate the minimization problem into the equivalent form,

(5.1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the auxiliary variable t is introduced to transfer the discrete gradient operator D out of the non-differentiable term [parallel] * [[parallel].sub.2] and the variable r plays the role of the residual [u.sub.0] - u within the distribution constraint so that the constraint is now imposed on r instead of u.

To solve (5.1), we define the augmented Lagrangian functional and seek its stationary points

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [[beta].sub.t] > 0, [[beta].sub.r] > 0 are the scalar penalty parameters and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the vectors of Lagrangian multipliers.

Starting at u = [u.sup.k], [[lambda].sub.t] = [[lambda].sup.k.sub.t], [[lambda].sub.r] = [[lambda].sup.k.sub.r], the ADMM iterative scheme applied to the solution of (5.1) reads as follows:

(5.2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [gamma] is a relaxation parameter chosen in the interval (0, ([square root of (5 + 1)/2))] as analyzed in [12]. The two minimization sub-problems (5.2) and (5.4) for t and u can be easily solved by [d.sup.2] two-dimensional shrinkage operations and by the fast solution of a [d.sup.2] x [d.sup.2] linear system [9], respectively. More attention must be payed to the sub-problem (5.3) for r that, when made explicit, reads as

(5.5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [p.sub.H[*]] denotes the Euclidean projection onto the distribution set H defined in (4.2). Computing the solution of (5.5) is not as straightforward as for the sub-problems (5.2) and (5.4), due to the characteristics of the set H onto which we project. However, from the definitions of the distribution set in (4.2), we notice that the complicated constraint on the auxiliary variable r can be turned into a simple box-constraint on its cumulative histogram H(r), that is, H(r) must belong to the set,

(5.6) B = {H(r) [member of] [R.sup.Q] : [[bar.H].sub.L] [less than or equal to] H(r) [less than or equal to] [[bar.H].sub.U]},

where HL, Hu represent the lower and upper limits of the band around H. This simpler set allows us to solve the overall optimization problem using ADMM with a new auxiliary variable v = H(r). In particular, the minimization problem in (5.1) is rewritten as

(5.7) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The augmented Lagrangian functional associated with (5.7) is

(5.8) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [[beta].sub.t], [[beta].sub.r], [[beta].sub.v] > 0 are the scalar penalty parameters and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the vectors of Lagrangian multipliers.

Solving (5.7) is thus equivalent to search for the solutions of the saddle point problem,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

with L defined in (5.8) and where, for simplicity of notation, we set [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Starting at u = [u.sup.k], [[lambda].sub.t] = [[lambda].sup.k.sub.t], [[lambda].sub.r] = [[lambda].sup.k.sub.r], [[lambda].sub.v] = [[lambda].sup.k.sub.t], the ADMM iterative scheme applied to the solution of (5.7) reads as follows:

(5.9) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.10) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.11) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.12) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.13) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In the following subsections, we show how to solve the four sub-problems (5.9)-(5.12) and then we present the iterative ADMM-based minimization algorithm.

5.1. Solving the sub-problem for t. Given the definition of the augmented Lagrangian functional in (5.8), the minimization sub-problem for t in (5.9) can be written as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.14) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(5.15) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Note that in (5.14), we have omitted the constant terms while in (5.15) we have written the functional to be minimized in an explicit component-wise form. The minimization in (5.15) is equivalent to the [d.sup.2] two-dimensional problems

(5.16) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Following [9] and setting

(5.17) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the solution of (5.16) is given explicitly by the [d.sup.2] two-dimensional shrinkages

(5.18) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

where 0 x (0/0) = 0 is assumed. We notice that the computational cost of (5.17)-(5.18) is linear with respect to the number of pixels [d.sup.2].

5.2. Solving the sub-problem for r. The minimization sub-problem for r in (5.10) can be rewritten as

(5.19) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In order to compute the gradient of the objective function in (5.19), we can exploit the result we obtained for the continuous case in (3.4), where H(r) is the discrete counterpart of the cumulative distribution function Pf (f), while the constant vector [v.sup.k] - [1/[[beta].sub.v]] [[lambda].sup.k.sub.v] plays the role of the target cdf [bar.P] (f). Stationary points of (5.19) are thus obtained by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The gradient vanishes for r-values obtained by a few steps of this simple fixed-point iteration

(5.20) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

5.3. Solving the sub-problem for v. The minimization sub-problem for v in (5.11) is

(5.21) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The solution of (5.21) is thus given by a simple Euclidean projection of the vector H([r.sup.k+1]) + [1/[[beta].sub.v]] [[lambda].sup.k.sub.v] onto the box-constraints defined by the set B in (5.6),

[v.sup.k+1] = [P.sub.B][H([r.sup.k+1]) + [1/[[beta].sub.v]] [[lambda].sup.k.sub.v].

This projection can be obtained in a straightforward manner by computing the following Q component-wise projections, one for each histogram bin,

(5.22) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The computational complexity of this sub-problem is clearly linear in the number of bins Q.

5.4. Solving the sub-problem for u. The minimization sub-problem for u in (5.12) can be rewritten as

(5.23) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The problem (5.23) is a quadratic optimization problem whose optimality conditions are

(5.24) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Under periodic boundary conditions for u, the coefficient matrix [D.sup.T] D + [[[beta].sub.r]/[[beta].sub.t]]I is block circulant with circulant blocks and thus it is diagonalizable by the 2D discrete Fourier transform (FFT implementation). Therefore, (5.24) can be solved by one forward FFT and one inverse FFT, each at a cost of O([d.sup.2] log d).

5.5. ADMM iterative scheme. To solve the proposed constrained minimization problem (5.7), we use the ADMM iterative scheme reported in Algorithm 1. In general, choosing good values for the [beta] penalty parameters in ADMM-based optimization is a difficult and sensitive problem. Some criteria for the automatic tuning of these parameters along iterations have been proposed in the literature; see, e.g., [3] and the references therein. In this work, as will be detailed in the next experimental section, we preferred to hand-tune these parameters.

Algorithm 1 ADMM for the proposed distribution constrained problem (5.7). Input: [u.sub.0], [[beta].sub.t] > 0, [[beta].sub.r] > 0, [[beta].sub.v] > 0 Output: approximate solution [u.sup.*] of (5.7) 1. Initialize: [u.sup.0] = [u.sup.0], [r.sup.0] = 0, [[lambda].sup.0.sub.t] = 0, [[lambda].sup.0.sub.r] = 0, [[lambda].sup.0.sub.v] = 0; 2. For k = 0, 1, 2, ... until convergence: 1) compute [t.sup.k+1] according to (5.17)-(5.18) 2) compute [r.sup.k+1] according to (5.20) 3) compute [v.sup.k+1] according to (5.22) 4) compute [u.sup.k+1] by solving (5.24) 7) compute [[lambda].sup.k+1.sub.t], [[lambda].sup.k+1.sub.r], [[lambda].sup.k+1.sub.v] by (5.13) End For

6. Computed examples. In this section we demonstrate the usefulness of the proposed constraint on the distribution of the residual in image denoising by illustrating the performance of the proposed algorithm on both real and synthetic 2D images corrupted by additive zero-mean white noise with known distribution. Additive noise is a good model for the thermal noise within photoelectric sensors and the term "white" noise identifies a noise which is spatially uncorrelated: the noise for each pixel is independent and identically distributed. The additive zero-mean white noise models we considered are the Gaussian noise (AWGN) defined in (4.1) and the uniform noise (AWUN) in a given interval [-[sigma][square root of (3)], [sigma][square root of (3)] whose probability density function is given in (4). Even if uniform noise is not often encountered in real-world imaging systems, it provides a useful comparison with Gaussian noise.

We compare the proposed algorithm, referred to as TV-CDF, with the well-known RudinOsher-Fatemi (ROF) model [21] based on the minimization of the [TV-L.sub.2] functional (2.3). The [TV-L.sub.2] approach is implemented by the Alternating Direction Method (ADM), a variant of the classic augmented Lagrangian method for structured optimization which reformulates a TV problem as a linear equality constrained problem. The ADMTV algorithm is stable, efficient, and, in particular, faster than most of the state-of-the-art denoising algorithms. The package for ADM[TV-L.sub.2] is freely available (2) and described in detail in [25].

The regularization parameter of the [TV-L.sub.2] model is adjusted so that the solution is guaranteed to satisfy the discrepancy principle, that is, the variance of the residual is equal to the variance of the noise. For the proposed TV-CDF model, we used the ADMM minimization procedure illustrated in Section 5 with the following parameters setting fit = 10, fir = 30, [[beta].sub.v] = 1, [gamma] = 1. These parameters have been hand-tuned so as to guarantee fast convergence for all the considered experiments. The iterations of the two algorithms are stopped as soon as the relative difference between two successive iterates satisfies the termination criterion [parallel][u.sup.k] - [u.sup.k-1][[parallel].sup.2.sub.2]/[parallel][u.sup.k-1][[parallel].sup.2.sub.2] < [10.sup.-5].

The accuracy of the methods is evaluated by the Root Mean Squared Error (RMSE) defined as

RMSE (u, [bar.u]) := [square root of ([parallel]u - [bar.u][[parallel].sup.2.sub.2]/[d.sup.2])],

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the computed approximation of the desired noise-free image [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This quantity provides a quantitative measure of the quality of the restored image u. A small RMSE value indicates that u is an accurate approximation of u; we recall, however, that the RMSE values are not always in agreement with visual perception.

We consider the restoration of three different images: barbara (d = 512), test (d = 256) and skyscraper (d = 256) which present interesting mixtures of textures, flat regions, and shaded areas. The noise-free versions of the images are depicted in Figure 6.1(a), Figure 6.2(a), and Figure 6.3(a), respectively.

Tables 6.1 and 6.2 show the quantitative results of the comparison for the two considered noise types, i.e., AWGN and AWUN, respectively. In particular, the tables report the RMSE values obtained by applying ROF ([TV-L.sub.2]) and our (TV-CDF) algorithm on the test images corrupted by noise with different standard deviations a = 10, 20, 30. The bold numbers in the tables indicate the better (lower) RMSE values obtained between the two methods.

The results in Tables 6.1 and 6.2 demonstrate that the proposed TV-CDF model outperforms the [TV-L.sub.2] approach in terms of obtained RMSE values on the three test images for all the considered noise levels.

The quantitative results shown in Tables 6.1 and 6.2 are validated by the visual inspection of the image restorations illustrated in Figures 6.1-6.3 for the AWGN corruption and in Figure 6.4 for the AWUN corruption.

The results for the first example, i.e., images barbara, test, and skyscraper corrupted by zero-mean AWGN with standard deviation a = 10 are illustrated in Figures 6.1-6.3. In particular, the noise-free images are depicted in Figures 6.1-6.3(a), the noise-corrupted images are shown in Figures 6.1-6.3(b), the images restored by [TV-L.sub.2] and by TV-CDF are given in Figures 6.1-6.3(c) and Figures 6.1-6.3(d), respectively. In the last rows of Figures 6.1-6.3, we show the normalized histograms of the residuals associated with the restored images depicted in Figures 6.1-6.3(c) and Figures 6.1-6.3(d).

The effect of the proposed constraint is evident from the residual histograms. In fact, the residual histograms obtained by the proposed TV-CDF method and displayed in Figures 6.1-6.3(f) adhere more faithfully to the target Gaussian probability density function (red curve) than those reported in Figures 6.1-6.3(e) and computed by the [TV-L.sub.2] model.

The results for the second example, i.e., images barbara, test, and skyscraper corrupted by zero-mean AWUN with standard deviation a = 30, are illustrated in Figure 6.4. In particular, in Figure 6.4(a) we show the noise-corrupted images while in Figures 6.4(b)-(e) we show zoomed details of the noisy images, of the original noise-free images, of the images restored by [TV-L.sub.2], and of the images restored by TV-CDF, respectively. This experiment demonstrates that the proposed constrained denoising approach is effective also for non-Gaussian noise models.

The restoration results illustrated in the examples obtained by the TV-CDF method together with the results in Table 6.1 and Table 6.2 allow us to conclude that by exploiting the information present in the components of the residual vectors, such as in this case the noise distribution, we can go beyond the use of just the residual norm or the variance of the noise model, thus providing better restorations.

A proof of convergence is beyond the scope of this paper and will be investigated in future work. However, in Figure 6.5 we provide empirical evidence of the numerical convergence of the proposed algorithm for the example illustrated in Figure 6.1. In particular, in Figure 6.5 the relative change of the approximate solution computed as [parallel][u.sup.k] - [u.sup.k-1] [[parallel].sub.2]/ [parallel][u.sup.k-1][[parallel].sup.2] is shown as a function of the iteration count k. The same convergent behavior can be observed for all the other tested examples.

7. Conclusions. In this paper we have proposed a hard-constrained variational model aimed at denoising images corrupted by identically distributed additive noise with known probability density function. In particular, our model is obtained by enforcing explicitly the constraints on the similarity between the residual distribution and the target noise distribution. The classical TV-seminorm term is considered but any other regularizer could be substituted as well. The constrained variational problem is efficiently solved by using the popular ADMM procedure suitably adapted to our particular model. We have compared our model with the well-known ROF model by applying them to a few image restoration examples. The experiments show that the TV-CDF based method outperforms the [TV-L.sub.2] based method for preserving better contrast and more details in the denoising process. Robust preliminary estimation of the noise distribution, i.e., of the target residual histogram to be enforced within our approach, will be a matter of future work.

Acknowledgments. This work has been partially supported by GNCS-INDAM, 2013, and the ex60% project by the University of Bologna "Funds for selected research topics". The authors thank the referees for providing constructive comments and help in improving the contents of this paper.

REFERENCE

[1] J. BAI AND X. C. FENG, Fractional-order anisotropic diffusion for image IEEE Trans. Image Process., 16 (2007), pp. 2492-2502.

[2] F. BAUER and M. A. Lukas, Comparing parameter choice methods for regularization of ill posed problems, Math. Comput. Simulation, 81 (2011),pp. 1795-1841.

[3] S. BOYD, N. Parikh, E. CHU, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), pp. 1122.

[4] A. BUADES, B. COLL, and J. M. Morel, A review of image denoising algorithms, with a new one, Multiscale Model. Sim., 4 (2005), pp. 490-530.

[5] D. CALVETTI, S. MORIGI, L. REICHEL, AND F. SGALLARI, An L-ribbon for large underdetermined linear discrete ill-posed problems, Numer. Algorithms, 25 (2000), pp. 89-107.

[6] V. CASELLES, R. KIMMEL, AND G. SAPIRO, Geodesic Active Contours, Int. J. Comput. Vis., 22 (1997), pp. 61-79.

[7] A. CHAMBOLLE AND P. L. LIONS, Image recovery via total variation minimization and related, Numer. Math., 76 (1997), pp. 167-188.

[8] R. CHAN, M. NIKOLOVA, and Y. W. WEN, A variational approach for exact histogram specification, in Scale Space and Variational Methods in Computer Vision, A. M. Bruckstein, B. M. ter Haar Romeny, A. M. Bronstein, M. M. Bronstein, eds., Lecture Notes in Computer Science, 6667, Springer, Heidelberg, 2012, pp. 86-97.

[9] R. CHAN, M. Tao, and X. M. Yuan, Constrained total variational deblurrring models and fast algorithms based on alternating direction method of multipliers, SIAM J. Imaging. Sci., 6 (2013), pp. 680-697.

[10] I. DAUBECHIES AND G. TESCHKE, Variational image restoration by means of wavelets: Simultaneous de composition, deblurring, and denoising, Appl. Comput. Harmonic Anal., 19 (2005), pp. 1-16.

[11] S. DIDAS, J. WEICKERT, AND B. BURGETH, Properties of higher order nonlineardiffusionfiltering, J. Math. Imaging Vision, 35 (2009), pp. 208-226.

[12] R. GLOWINSKI, Numerical Methods for Nonlinear Variational Problems, Springer, New York, 1984.

[13] L. JIANG, X. C. Feng, and H. Q. Yin, Variational Image Restoration and Decomposition with Curvelet Shrinkage, J. Math. Imaging Vision, 30 (2008), pp. 125-132.

[14] A. LANZA, S. MORIGI, F. SGALLARI, AND A. YEZZI, Variational image denoising based on auto-correlation whiteness, SIAM J. Imaging. Sci., 4 (2013), pp. 1931-1955.

[15] S. MORIGI, R. PLEMMONS, L. REICHEL, AND F. SGALLARI, A hybrid multilevel-activeset method for large box-constrained discrete ill-posed problems, Calcolo, 48 (2011), pp. 89-105.

[16] S. MORIGI, L. REICHEL, AND F. SGALLARI, A truncated projected SVD method for linear discrete ill-posed problems, Numer. Algorithms, 43 (2006), pp. 197-213.

[17] S. NARASIMHAN AND S. NAYAR, Contrast restoration of weather degraded, IEEE Trans. Pattern Anal. Mach. Intell., 25 (2003), pp. 713-724.

[18] M. NIKOLOVA, M. K. Ng, AND C. P. Tam, Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction, IEEE Trans. Image Process., 19 (2010), pp. 3073-3088.

[19] P. PERONA AND J. MALIK, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell., 12 (1990), pp. 629-639.

[20] L. REICHEL, F. SGALLARI, AND Q. Ye, Tikhonov regularization based on generalized Krylov subspace methods, Appl. Numer. Math., 62 (2012), pp. 1215-1228.

[21] L. I. RUDIN, S. OSHER, AND E. FATEMI, Nonlinear total variation based noise removal algorithms, Phys. D, 60 (1992), pp. 259-268.

[22] B. W. RUST and D. P. O'Leary, Residual periodograms for choosing regularization parameters for ill posed problems, Inverse Problems, 24 (2008), 034005 (30 pages).

[23] G. SAPIRO AND V. CASELLES, Histogram modification via partial differential equations, in Proc. IEEE Inter national Conference on Image Processing vol. 3, IEEE Conference Proceedings, Los Alamitos, CA, 1995, pp. 632-635.

[24] P. SWOBODA AND C. SCHNORR, Convex variational image restoration with histogram priors, SIAM J. Imag ing Sci., 6 (2013), pp. 1719-1735.

[25] M. Tao AND J. Yang, Alternating direction algorithmsfor total variation deconvolution in image reconstruc tion, Tech. Rep. TR0918, Department of Mathematics, Nanjing University, China, 2009.

[26] L. A. VESE AND S. J. OSHER, Modeling textures with total variation minimization and oscillatory patterns in image processing, J. Sci. Comput., 19 (2003), pp. 553-572.

[27] M. WELK, D. THEIS, T. BROX, AND J. WEICKERT, PDE-based deconvolution with forward-backward dif fusivities and diffusion tensors, in Scale-Space and PDE Methods in Computer Vision, R. Kimmel, N. Sochen, J. Weickert, eds., Lecture Notes in Computer Science, 3459, Springer, Berlin, 2005, pp. 589-597.

ALESSANDRO LANZA ([dagger]), SERENA MORIGI ([dagger]), FIORELLA SGALLARI ([dagger]), AND ANTHONY J. YEZZI ([double dagger])

* Received October 16, 2013. Accepted March 7, 2014. Published online on May 12, 2014. Recommended by Bob Plemmons.

([dagger]) Department of Mathematics-CIRAM, University of Bologna, Bologna, Italy ({alessandro.lanza2, serena.morigi, fiorella.sgallari}@unibo.it).

([double dagger]) School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA (ayezzi@ece.gatech.edu).

(1) Technically, the distribution of the "noise process" is known with the actual noise being a realization of this process. If the process is spatially ergodic, then the histogram of the realized noise can be expected to resemble the distribution of the noise process itself.

(2) http://www.caam.rice.edu/~optimization/L1/ftvd/v4.0/

Table 6.1 RMSE values obtained by the compared denoising algorithms on the images barbara, test, skyscraper corrupted by zero-mean AWGN with standard deviation [sigma]. barbara test [sigma] TV-[L.sub.2] TV-CDF TV-[L.sub.2] TV-CDF 10 8.09 7.11 7.41 6.82 20 12.51 11.57 11.00 10.34 30 15.13 14.56 13.55 13.02 skyscraper [sigma] TV-[L.sub.2] TV-CDF 10 8.48 7.54 20 13.85 12.72 30 17.80 16.72 Table 6.2 RMSE values obtained by the compared denoising algorithms on the images barbara, test, skyscraper corrupted by zero-mean AWUN with standard deviation [sigma]. barbara test [sigma] TV-[L.sub.2] TV-CDF TV-[L.sub.2] TV-CDF 10 8.01 6.98 7.28 6.60 20 12.32 11.27 10.76 10.00 30 14.93 14.17 13.21 12.56 skyscraper [sigma] TV-[L.sub.2] TV-CDF 10 8.41 7.43 20 13.73 12.48 30 17.64 16.34

Printer friendly Cite/link Email Feedback | |

Author: | Lanza, Alessandro; Morigi, Serena; Sgallari, Fiorella; Yezzi, Anthony J. |
---|---|

Publication: | Electronic Transactions on Numerical Analysis |

Article Type: | Report |

Geographic Code: | 1USA |

Date: | Aug 1, 2014 |

Words: | 6877 |

Previous Article: | Approximating optimal point configurations for multivariate polynomial interpolation. |

Next Article: | Implicitly restarting the LSQR algorithm. |

Topics: |