# Efficient MCMC-based image deblurring with Neumann boundary conditions.

1. Introduction. In applications such as astronomy, medicine, physics and biology, digital images are used by scientists and practitioners to record and analyze unique events. Environmental effects and imperfections in the imaging system can cause the recorded images to be degraded by blurring and noise. Unfortunately, it is not always possible to repeat the process used to record the image to obtain "better pictures"; for example, it could be too costly to repeat a particular experiment, or it may not be physically possible to repeat the event that was observed. In such cases, computational post processing techniques, called image deblurring, are used to improve the resolution of the image.

Image deblurring is typically modeled as a linear inverse problem. Suppose, is a function describing the true d-dimensional image; e.g., for a plane image containing pixels, d=2 .The image formation process, which includes blurring and noise, is modeled by an integral equation,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where, s, t [member of] [R.sup.d], b(s) is a function that represents the observed image, [eta](s) represents additive noise, and [OMEGA] [subset] [R.sup.d] is the computational domain. The kernel a(s,t) is a function that specifies how the points in the image are distorted, and is therefore called the point spread function (PSF). The inverse problem of image deblurring is: given a and 6, compute an approximation of x. If the kernel has the property that a(s,t) = a(s-t), then the PSF is said to be spatially invariant; otherwise it is said to be spatially variant. In the spatially invariant case, the blurring operation,[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], is a convolution operation, and thus the in corresponding inverse problem of computing an approximation of x from b and a, is called deconvolution.

In a realistic problem, images are collected only at discrete points (i.e., pixels for 2D images and voxels for 3D images), and are also only available in a finite bounded region [OMEGA]. It is, therefore, typical to work directly with the discrete linear model,

(1.1) b = Ax + [eta],

where x, b [eta] and are real valued vectors obtained by discretizing functions x, b, and [eta], and n is the number of pixels (or voxels) in the discrete image. is a real valued n x n matrix that arises when approximating the integration operation with a quadrature rule, and it usually has structure (e.g., Toeplitz, circulant, Hankel, etc.) that can be exploited in computations.

Our approach for solving the discrete inverse problem (1.1) is statistically motivated. Specifically, we assume that [eta] is an n x 1 independent and identically distributed (iid) Gaussian random vector with variance [[lambda].sup.-1] ([lambda] is known as the precision) across all pixels, and that the probability density function for (1.1) is given by

(1.2) p(b|x, [lambda]) [varies] exp (-[lambda]/2[[parallel]Ax - b[parallel].sup.2],

where [varies] denotes proportionality. However, it is important to note that when attempting to solve inverse problems, the maximizer of the likelihood L(x|b, [lambda]) = p(b|x, [lambda]) with respect to x is unstable with respect to the noise contained in. This instability is a characteristic of inverse problems, such as deconvolution, and it has to do with the fact that the forward mapping (convolution) is a compact operator defined on a function space . The standard technique for overcoming such instability is regularization, which is treated in detail in several references [5, 7, 8, 10, 20].

In the context of Bayesian statistics, regularization corresponds to the choice of the prior probability density function p(x|[delta], where [delta] > 0 is a scaling parameter. In our case, we use a Gaussian Markov random field (GMRF) to model the prior, which yields

(1.3) p(x|[delta]) [varies exp (-[delta]/2 [x.sup.T] Lx),

where the precision matrix [delta]L is sparse and encodes distributional assumptions regarding the values of [x.sub.i] conditioned on the values of its neighbors, for all .

Bayes' Theorem states that the posterior probability density function p(x|b, [lambda], [delta]) can be expressed as

(1.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Maximizing (1.4) with respect to x yields the so-called maximum a posteriori (MAP) estimator. By equivalently minimizing the negative-log of (1.4), we see that corresponds to the regularization matrix in classical inverse problems, while corresponds to the regularization parameter .

In this paper we extend the Bayesian formulation (1.4) and compute samples from the resulting posterior density function using the Markov Chain Monte Carlo (MCMC) method of . Bayes' Law (1.4) is extended by assuming Gamma distributed hyper-priors on [lambda] and [delta] i.e.,

(1.5) p([lambda][varies][[lambda].sup.[alpha][lambda]-1]exp(-[beta].sub.[lambda]][lambda]),

(1.6) p([delta][varies][[delta].sup.[alpha][delta]-1]exp(-[beta].sub.[delta]][delta]),

with [[alpha].sub.[lambda]] = [[alpha].sub.[delta]] = 1 and [[beta].sub.[lambda]] = [[beta].sub.[delta]] = [10.sup.4] which have mean and variance [alpha]/[beta] = [10.sup.4] and [alpha]/[[beta].sup.2], respectively. Note that yields exponential distributed hyper-priors, however we present the full Gamma hyper-prior here because it is a conjugate distribution and other choices for and may be advantageous in other situations. Given the large variance values, the hyper-priors should have a negligible effect on the sampled values for [lambda] and [delta].

With (1.2), (1.3), (1.5), and (1.6) in hand, through Bayes' Law the posterior probability density has the form

(1.7) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The prior and hyper-priors were chosen to be conjugate , which guarantees that the full conditional densities have the same form as the corresponding prior/hyper-prior; specifically, note that

(1.8) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(1.9) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(1.10) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where N and [GAMMA] denote Gaussian and Gamma distributions, respectively.

The power in (1.8)-(1.10) lies in the fact that samples from these three distributions can be computed using standard statistical software, and a Gibbsian approach can be applied to (1.8)-(1.10) yielding the MCMC method of  for sampling from (1.7).

A MCMC Method for Sampling from p(x,[delta],[lambda]|b).

0. Initialize [[delta].sub.0],and [[lambda].sub.0], and set k=0;

1. Compute [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

2. Compute [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

3. Compute [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

What makes this MCMC method interesting in the case of image deblurring is that computing the image samples in Step 1 requires the solution of the large linear system

(1.11) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In this paper, we assume Neumann boundary conditions for the image and symmetric PSFs. The matrices and are then diagonalizable by the discrete cosine transform (DCT), which we will denote by C, making the computation of [x.sup.k] in Step 1 extremely efficient. Specifically, if A = [C.sup.T][LAMBDA]C, and L = [C.sup.T]SC, given that C is an orthogonal matrix we have

[[lambda].sub.k][A.sup.T]A + [[delta].sub.k]L = [C.sup.T] ([[lambda].sub.k] [[LAMBDA].sup.2] = [[delta].sub.k]S)C

Substituting this into (1.11) yields

(1.2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where v ~ N (0,I), which follows from the fact that if M [member of] [R.sup.nxn] and y = Mv then y ~ N (0,M[M.sup.T]). Since A and S are computed off-line, and because multiplication by C and [C.sup.T] is computed efficiently using a fast DCT algorithm , the computation of [x.sup.k] via (1.12) is very efficient.

The paper is organized as follows. We begin with a discussion of Neumann boundary conditions for image deblurring problems in Section 2. Then in Section 3, we discuss GRMFs generally, introduce the one that we use, and show that if Neumann boundary conditions are assumed the precision matrix can be diagonalized using the DCT. Finally, in Section 4, we show results of the method and compare it with other standard approaches. Concluding remarks are given in Section 5.

2. The Neumann boundary condition for convolution problems. As has been stated, we are interested in the case in which the image x is assumed to have Neumann, or reflective, boundary conditions. To illustrate what we mean by this, we begin by considering 1D image deconvolution. In this case, the unknown image x = ([x.sub.1], ...,[x.sub.n]) can be extended spatially to create the vector

[??] = [[x.sub.-n+1], ...,[x.sub.0],[x.sub.1], ..., [x.sub.n], ...,[x.sub.2n]].sup.T]

The matrix A is defined in terms of the convolution kernel

a = [[[a.sub.-n], [a.sub.-n+l], ...,[a.sub.0], [a.sub.1], ..., [a.sub.n]].sup.T],

and the noise-free data vector is obtained via discrete convolution:

(2.1) [b.sub.i] = [i+n.summation over (j=i-n)] [a.sub.i-j][x.sub.j], for i = 1, ...,n,

or in matrix-vector notation,

(2.2)[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The key observation here is that the value for near 1 will depend upon the region of to the left of the computational domain, i.e.,[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] while for near n,[b.sub.i] will depend upon the region of [??] to the right of the computational domain, i.e., .

Rather than estimating these extra values by solving the underdetermined system (2.2), the standard approach is to make assumptions about these values based on a priori knowledge, or by relating the values to those within the computational domain. These assumptions are called boundary conditions. For example, a zero (or Dirichlet) boundary condition corresponds to the assumption that ([x.sub.-n+1], ...,[x.sub.0] = ([x.sub.n+1] ,...,[x.sub.2n]) = 0 which yields a Toeplitz matrix A ; while a periodic boundary condition corresponds to ([x.sub.-n+1], ...,[x.sub.0]) = (x1, ... ,[x.sub.n] and ([x.sub.n+1], ...,[x.sub.2n]) = ([x.sub.1], ..., [x.sub.n]) which yields a circulant matrix that can be diagonalized by the discrete Fourier transform (DFT) .

The Neumann boundary condition corresponds to a reflection of the signal about the boundaries, i.e.,([x.sub.-n+1], ...,[x.sub.0]) = ([x.sub.n], ..., [x.sub.n]) and ([x.sub.n+1], ...,[x.sub.2n]) = ([x.sub.n], ...,[x.sub.1]). In this case, the resulting matrix has Toeplitz-plus-Hankel structure, and if the convolution kernel a is symmetric, i.e. [a.sub.i] = [a.sub.-i], then A can be diagonalized by the discrete cosine transform (DCT) [15, Theorem 3.2]. We note that while a Toeplitz matrix is one for which each descending diagonal from left to right is constant, a Hankel matrix is one for which each descending anti-diagonal from right to left is constant.

We started with the 1D example in order to illustrate concepts more simply, but our primary interest is two-dimensional (2D) image deblurring. In this case, and are obtained by column-stacking the arrays N x N arrays and B and X, which we denote as b = vec (B) and x = vec(X); and A is defined in terms of the N x N convolution kernel a - [{[a.sub.ij]}.sup.N.sub.i,j] =-N, with some assumed boundary condition. The noise-free N x N data array then satisfies the 2D discrete convolution equation

(2.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In the 2D case, for zero and periodic boundary conditions, the extensions of are represented, respectively, by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In both cases the central corresponds to the unknowns within the computational domain (i.e., the field of view). The assumption of zero boundary conditions results in a matrix A that is block Toeplitz with Toeplitz blocks , while periodic boundary conditions result in a matrix that is block circulant with circulant blocks and can be diagonalized by the 2D-DFT .

In instances where the zero and/or periodic extensions are poor approximations of reality, unnatural artifacts in reconstructions can result. This is particularly the case when X and B contain regions of relative high and variable intensity near the boundaries of the computational domain. In such instances, the reflective extension of X, corresponding to Neumann boundary conditions, works significantly better. It is represented by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [X.sub.v] is the image that results from flipping X across its central vertical axis; [X.sub.h] is the image that results from flipping X across its central horizontal axis; and [X.sub.vh] is the image that results from flipping X across its vertical then horizontal axes. The Neumann boundary condition leads to a matrix A that is block Toeplitz-plus-Hankel with Toeplitz-plus-Hankel blocks (BTHTHB), and provided the kernel a is symmetric, i.e.,

(2.4) [a.sub.i,j] = [a.sub.-i,j] = [a.sub.i-j] = [a.sub.-i,-j]

A can be diagonalized by the 2D-DCT [15, Theorem 3.3]. We will use such a BTHTHB matrix in our numerical experiments below, while acknowledging that (2.4) is somewhat restrictive.

An alternative to using one of the above three artificial boundary conditions is to reconstruct the object on an extended field of view, imposing a periodic boundary condition on the extended object, and then mask the reconstruction (restricting to the field of view) to remove the boundary artifacts. This approach has been used by various authors [2, 11, 16, 19]. It has the benefit that it does not impose artificial boundary conditions at the boundary of the field of view, hence boundary artifacts do not appear in reconstructions, and moreover, it requires no restrictions on the PSF, such as the requirement of symmetry for the Neumann boundary condition. However, the presence of the mask matrix in the model removes the periodic structure from the problem, so that the resulting matrix is not diagonalizable by a fast transform. Hence an iterative method must be used to approximately solve (1.11), yielding the sample [x.sub.k] in Step 1 of the MCMC method. As a result this alternative boundary condition is more computationally intensive to implement than the Neumann boundary condition, which we focus on here.

2.1. Diagonalizing matrices with Toeplitz-plus-Hankel structure. As stated above, the Neumann boundary condition results in a matrix that was diagonalizable by the DCT. Specifically,

A = [C.sup.T][LAMBDA]C,

where [LAMBDA] is the n x n diagonal eigenvalue matrix, and C is the orthogonal DCT matrix. In 1D, C = [C.sub.1D] with

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where 1 [less than or equal to] i, j [less than or equal to] n. In the 2D case, C = [C.sub.2D] with [C.sub.2D] = [C.sub.1D] [cross product] [C.sub.1D], where '[cross product]' denotes Kronecker product. We note that both [C.sub.1D] and [C.sub.2D] are orthogonal matrices.

In practice, multiplication by C and [C.sup.T] is computed using the fast cosine transform function . In MATLAB, the syntax is as follows: in 1D, for v [member of] R"

[C.sub.1D] V = dct(v), [C.sup.T.sub.D]v = idct(v),

while in 2D, if V is an N x N array, and we define v = vec(V) and V = array(v),

array ([C.sub.2D]v) = dct2(V), array ([C.sup.T.sub.2D]) = idct2(V).

It remains to define the diagonal eigenvalue matrix [LAMBDA]. In both cases,

[LAMBDA] = diag (([Ca.sub.1] [empty set] ([Ce.sub.1]))

where [a.sub.1] is the first column of A, [e.sub.1] = [[1,0, ..., 0].sup.T], '[empty set]' denotes component-wise division, and C is [C.sub.1D] in 1D in C1D and C2D in 2D.

3. Defining the prior using Gaussian Markov random fields. We now turn to the definition of the prior (1.3), and specifically, of the precision matrix [delta]L. For this, we use Gaussian Markov random fields (GMRFs). A GMRF x = ([x.sub.1], ..., [x.sub.n]) is a specific type of Gaussian random vector. Thus x ~ N ([mu], [Q.sup.-1]) where [mu] [member of] [R.sup.n] is the mean of X, and Q is known as the n x n symmetric positive semi-definite precision matrix. Note that in (1.3), [mu] = 0 and Q = [delta]L. If Q has a zero eigenvalues is called an intrinsic Gaussian .

To define x more specifically, we need the notion of a labeled graph G = (V, [epsilon]). Here the set of pixels V = {1, ...,n} are the nodes of the graph, and [epsilon] is the set of edges {i, j}, where i,j [member of] V with i [not equal to] j. If {i,j} [member of] [epsilon] we will say that i and j are neighbors. Moreover, we define [[partial derivative].sub.i] to be the set of all neighboring nodes of i (note that i [not equal to] [[partial derivative].sub.I]),[n.sub.i] to be the number of elements in [[partial derivative].sub.i], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; = {[x.sub.j]| j [member of][[partial derivative].sub.i]}. We can now define a GMRF [17, Definition 2.1].

DEFINITION 3.1. A random vector x = ([x.sub.1], ...,[x.sub.n]) [member of] [R.sup.n] is called a GMRF with respect to a labeled graph G = (V, [epsilon]) with mean fj, and symmetric positive definite precision matrix Q if and only if its probability density function has the form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[[Q].sub.ij] [not equal to] [??] j [member of] [[partial derivative].sub.i]

Note, therefore, that the precision matrix encodes the neighborhood structure; specifically, i and j are neighbors if and only if [[Q].sub.ij] [not equal to] 0.

The above definition of GMRFs is very general. What is extremely useful for us from a modeling perspective is that a GMRF prior p(x|[delta]) can be derived from statistical assumptions about the pixel-level conditional densities [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for i = 1, ...,n. The idea of constructing the prior from the scalar conditional densities [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is known as conditional autoregression, and was pioneered by Besag . The following theorem for Gaussian conditional densities, and its proof, can be found in [17, Theorem 2.6].

THEOREM 3.2. Given the n Gaussian full conditional distributions with conditional mean and precision

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

then x is a GMRF with respect to with mean G = (V, [epsilon]) with mean [mu] and precision matrix Q, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

provided [K.sub.i][[beta].sub.ij] - [K.sub.j][[beta].sub.ji], i [not equal to] j, and Q is positive definite.

3.1. A GMRF prior with Neumann boundary conditions. We now use Theorem 3.2 to construct our prior. We make the assumption that the conditional density [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is normal with mean equal to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the sum of the neighboring values of [X.sub.i], and an unknown precision scaled by the size of the neighborhood [n.sub.i]; specifically, we assume

(3.1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

From Theorem 3.2, we have that (3.1) yields a Gaussian joint density for x given by

(3.2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

(3.3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and the n - 1 appears due to the fact that L has rank n - 1 . Other possibilities for [[beta].sub.ij] and [K.sub.i] in Theorem 3.2 are discussed in , however in order to allow for the use the DCT for fast computations, [k.sub.i] and [[beta].sub.ij] must satisfy rather restrictive conditions.

Next, we construct L in two specific cases. First, we assume a uniform grid on [0,1] with n vertices {1, ...,n} at locations {[[s.sub.i]}.sup.n.sub.i=1], where [S.sub.i] = i/(n + 1). We define [X.sub.i] to be the intensity value at [S.sub.i] and assume the first-order neighborhood system: [partial derivative]1 = {2}, [[partial derivative].sub.i] = {i-1, i+1} for i = 2, ...,n-1 and [[partial derivative].sub.n] = (n-1). Thus from (3.3) we have

(3.4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Note that is the discrete second derivative matrix with Neumann boundary conditions. Moreover, multiplication by [L.sub.1D] is equivalent to discrete convolution (2.1) with kernel l = [[0, ...,0,-1,2-1,0, ...,0].sup.T] assuming a Neumann boundary condition. Thus, since l is symmetric, [L.sub.1D] is a Toeplitz-plus-Hankel matrix that can be diagonalized by the DCT.

In 2D, we assume a uniform grid on [0,1] x [0,1] with n = [N.sup.2] vertices [{(i,j)}.sup.N.sub.i,j=1] at locations {[([s.sub.i],[t.sub.j])}.sup.n.sub.i,j=1], where [s.sub.k] = [t.sub.k] = k/(N + 1). Moreover, we define [x.sub.ij] to be the intensity value at ([s.sub.j], [t.sub.j]) for i,j = 1, ..., N, and assume the first-order neighborhood system:

[[partial derivative].sub.ij] = {(i - 1,j), (i + 1,j), (i,j - 1), (i,j + 1)}, for i,j = 2,...,N-1,

whereas if i or j is 1 or N, the vertices containing a 0 or N+1 are removed from [[partial derivative].sub.ij]; for example, [[partial derivative].sub.1j] = {(2,j), (1,j-1), (1,j+1)} and [[partial derivative].sub.11]. Note then that [n.sub.ij]= [absolute value of [[partial derivative].sub.ij]][member of]{2,3,4}

In the 2D case, the conditional autoregressive model (3.1) has the form

(3.5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] After reordering the array [{[x.sub.ij]}.sup.N.sub.j=1] by stacking its columns to make the n x 1 vector x, i.e., x = vec(X) we obtain the precision matrix Q = [delta][L.sub.2D], with

(3.6) [L.sub.2D] = [L.sub.1D] [cross product] I + I [cross product] [L.sub.1D]

where [L.sub.1D] is defined in (3.4).

Note that [L.sub.2D] is the discrete 2D negative Laplacian matrix with Neumann boundary conditions. Moreover, multiplication by [L.sub.2D] is equivalent to discrete convolution (2.3) with the N x N kernel l defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

assuming a Neumann boundary condition. Thus, since the kernel l is symmetric, [L.sub.2D] is block Toeplitz-plus-Hankel with Toeplitz-plus-Hankel blocks (BTHTHB) matrix that can be diagonalized by the 2D-DCT [8, 15].

4. Numerical experiments. In this section, we implement the above MCMC method on an image deconvolution test case. Here the mathematical model is of the form

b(s,t)= [[integral].sup.1.sub.0] [[integral].sup.1.sub.0] a(s-s', t-t')x(s',t')ds'dt',

which we discretize using mid-point quadrature on an uniform computational grid over [0,1] x [0,1]. This yields a system of linear equations b = Ax. We assume that is a circular Gaussian convolution kernel, so that A has BTHTHB structure and can be diagonalized by the 2D-DCT.

The data b is generated using (1.1) with the noise variance [[lambda].sup.-1] chosen so that the noise strength is 2% that of the signal strength. In order to obtain noise-free data that is not corrupted by the Neumann BC assumption, we begin with an extended 256x256 true image, compute 2D discrete-convolution (2.3) assuming periodic BCs, and then restrict to the central sub-image to obtain Ax. The central 128x128 region of the image used to generate the data and the data b are shown in Figure 4.1.

4.1. Assessing MCMC chain convergence. Just as with an iterative method for optimization, a sampling method must be run to convergence. Convergence of an MCMC chain can be determined in a number of ways. The recommended approach presented in  requires the computation of multiple, parallel MCMC chains with randomly chosen starting points. With multiple chains in hand, a statistic for each sampled parameter is then computed whose value provides a measure of convergence.

This statistic is defined as follows. Suppose we compute nr parallel chains, each of length [n.sub.s] (after discarding the first half of the simulations), and that {[[psi].sub.ij]}, for i = 1,...,[n.sub.s] and j = 1,..., [n.sub.r], is the collection of samples of a single parameter. Then we define

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Note that [[bar.[psi]].sub..j] and [bar.[psi]].. are the individual chain mean and overall sample mean, respectively. Thus provides a measure of the variance between the chains, while provides a measure of the variance within individual chains.

The marginal posterior variance ([psi]|b) can then be estimated by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which is an unbiased estimate under stationarity . The statistic of interest to us, however, is

(4.1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which decreases to 1 as [n.sub.s] [right arrow] [infinity].

Once [??] is sufficiently 'near' 1 for all sampled parameters, the [n.sub.s][n.sub.r] samples are treated as samples from the target distribution . A value of 1.1 for [??] is deemed acceptable in . In what follows, we stop the MCMC chain once [??] drops below a pre-specified tolerance.

4.2. Numerical tests. Next, we reconstruct the image by sampling from the posterior density function p(x, [lambda], [delta]|b) defined in (1.7) using the above MCMC method. We computed 5 parallel MCMC chains and reached an value [??] of 1.03 when the length of the chains was 400, which took approximately 22.4 seconds. The initial values and [[lambda].sub.0] in Step 0 were chosen randomly from the uniform distributions U(5,10) and U(0,1/2), respectively. We plot the mean of the sampled images, with negative values set to zero, as the reconstruction on the upper-left in Figure 4.2. From the samples for [lambda] and [delta], on the upper-right in Figure 4.2, we plot histograms for [lambda], [delta], and the regularization parameter [alpha] = [delta]/[lambda], which has a 95% credibility interval [7.91 x [10.sup.-4], 9.07 x [10.sup.-4]]. Note that the noise precision used to generate the data, is contained within the sample 95% credibility interval for [lambda], [4.66, 4.89]. And finally, for this example, we also plot the MAP estimator computed with [alpha] taken to be the mean of the samples for. As with the sample mean, we set the negative values in the MAP estimator to zero.

It remains to quantify the uncertainty in x. First, we plot the standard deviation of the sampled values at each pixel in the lower-right in Figure 4.2; to give the reader some sense of the variability suggested by these images, we note that for a Gaussian, the 95% confidence interval is approximately two standard deviations either side of the mean. A more satisfactory approach for visualizing uncertainty in 2D is to create a movie of the image samples. We generate this movie in MATLAB, taking every 10th sample as a frame after the first half of all of the chains have been discarded. Another possible approach is to use the computed pixel-wise mean [[mu].sub.ij] and variance [[sigma].sub.ij] from the samples and then let the frames of the movie be samples from N([[mu].sub.ij], [[sigma].sub.ij]) for all ij. This is the approach taken in  and we present the results for our example in another movie, noting that since correlation between neighboring intensities is not modeled in this approach, the image appears more variable. Because of this, in our opinion, creating the movie from the MCMC samples is the better of the two approaches.

In order to see the effect of the boundary conditions, we compare the results with those obtained on the same data set using a periodic BC for the reconstruction step. We plot the results in Figure 4.3. On the left is the chain mean for samples of x after chain lengths of 1000 with the same initial [[delta].sub.0] and [[lambda].sub.0] values as in the previous example. Note the boundary artifacts. Also, the 95% quantile for [lambda] for this run was which does not contain the value [lambda] = 4.72 used to generate the data.

Finally, we compare the CPU time for our sample-based approach with that from the classical approach of estimating the regularization parameter using generalized cross validation (GCV) [8, 20] and computing the resulting estimator for x. Using the Neumann boundary condition and the 2D-DCT, the GCV estimation rule can be implemented extremely efficiently. On the left in Figure 4.3 is the GCV solution, which took 0.05 seconds to compute. Recall that the sample-based solution above required 5 chains of length 400 each to reach [??] = 1.03 and took approximately 22 seconds. The efficiency of the sample based approach can be improved significantly if [??] is increased. For example, for [??] = 1.1, the sampler stopped at chains of length 50 each and took 2.5 seconds with a reconstruction that is visually indistinguishable from that computed with a longer chain, and a 95% credibility interval of [4.49, 4.70] for the [lambda] samples, which does not quite contain the value [lambda] = 4.72 used to generate the data. A chain of length 200 results if [??] = 1.05, and in this case the 95% credibility interval for [lambda] was [4.56,4.77], and computation took approximately 10 seconds. In any case, the classical GCV-based approach is clearly more efficient, however the sample-based approach has the clear benefit that uncertainty can be quantified and visualized for the image x, precision parameters [lambda] and [delta], and regularization parameter [alpha] = [delta]/[lambda].

5. Conclusions. Our focus is on the problem of image deconvolution, which is an ill-posed inverse problem, and hence requires regularization. We take a Bayesian approach, in which case the negative-log of the prior probability density function corresponds to the regularization function. We construct our prior by assuming specific Gaussian conditional densities for [[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the vector containing the 'neighbor' intensities [x.sub.j] of pixel [x.sub.i]. Our assumptions lead to a Gaussian prior with precision (inverse-covariance) matrix , where is the discrete negative-Laplacian matrix. This approach is known as conditional autoregression, and the random vector x is called a Gaussian Markov Random Field (GMRF). This interpretation of negative-Laplacian regularization has the benefit that the underlying statistical assumptions are made apparent.

In addition, we assume that the unknown has Neumann boundary conditions (BCs), which corresponds to extending x outside of the computational domain via a reflection about the boundary. For both convolution and GMRF models, the resulting matrices A and L have Toeplitz-plus-Hankel structure and (assuming a symmetric kernel) can be diagonalized by the discrete cosine transform (DCT). The use of the DCT in the context of GMRFs does not appear to be widespread, even though it has advantages over the oft-used DFT, which corresponds to the assumption of periodic BCs. Specifically, in terms of computational efficiency the DCT and DFT are comparable (both require 0(n [log.sub.2] n) flops), while the use of the DCT yields better results when the unknown has relatively high intensity values near the boundary of the computational domain, a fact that we demonstrate in the numerical experiments section. On the other hand, the DCT requires a symmetric kernel, while the DFT does not.

For the estimation step, we implement a Markov Chain Monte Carlo (MCMC) method for sampling from the posterior density function p(x, [lambda], [delta]|b). At every MCMC iteration, the primary work is the computation of the image sample [x.sup.k]. This requires the solution of a matrix-vector equation with coefficient matrix [[lambda].sub.k][A.sup.T]A + [[delta].sub.k]L. Since A and L are diagonalizable by the DCT, computing [x.sup.k] is efficient, and hence, so is the MCMC method. We also present a statistical technique for determining the convergence of the MCMC chain, and test the method on an image deconvolution problem. The method is efficient and yields samples of x, [lambda], and from which a reconstructed image (sample mean), a pixel-wise variance image, and histograms of [lambda], [delta] and the regularization parameter [alpha] = [delta]/[lambda] are computed.

The computational efficiency of the sample-based approach is compared with that of Estimating [alpha] using generalized cross validation and computing the corresponding regularized solution. The latter is significantly more computationally efficient, however the sample-based approach readily allows for the quantification of uncertainty in x, [lambda], [delta], and the regularization parameter [alpha] = [delta]/[lambda]. Specifically, we present histograms and credibility intervals for [lambda] and [alpha], and for the image x we present both the pixel-wise standard deviation image as well as a movie with frames taken to be a sub-sample of the x samples.

REFERENCES

 J. M. Bardsley, MCMC-based image reconstruction with uncertainty quantification, SIAM J. Sci. Comput., 34 (2012), pp. A1316-A1332.

 M. Bertero and P. Boccacci, A simple method for the reduction of boundary effects in the Richardson-Lucy approach to image deconvolution, Astronom. Astrophys., 437 (2005), pp. 369-374.

 J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. Roy. Statist. Soc. Ser. B, 36 (1974), pp. 192-236.

 D. Calvetti and E. Somersalo, Introduction to Bayesian Scientific Computing, Springer, New York, 2007.

 H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer, Dordrecht, The Netherlands, 1996.

 A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL, 2004.

 P. C. Hansen, Discrete Inverse Problems: Insight and Algorithms, SIAM, Philadelphia, 2010.

 P. C. Hansen, J. Nagy, and D. O'Leary, Deblurring Images: Matrices, Spectra, and Filtering, SIAM, Philadelphia, 2006.

 D. Higdon, A primer on space-time modelling from a Bayesian perspective, Technical Report, Los Alamos Nation Laboratory, Statistical Sciences Group, LA-UR-05-3097, 2005.

 J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Springer, New York, 2005.

 A. Matakos, S. Ramani, and J. A. Fessler, Accelerated edge-preserving image restoration without boundary artifacts, IEEE Trans. Image Process., 22 (2013), pp. 2019-2029.

 J. M. F. Moura and N. Balram, Recursive structure of noncausal Gauss-Markov random fields, IEEE Trans. Inform. Theory, 38 (1992), pp. 334-354.

 J. M. F. Moura and M. G. S. Bruno, DCT/DST and GaussMarkov fields: conditions for equivalence, IEEE Trans. Signal Process., 46 (1998), pp. 2571-2574.

 J. Nagy and D. O'Leary, Image restoration through subimages and confidence images, Electron. Trans. Nu mer. Anal., 13 (2002), pp. 22-37.

 M. K. Ng, R. H. Chan, and W. Tang, A fast algorithm for deblurring models with Neumann boundary conditions, SIAM J. Sci. Comput., 21 (1999), pp. 851-866.

 S. J. Reeves, Fast image restoration without boundary artifacts, IEEE Trans. Image Process., 14 (2005), pp. 1448-1453.

 H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications, Chapman and Hall/CRC, Boca Raton, FL, 2005.

 C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, Philadelphia, 1992.

 R. Vio, J. M. Bardsley, M. Donatelli, and W. Wamsteker, Dealing with edge effects in least-squares image deconvolution problems, Astronom. Astrophys., 442 (2005), pp. 397-403.

 C. R. Vogel, Computational Methods for Inverse Problems, SIAM, Philadelphia, 2002.

JOHNATHAN M. BARDSLEY ([dagger]), MARYLESA HOWARD ([dagger]), AND JAMES G. NAGY ([double dagger])

([dagger]) Department of Mathematical Sciences, University of Montana, Missoula, MT, 59812-0864. Email: bardsleyj@mso.umt.edu, marylesa.howard@umontana.edu.

([double dagger]) Department of Mathematics and Computer Science, Emory University, Atlanta, GA, 30322. Email: nagy@mathcs.emory.edu.

* Received November 14, 2012. Accepted August 28, 2013. Published online on December 18, 2013. Recommended by Bob Plemmnons.
COPYRIGHT 2013 Institute of Computational Mathematics
No portion of this article can be reproduced without the express written permission from the copyright holder.
Title Annotation: Printer friendly Cite/link Email Feedback markov chain monte carlo Bardsley, Johnathan M.; Howard, Marylesa; , James G. Nagy Electronic Transactions on Numerical Analysis Report Jan 1, 2013 6065 Multi-parameter Arnoldi-Tikhonov methods. Vector extrapolation applied to algebraic Riccati equations arising in transport theory. Engineering research Image processing Markov processes Monte Carlo method Monte Carlo methods