Printer Friendly

Numerical Reconstruction of the Covariance Matrix of a Spherically Truncated Multinormal Distribution.

1. Introduction

It has been more than forty years since Tallis [1] worked out the moment-generating function of a normal multivariate X [equivalent to] [{[X.sub.k]}.sup.v.sub.k=1] ~ [N.sub.v](0, [SIGMA]), subject to the conditional event

X [member of] [E.sub.v] ([rho]; [SIGMA]), [E.sub.v] ([rho]; [SIGMA]) [equivalent to] {x [member of] [R.sup.v] : [x.sup.T][[SIGMA].sup.-1] x [less than or equal to] [rho]}. (1)

The perfect match between the symmetries of the ellipsoid [E.sub.v]([rho]; [SIGMA]) and those of [N.sub.v](0, [SIGMA]) allows for an exact analytic result, from which the complete set of multivariate truncated moments can be extracted upon differentiation. Consider, for instance, the matrix [G.sub.E]([rho]; [SIGMA]) of the second truncated moments, expressing the covariances among the components of X within [E.sub.v]([rho]; [SIGMA]). From Tallis' paper it turns out that

[G.sub.E] ([rho]; [SIGMA]) = [c.sub.T] ([rho]) [SIGMA], [c.sub.T] ([rho]) [equivalent to] [[F.sub.v+2] ([rho])]/[F.sub.v] ([rho]), (2)

with [F.sub.v] denoting the cumulative distribution function of a [chi square]- variable with v degrees of freedom. Inverting (2)--so as to express [SIGMA] as a function of [S.sub.E]--is trivial, since [c.sub.T]([rho]) is a scalar damping factor independent of [SIGMA]. In this paper, we shall refer to such inverse relation as the reconstruction of [SIGMA] from [G.sub.E]. Unfortunately, life is not always so easy. In general, the effects produced on the expectation of functions of X by cutting off the probability density outside a generic domain D [subset] [R.sup.v] can be hardly calculated in closed form, especially if the boundary of D is shaped in a way that breaks the ellipsoidal symmetry of [N.sub.v](0, [SIGMA]). Thus, for instance, unlike (2), the matrix of the second truncated moments is expected in general to display a nonlinear/nontrivial dependence upon [SIGMA].

In the present paper, we consider the case where D is a v-dimensional Euclidean ball with center in the origin and square radius [rho]. Specifically, we discuss the reconstruction of [SIGMA] from the matrix [G.sub.B] of the spherically truncated second moments. To this aim, we need to mimic Tallis' calculation, with (1) being replaced by the conditional event

X [member of] [B.sub.v] ([rho]), [B.sub.v]([rho]) [equivalent to] {x [member of] [R.sub.v] : [x.sup.T]x [less than or equal to] [rho]}. (3)

This is precisely an example of the situation described in the previous paragraph: although [B.sub.v]([rho]) has a higher degree of symmetry than [E.sub.v]([rho]; [SIGMA]), still there is no possibility of obtaining a closed-form relation between [SIGMA] and [G.sub.B], since [B.sub.v]([rho]) breaks the ellipsoidal symmetry of [N.sub.v](0, [SIGMA]): technically speaking, in this case we cannot perform any change of variable under the defining integral of the moment-generating function, which may reduce the dimensionality of the problem, as in Tallis' paper.

In spite of that, the residual symmetries characterizing the truncated distribution help simplify the problem in the following respects: (i) the reflection invariance of the whole setup still yields E[[X.sub.k] | X [member of] [B.sub.v]([rho])] = 0 [for all]k and (ii) the rotational invariance of [B.sub.v]([rho]) preserves the possibility of defining the principal components of the distribution just like in the unconstrained case. In particular, the latter property means that [G.sub.B] and [SIGMA] share the same orthonormal eigenvectors. In fact, the reconstruction of [SIGMA] from [G.sub.B] amounts to solving a system of nonlinear integral equations, having the eigenvalues [lambda] [equivalent to] [{[[lambda].sub.k]}.sup.v.sub.k=1] of [SIGMA] as unknown variables and the eigenvalues [mu] [equivalent to] [{[[mu].sub.k]}.sup.v.sub.k=1] of [G.sub.B] as input parameters. In a lack of analytic techniques to evaluate exactly the integrals involved, we resort to a numerical algorithm, of which we investigate feasibility, performance, and optimization.

The paper is organized as follows. In Section 2, we describe a few examples illustrating the occurrence of spherical truncations in practical situations. In Section 3, we show that the aforementioned integral equations have the analytic structure of a fixed point vector equation; that is to say, [lambda] = T([lambda]). This suggests achieving the reconstruction of [lambda] numerically via suitably chosen iterative schemes. In Section 4, we prove the convergence of the simplest of them by inductive arguments, the validity of which relies upon the monotonicity properties of ratios of Gaussian integrals over [B.sub.v]([rho]). In Section 5, we review some numerical techniques for the computation of Gaussian integrals over [B.sub.v]([rho]) with controlled systematic error. These are based on and extend a classic work by Ruben [2] on the distribution of quadratic forms of normal variables. For the sake of readability, we defer proofs of statements made in this section to the appendix. In Section 6, we report on our numerical experiences: since the simplest iterative scheme, namely, the Gauss-Jacobi iteration, is too slow for practical purposes, we investigate the performance of its improved version based on overrelaxation; as expected, we find that the latter has a higher convergence rate; yet it still slows down polynomially in 1/[rho] as [rho] [right arrow] 0 and exponentially in v as b [right arrow] [infinity]; in order to reduce the slowing down, we propose an acceleration technique, which boosts the higher components of the eigenvalue spectrum. A series of Monte Carlo simulations enables us to quantify the speedup. In Section 7 we discuss the problems arising when [mu] is affected by statistical uncertainty and propose a regularization technique based on perturbation theory. To conclude, we summarize our findings in Section 8.

2. Motivating Examples

Spherical truncations of multinormal distributions may characterize different kinds of experimental devices and may occur in various problems of statistical and convex analysis. In this section, we discuss two motivating examples.

2.1. A Two-Dimensional Gedanken Experiment in Classical Particle Physics. Consider the following ideal situation. An accelerator physicist prepares an elliptical beam of classical particles with Gaussian transversal profile. The experimenter knows a priori the spatial distribution of the beam, that is, the covariance matrix [SIGMA] of the two-dimensional coordinates of the particles on a plane orthogonal to their flight direction. We can assume with no loss of generality that the transversal coordinate system has origin at the maximum of the beam intensity and axes along the principal components of the beam; thus it holds [SIGMA] = diag([[lambda].sub.1], [[lambda].sub.2]). The beam travels straightforward until it enters a linear coaxial pipeline with circular profile, schematically depicted in Figure 1, where the beam is longitudinally accelerated. While the outer part of the beam is stopped by an absorbing wall, the inner part propagates within the pipeline. At the end of the beam flight the physicist wants to know if the transversal distribution of the particles is changed, due to unknown disturbance factors arisen within the pipeline. Accordingly, he measures again the spatial covariance matrix of the beam. Unfortunately, the absorbing wall has cut off the Gaussian tail, thus damping the covariance matrix and making it no more comparable to the original one. To perform such a comparison in the general case [[lambda].sub.1] [not equal to] [[lambda].sub.2], the covariance matrix of the truncated circular beam has to go through the reconstruction procedure described in next sections.

2.2. A Multivariate Example: Connections to Compositional Data Analysis. Compositional Data Analysis (CoDA) has been the subject of a number of papers, pioneered by Aitchison [3] over the past forty years. As a methodology of statistical investigation, it finds application in all cases where the main object of interest is a multivariate with strictly positive continuous components to be regarded as portions of a total amount [kappa] (the normalization [kappa] = 1 is conventionally adopted in the mathematical literature). In other words, compositional variates belong to the [kappa]-simplex:

[S.sub.v] = {z [member of] [R.sup.v.sub.+] : [[absolute value of z].sub.1] = [kappa]}, v [greater than or equal to] 2, (4)

with [[absolute value of z].sub.1] = [[summation].sup.v.sub.k=1] [z.sub.k] the taxi-cab norm of z, while compositions with different norms can be always projected onto [S.sub.v] via the closure operator C x x [equivalent to] {[kappa][x.sub.1]/[[absolute value of x].sub.1], ..., [kappa][x.sub.v]/[[absolute value of x].sub.1]}. There are countless types of compositional data, whose analysis raises problems of interest for statistics [4], for example, geochemical data, balance sheet data, and election data. The simplex constraint induces a kind of dependency among the parts of a composition that goes beyond the standard concept of covariance. This invalidates many ordinary techniques of statistical analysis.

In order to measure distances on [S.sub.v], Aitchison introduced a positive symmetric function [d.sub.A] : [S.sub.v] x [S.sub.v] [right arrow] [R.sup.+], explicitly defined by

[d.sub.A] (x, y) = [square root of (1/2v [v.summation over (i,k=1)] [[log ([x.sub.i]/[x.sub.k]) - log ([y.sub.i]/[y.sub.k])].sub.2])]. (5)

The Aitchison distance is a key tool in CoDA. It is scale invariant in both its first and second arguments; that is, it is left invariant by redefinitions z [right arrow] {[alpha][z.sub.1], ..., [alpha][z.sub.v]} with [alpha] [member of] [R.sub.+]. Accordingly, its support can be extended to [R.sup.v.sub.+] x [R.sup.v.sub.+] by imposing

[d.sub.A] (x, y) [equivalent to] [d.sub.A] (C x x, C x y), x, y [member of] [R.sup.v.sub.+]. (6)

It was proved in [5] that [d.sub.A] is a norm-induced metric on [S.sub.v], provided the latter is given an appropriate normed vector space structure. Owing to the compositional constraint [[absolute value of (x)].sub.1] = [kappa], it holds dim [S.sub.v] = v - 1. Accordingly, the description of [S.sub.v] in terms of v components is redundant: an essential representation requires compositions to be properly mapped onto (v - 1)-tuples. Among various possibilities, the Isometric Log-Ratio (ILR) transform introduced in [5] is the only known map of this kind leaving [d.sub.A] invariant. More precisely, the ILR fulfills

[d.sub.A] (x, y) = [d.sub.E] (ilr(x), ilr(y)), [d.sub.E] (u, v) [equivalent to] [square root of ([v-1.summation over (k=1)] [([u.sub.k] - [v.sub.k]).sup.2])]. (7)

It is known from [6] that if X ~ log [N.sub.v]([mu], [SIGMA]) is a log-normal v-variate, then C x X ~ [L.sub.v]([mu]', [SIGMA]') is a logistic-normal v-variate (the reader should notice that in [6] the simplex is defined by [S.sub.v] = {z [member of] [R.sup.v.sub.+] : [[absolute value of (z)].sub.1] < 1}; thus property 2.2 of [6] is here reformulated so as to take into account such difference), with a known relation between ([mu], [SIGMA]) and ([mu]', [SIGMA]'). Analogously, it is not difficult to show that ilr(C x X) ~ [N.sub.v-1]([mu]", [SIGMA]") is a normal (v - 1)-variate, with ([mu]", [SIGMA]") related to ([mu]', [SIGMA]') via the change of basis matrices derived in [5]. Just to sum up, it holds

X ~ log [N.sub.v] ([mu], [SIGMA]) [??] C x X ~ [L.sub.v] ([mu]', [SIGMA]') [??] ilr(C x X) ~ [N.sub.v-1] ([mu]", [SIGMA]"). (8)

Now, suppose that (i) X fulfills (8) and has a natural interpretation as a composition, (ii) a representative set [D.sub.X] of observations of X is given, and (iii) we wish to select from [D.sub.X] those units which are compositionally closer to the center of the distribution, according to the Aitchison distance. To see that the problem is well posed, we first turn [D.sub.X] into a set [D.sub.CxX] [equivalent to] {y : y = C x x, x [member of] [D.sub.X]} of compositional observations of Y = C x X. Then, we consider the special point cen[Y] = C x exp{E[ln Y]}, representing the center of the distribution of Y in a compositional sense: cen[Y] minimizes the expression E[[d.sup.2.sub.A](Y, cen[Y])] over [S.sub.v] and fulfills cen[Y] = [ilr.sup.-1](E[ilr(Y)]), see [7]. By virtue of (8) this entails ilr(cen[Y]) = E[ilr(Y)] = [mu]". In order to select the observations which are closer to cen[Y], we set a threshold [delta] > 0 and consider only those elements y [member of] [D.sub.CxX] fulfilling [d.sup.2.sub.A](y, cen[y]) < [delta], with cen[y] being a sample estimator of cen[Y] on [D.sub.CxX]. Such selection rule operates a well-defined truncation of the distribution of Y. Moreover, in view of (7) and (8), we have

P[[d.sup.2.sub.A] (Y, cen [Y]) < [delta] | Y ~ [L.sub.v] ([mu]', [SIGMA]')] = P[[d.sup.2.sub.E] (Z, [mu]") < [delta] | Z ~ [N.sub.v-1] ([mu]", [SIGMA]")], (9)

with Z = ilr(C x X). As a consequence, we see that a compositional selection rule based on the Aitchison distance and (8) is equivalent to a spherical truncation of a multinormal distribution. Obviously, once Z has been spherically truncated, the covariance matrix of the remaining data is damped; thus an estimate of the full covariance matrix requires the reconstruction procedure described in next sections.

2.3. Compositional Outlier Detection via Forward Search Techniques. Outlier detection in CoDA is a practical problem where the considerations made in Section 2.2 find concrete application. Most of the established methods for outlier detection make use of the Mahalanobis metric. This is however unfit to describe the distance between compositions. Log-ratio transforms allow to get rid of the compositional constraint and make it possible to apply standard statistical methods [8]. Here we consider an innovative approach, namely, the Forward Search Algorithm (FSA), introduced in [9] and thoroughly discussed in [10]. The FSA admits an elegant extension to CoDA, of which the covariance reconstruction is a key step. In the next few lines we sketch the original algorithm and outline its extension to CoDA.

2.3.1. Construction of the Signal. In its standard formulation the FSA applies to normal data. It assumes a dataset [D.sub.X] = [{[x.sup.(k)]}.sup.N.sub.k=1] with [x.sup.(k)] [member of] [R.sup.(v).sub.+] for k = 1, ..., N. The null hypothesis is that all the elements of [D.sub.X] are independent realizations of a stochastic variable X ~ N([[mu].sub.0], [[SIGMA].sub.0]). The FSA consists of a sequence of steps where data are recursively sorted. Along the recursion a signal is built and tested.

As a preliminary step, [m.sub.0] observations are randomly chosen from the bulk of [D.sub.X]. Let S([m.sub.0]) be the set of these observations. S([m.sub.0]) is used to provide initial estimates [mu]([m.sub.0]), [SIGMA]([m.sub.0]) of the true mean [[mu].sub.0] and the true covariance matrix [[SIGMA].sub.0], respectively. For m = [m.sub.0] + 1, [m.sub.0] + 2, ..., the (m - [m.sub.0])th step of the algorithm goes as follows:

(i) sort the elements of [D.sub.X] according to the increasing values of the square Mahalanobis distance:

[d.sub.m] [(x).sup.2] = [[x - [mu] (m - 1)].sup.T] [SIGMA] [(m - 1).sup.-1] [x - [mu] (m - 1)]; (10)

(ii) take the mth element [x.sup.(m)] of the newly sorted dataset and regard [s.sub.m] = [d.sub.m][([x.sup.(m)]).sup.2] as the (m - [m.sub.0])th component of the signal;

(iii) let S(m) be the set of the first m observations of the newly sorted dataset;

(iv) use S(m) to provide new estimates [mu](m), [SIGMA](m) of the true mean and the true covariance matrix, respectively.

Notice that S(m) is a truncated dataset. Therefore, [SIGMA](m) must include the Tallis' correction factor, (2) with [rho] = [s.sub.m]. While the recursion proceeds, the inliers of [D.sub.X] populate progressively S(m). The recursion stops at the [bar.mth] step, when the first outlier [x.sup.([bar.m])] of [D.sub.X] produces a statistically relevant discontinuity in the signal.

2.3.2. Statistical Test of the Signal. Statistical tests are needed to assess the relevance of discontinuities observed in the signal. At each step of the algorithm a new test is actually performed. Specifically, at the mth step, [s.sub.m] is computed together with the lowest and highest values, respectively, [delta][s.sub.m,[alpha]] and [delta][s.sub.m,1-[alpha]], which are theoretically admissible for [s.sub.m] under the null hypothesis at (1 - [alpha]) significance level for some [alpha]. These values are nothing but the [alpha]- and (1 - [alpha])- percentage points of [s.sub.m]. Their envelopes for m - [m.sub.0] > 0 generate two curves that surround the signal when plotted versus m. More precisely, one curve lies below it and the other lies above, provided the null hypothesis is not broken. The violation of one of the curves by the signal is interpreted as the result of the entrance of an outlier into S(m). Although the distribution of [s.sub.m] cannot be calculated in closed form, its percentage points are obtained from a general formula, first derived in [11], yielding

[delta][s.sub.m,[alpha]] = [([[chi].sup.2.sub.v]).sup.-1] (m/[m + (N - m - 1) [f.sub.2(N-m-1),2m;1-[alpha]]]), (11)

with [f.sub.a,b;[alpha]] the [alpha]-percentage point of the Fisher distribution with parameters (a, b). Equation (11) holds with decent approximation, as confirmed by numerical simulations.

2.3.3. Extension of the Forward Search to CoDA. When [D.sub.X] is a compositional dataset, it is unnatural to assume that its elements are realizations of a multivariate X ~ N([[mu].sub.0], [[SIGMA].sub.0]).

In this case the use of the FSE as outlined above does not make sense at all. Sometimes it is reasonable to assume X ~ [L.sub.v]([[mu].sub.0], [[SIGMA].sub.0]), as first argued in [6]. In this case we can use the FSA to find outliers, provided that we first modify the algorithm in two respects:

(i) we replace the Mahalanobis distance by the Aitchison one;

(ii) we perform statistical tests consistently with the change of null hypothesis.

Specifically, at the mth step of the algorithm, we sort [D.sub.X] according to the increasing values of the square Aitchison distance:

[d.sub.A] [(x).sup.2] = 1/2v [v.summation over (i,k=1)] [[log ([x.sub.i]/[x.sub.k]) - log ([([c.sub.m-1]).sub.i]/[([c.sub.m-1]).sub.k])].sup.2], (12)

where [c.sub.m-1] = cen[y | y [member of] S(m - 1)] is the center of S(m - 1). Analogously, given the mth element [x.sup.(m)] of the newly sorted dataset, we regard [s.sub.m] = [d.sub.A][([x.sup.(m)]).sup.2] as the mth component of the signal. The percentage points of sm are obtained from ilr(S(m)) by using the probability correspondence established in (9). Since ilr(S(m)) is a spherically truncated dataset, the estimate of the covariance matrix [SIGMA](m) derived from it must undergo the reconstruction procedure described in next sections.

2.4. General Covariance Reconstruction Problem. The examples discussed in the previous subsections are special cases of a more general inverse problem, namely, the reconstruction of the covariance matrix [SIGMA] of a normal multivariate X on the basis of the covariance matrix [G.sub.D] of X truncated to some (convex) region D. This is the simplest yet nontrivial inverse problem, which can be naturally associated with the normal distribution. The case D = [B.sub.v]([rho]) corresponds to a setup where theoretical and practical aspects of the problem can be investigated with relatively modest mathematical effort.

It is certainly a well-defined framework where to study regularization techniques for nonlinear inverse problems in statistics, for which there is still much room for interesting work [12, 13].

3. Definitions and Setup

Let X [member of] [R.sup.v] be a random vector with jointly normal distribution [N.sub.v](0, [SIGMA]) in v [greater than or equal to] 1 dimensions. The probability that X falls within [B.sub.v]([rho]) is measured by the Gaussian integral

[mathematical expression not reproducible]. (13)

Since [SIGMA] is symmetric positive definite, it has orthonormal eigenvectors [SIGMA][v.sup.(i)] = [[lambda].sub.i][v.sup.(i)]. Let us denote by R [equivalent to] [{[v.sup.(j).sub.i]}.sup.v.sub.i,j=1] the orthogonal matrix having these vectors as columns and by [LAMBDA] [equivalent to] diag([lambda]) = [R.sup.T][SIGMA]R the diagonal counterpart of [SIGMA]. From the invariance of [B.sub.v]([rho]) under rotations, it follows that [alpha] depends upon [SIGMA] just by way of [lambda]. Accordingly, we rename the Gaussian probability content of [B.sub.v]([rho]) as

[mathematical expression not reproducible]. (14)

Note that (14) is not sufficient to fully characterize the random vector X under the spherical constraint, for which we need to calculate the distribution law P[X [member of] A | X [member of] [B.sub.v]([rho])] as a function of A [subset] [R.sup.v]. Alternatively, we can describe X in terms of the complete set of its truncated moments

[mathematical expression not reproducible]. (15)

As usual, these can be all obtained from the moment-generating function

[mathematical expression not reproducible], (16)

by differentiating the latter an arbitrary number of times with respect to the components of t; namely,

[mathematical expression not reproducible]. (17)

It will be observed that m(t) is in general not invariant under rotations of t. Therefore, unlike [alpha], the moments [mathematical expression not reproducible] depend effectively on both [lambda] and R. For instance, for the matrix of the second moments [G.sub.B] [equivalent to] [{[[partial derivative].sup.2]m/[partial derivative][t.sub.i] [partial derivative][t.sub.j][|.sub.t=0]}.sup.v.sub.i,j=1] such dependence amounts to

[mathematical expression not reproducible]. (18)

By parity, the only nonvanishing terms in the above sum are those with k = l. Hence, it follows that [SIGMA] and [G.sub.B] share R as a common diagonalizing matrix. In other words, if M [equivalent to] diag([mu]) is the diagonal matrix of the eigenvalues of [G.sub.B], then M = [R.sup.T][G.sub.B]R. Moreover, [[mu].sub.k] is related to [[lambda].sub.k] by

[mathematical expression not reproducible]. (19)

The ratios [[alpha].sub.k]/[alpha] are naturally interpreted as adimensional correction factors to the eigenvalues of [SIGMA], so they play the same role as [c.sub.T]([rho]) in (2). However, [[alpha].sub.k]/[alpha] depends explicitly on the subscript k; thus each eigenvalue is damped differently from the others as a consequence of the condition X [member of] [B.sub.v]([rho]).

Remark 1. In practical terms, (18)-(19) tell us that estimating the sample covariance matrix of X ~ [N.sub.v](0, [SIGMA]) from a spherically truncated population [{[x.sup.(m)]}.sup.M.sub.m=1], made of M independent units, via the classical estimator [Q.sub.ij] = [(M - 1).sup.-1] [[summation].sup.M.sub.m=1]([x.sup.(m).sub.i] - [[??].sup.(m).sub.i])([x.sup.(m).sub.j - [[??].sup.(m).sub.j), being [??] = [M.sup.-1] [[summation].sup.M.sub.m=1] [x.sup.(m)] the sample mean, yields a damped result. Nonetheless, the damping affects only the eigenvalues of the estimator, whereas its eigenvectors are left invariant.

3.1. Monotonicity Properties of Ratios of Gaussian Integrals. Equations (14) and (19) suggest introducing a general notation for the Gaussian integrals over [B.sub.v]([rho]), under the assumption [SIGMA] = [LAMBDA]. So, we define

[mathematical expression not reproducible], (20)

with each subscript q on the left-hand side addressing an additional factor of [x.sup.2.sub.q]/[[lambda].sub.q] under the integral sign on the right-hand side (no subscripts means [alpha]). Several analytic properties of such integrals are discussed in [14]. In the following proposition, we lay emphasis on some issues concerning the monotonicity trends of the ratios [[alpha].sub.k]/[alpha].

Proposition 2 (monotonicities). Let [[lambda].sub.(k)] [equivalent to] [{[[lambda].sub.i]}.sup.i[not equal to]k.sub.i=1, ..., v] denote the set of the full eigenvalues without [[lambda].sub.k]. The ratios [[alpha].sub.k]/[alpha] fulfill the following properties:

([p.sub.1]) [[lambda].sub.k]([[alpha].sub.k]/[alpha])([rho]; [lambda]) is a monotonic increasing function of [[lambda].sub.k] at fixed [rho] and [[lambda].sub.(k)];

([p.sub.2]) ([[alpha].sub.k]/[alpha])([rho]; [lambda]) is a monotonic decreasing function of [[lambda].sub.k] at fixed [rho] and [[lambda].sub.(k)];

([p.sub.3]) ([[alpha].sub.k]/[alpha])([rho]; [lambda]) is a monotonic decreasing function of [[lambda].sub.i] (i [not equal to] k) at fixed [rho] and [[lambda].sub.(i)],

where an innocuous abuse of notation has been made on writing ([[alpha].sub.k]/[alpha])([rho]; [lambda]) in place of [[alpha].sub.k]([rho]; [lambda])/[alpha]([rho]; [lambda]).

Proof. Let the symbol [[partial derivative].sub.k] [equivalent to] [partial derivative]/[partial derivative][[lambda].sub.k] denote a derivative with respect to [[lambda].sub.k]. In order to prove property ([p.sub.1]), we apply the chain rule of differentiation to [[lambda].sub.k][[alpha].sub.k]/[alpha] and then we pass [[partial derivative].sub.k] under the integral sign in [[partial derivative].sub.k][alpha] and [[partial derivative].sub.k][[alpha].sub.k]. In this way, we obtain

[mathematical expression not reproducible]. (21)

Moreover, since the truncated marginal density of [X.sup.2.sub.k] is positive within a set of nonzero measure in R, the monotonic trend of [[lambda].sub.k][[alpha].sub.k]/[alpha] in [[lambda].sub.k] is strict.

Properties ([p.sub.2]) and ([p.sub.3]) are less trivial than ([p.sub.1]). Indeed, the same reasoning as above now yields on the one hand

[mathematical expression not reproducible], (22)

and on the other

[mathematical expression not reproducible]. (23)

Despite being not a priori evident, the right-hand side of both (22) and (23) is negative (and vanishes in the limit [rho] [right arrow] [infinity]). The inequalities var([X.sup.2.sub.k]) [less than or equal to] 2[[lambda].sub.k]E[[X.sup.2.sub.k]] within Euclidean balls have been first discussed in [14], while the inequalities cov([X.sup.2.sub.j], [X.sup.2.sub.k]) within generalized Orlicz balls have been discussed in [15, 16] for the case where the probability distribution of X is flat instead of being normal. More recently, a complete proof of both inequalities has been given in [17]. Despite the technical difficulties in proving them, their meaning should be intuitively clear. The variance inequality quantifies the squeezing affecting [X.sup.2.sub.k] as a consequence of the truncation (in the unconstrained case it would be var([X.sup.2.sub.k]) = 2[[lambda].sup.2.sub.k]). The covariance inequality follows from the opposition arising among the square components in proximity of the boundary of [B.sub.v]([rho]). Indeed, if [X.sub.2.sub.j] [??] [rho], then [X.sup.2.sub.k] [??] 0 [for all]k [not equal to] j in order for X to stay within [B.sub.v]([rho]).

3.2. Definition Domain of the Reconstruction Problem. A consequence of Proposition 2 is represented by the following.

Corollary 3. Given v, [rho], and [lambda], [[mu].sub.k] is bounded by

[rho]/[r (v, [rho]/2[[lambda].sub.k])] [less than or equal to] [[mu].sub.k] [less than or equal to] [rho]/3, r (v, z) [equivalent to] (2v + 1) [M (v, v + 1/2, z)]/[M (v, v + 3/2, z)], (24)

with M denoting the Kummer function; namely,

M (a, b, z) = [[infinity].summation over (n=0)] 1/n! [(a).sub.n]/[(b).sub.n] [z.sup.n], [(x).sub.n] [equivalent to] [[GAMMA] (x + n)]/[[GAMMA] (x)]. (25)

Proof. The upper bound of (24) corresponds to the value of [[mu].sub.k] in the v-tuple limit [[lambda].sub.k] [right arrow] [infinity], [[lambda].sub.(k)] [right arrow] {0, ..., 0}. This is indeed the maximum possible value allowed for [[mu].sub.k] according to properties ([p.sub.1]) and ([p.sub.3]) of Proposition 2. In order to perform this limit, we observe that

[mathematical expression not reproducible], (26)

with the [delta] symbol on the right-hand side representing the Dirac delta function (the reader who is not familiar with the theory of distributions may refer, for instance, to [18] for an introduction). Accordingly,

[mathematical expression not reproducible]. (27)

The lower bound corresponds instead to the value taken by [[mu].sub.k] as [[lambda].sub.(k)] [right arrow] {[infinity], ...,[infinity]} and [[lambda].sub.k] is kept fixed. In this limit, all the Gaussian factors in the probability density function except the kth one flatten to one. Hence,

[mathematical expression not reproducible]. (28)

Numerator and denominator of the rightmost ratio are easily recognized to be integral representations of Kummer functions (see, e.g., [19, ch. 13]).

The upper bound of (24) can be sharpened, as clarified by the following.

Proposition 4 (bounds on the truncated moments). Let v, [rho], and [lambda] be given. If {[i.sub.1], ..., [i.sub.v]} is a permutation of {1, ..., v} such that [mathematical expression not reproducible], then the following upper bounds hold:

[mathematical expression not reproducible]. (29)

Proof. The overall upper bound on the sum of truncated moments follows from

[mathematical expression not reproducible]. (30)

At the same time, the sum can be split and bounded from below by

[mathematical expression not reproducible]. (31)

The single upper bounds on the [[mu].sub.k]'s are then obtained from (30)-(31). It will be noted that (29) (ii) is sharper than the upper bound of (24) only for v > 3 and k < v - 2.

From now on, we shall assume, with no loss of generality, that the eigenvalues of [SIGMA] are increasingly ordered, namely, 0 < [[lambda].sub.1] [less than or equal to] ... [less than or equal to] [[lambda].sub.v] (we can always permute the labels of the coordinate axes, so as to let this be the case). An important aspect related to the eigenvalue ordering is provided by the following.

Proposition 5 (eigenvalue ordering). Let v, [rho], and [lambda] be given. If [[lambda].sub.1] [less than or equal to] [[lambda].sub.2] [less than or equal to] ... [less than or equal to] [[lambda].sub.v], then [[mu].sub.1] [less than or equal to] [[mu].sub.2] [less than or equal to] ... [less than or equal to] [[mu].sub.v] holds as well.

Proof. In order to show that the spherical truncation does not violate the eigenvalue ordering, we make repeated use of the monotonicity properties of Proposition 2. Specifically, if i < j, then

[mathematical expression not reproducible], (32)

where the symbol "[??]" is used to explain where the inequality sign preceding it comes from and the "exchange symmetry" refers to the formal property of the one-index Gaussian integrals over [B.sub.v]([rho]) to fulfill [[alpha].sub.i]([rho]; {[[lambda].sub.1], ..., [[lambda].sub.i], ..., [[lambda].sub.j], ..., [[lambda].sub.v]}) = [[alpha].sub.j]([rho]; {[[lambda].sub.1], ..., [[lambda].sub.j], ..., [[lambda].sub.i], ..., [[lambda].sub.v]}).

Let us now focus on (19). They have to be solved in order to reconstruct [lambda] from [mu]. Formally, if we introduce a family of truncation operators [[tau].sub.[rho]] : [R.sup.v.sub.+ [right arrow] [R.sup.v.sub.+ (parametrically depending on [rho]), such that

[([[tau].sub.[rho]] x [lambda]).sub.k] [equivalent to] [[lambda].sub.k] [[alpha].sub.k]/[alpha] ([rho]; [lambda]), k = 1, ..., v, (33)

then the reconstruction of [lambda] from [mu] amounts to calculating [lambda] = [[tau].sup.-1.sub.[rho]] x [mu]. One should be aware that [[tau].sub.[rho]] is not a surjective operator in view of Corollary 3 and Proposition 4. Therefore, [[tau].sup.-1.sub.[rho]] is only defined within a bounded domain D([[tau].sup.- 1.sub.[rho]]). If we define

[D.sub.0] = {[mu] [member of] [R.sup.v.sub.+] : [[mu].sub.1] [less than or equal to] ... [less than or equal to] [[mu].sub.v], [[mu].sub.k] = [[lambda].sub.k] [[[alpha].sub.k]/[alpha]] for k = 1, ..., v, for some [lambda] [member of] [R.sup.v.sub.+]}, (34)

then we have D([[tau].sup.-1.sub.[rho]]) = {[mu] : [mu] = [sigma] x [[mu].sub.0] for some [[mu].sub.0] [member of] [D.sub.0], [sigma] [member of] [S.sub.v]}, where [S.sub.v] is the set of permutations of v elements. From Proposition 4 we conclude that [D.sub.0] [subset or equal to] [H.sub.v]([rho]), being

[H.sub.v] ([rho]) [equivalent to] {x [member of] [R.sup.v.sub.+] : [x.sub.k] [less than or equal to] min {[rho]/3, [rho]/[v - k + 1]}, [v.summation over (k=1)] [x.sub.k] [less than or equal to] [rho], [for all]k}. (35)

In fact, there are vectors [mu] [member of] [R.sup.v.sub.+] fulfilling [mu] [member of] [H.sub.v]([rho]) and [mu] [not member of] [D.sub.0]; thus we conclude that [D.sub.0] is a proper subset of [H.sub.v]([rho]). Numerical experiences based on the techniques discussed in the next sections show indeed that

[mathematical expression not reproducible]. (36)

A graphical representation of (36) in v = 2 and v = 3 dimensions is depicted in Figure 2. The reader should note that until Section 7 we shall always assume that [mu] comes from the application of [[tau].sub.[rho]] to some [lambda]; thus [mu] [member of] D([[tau].sup.-1.sub.[rho]]) by construction.

Now, we observe that (19) can be written in the equivalent form

[lambda] = T ([lambda]; [mu]; [rho]), (37)

T : [R.sup.v.sub.+] x [R.sup.v.sub.+] x [R.sub.+] [right arrow] [R.sup.v.sub.+]; [T.sub.k] ([lambda]; [mu]; [rho]) = [[mu].sub.k] [[alpha]/[[alpha].sub.k]] ([rho]; [lambda]), k = 1, ..., v. (38)

Since [rho] and [mu] are (nonindependent) input parameters for the covariance reconstruction problem (and in order to keep the notation light), in the sequel we shall leave the dependence of T upon [rho] and [mu] implicitly understood; that is, we shall write (37) as [lambda] = T([lambda]). Hence, we see that the full eigenvalue spectrum [lambda] is a fixed point of the operator T. This suggests obtaining it as the limit of a sequence

[[lambda].sup.(0)] = [mu],

[[lambda].sup.(n+1)] = T ([[lambda].sup.(n)]), n = 0, 1, ..., (39)

[mathematical expression not reproducible], (40)

provided that this can be shown to converge. Note that since [[alpha].sub.k] < [alpha], it follows that [T.sub.k]([[lambda].sup.(n)]) > [[mu].sub.k] [for all]n, so the sequence is bounded from below by [mu]. In particular, this holds for n = 0. Therefore, the sequence moves to the right direction at least at the beginning. A formal proof of convergence, based on the monotonicity properties stated by Proposition 2, is given in the next section.

4. Convergence of the Fixed Point Equation

We split our argument into three propositions, describing different properties of the sequence [[lambda].sup.(n)]. They assert, respectively, that (i) the sequence is componentwise monotically increasing; (ii) the sequence is componentwise bounded from above by any fixed point of T; and (iii) if T has a fixed point, this must be unique. Statements (i) and (ii) are sufficient to guarantee the convergence of the sequence to a finite limit (the unconstrained spectrum is a fixed point of T). In addition, the limit is easily recognized to be a fixed point of T. Hence, statement (iii) guarantees that the sequence converges to the unconstrained eigenvalue spectrum. We remark that all the monotonicities discussed in Proposition 2 are strict; that is, the ratios [[alpha].sub.k]/[alpha] have no stationary points at finite [rho] and [lambda], which is crucial for the proof.

Proposition 6 (increasing monotonicity). Given v, [rho], and [mu] [member of] D([[tau].sup.-1.sub.[rho]]), the sequence [[lambda].sup.(0)] = [mu], [[lambda].sup.(n+1)] = T([[lambda].sup.(n)]), n = 0, 1, ..., is monotically increasing; namely, [[lambda].sup.(n+1).sub.k] > [[lambda].sup.(n).sub.k] [for all]k = 1, ..., v.

Proof. The proof is by induction. We first notice that

[[lambda].sup.(1).sub.k] = [T.sub.k] ([[lambda].sup.(0)]) = [T.sub.k] ([mu]) = [[mu].sub.k] [alpha]/[[alpha].sub.k] ([rho]; [mu]) > [[mu].sub.k] = [[lambda].sup.(0).sub.k], k = 1, ..., v; (41)

the inequality follows from [[alpha].sub.k]([rho]; [mu]) < [alpha]([rho]; [mu]). Suppose now that the property of increasing monotonicity has been checked off up to the nth element of the sequence. Then,

[[lambda].sup.(n+1).sub.k] = [[mu].sub.k] [[alpha]/[[alpha].sub.k]] ([rho]; [[lambda].sup.(n)]) > [[mu].sub.k] [[alpha]/[[alpha].sub.k]] ([rho]; [[lambda].sup.(n-1)]) = [[lambda].sup.(n).sub.k]; (42)

the inequality follows this time from the inductive hypothesis and from properties ([p.sub.2]) and ([p.sub.3]) of Proposition 2.

Proposition 7 (boundedness). Given v, [rho], and [mu] [member of] D([[tau].sup.- 1.sub.[rho]]), the sequence [[lambda].sup.(0)] = [mu], [[lambda].sup.(n+1)] = T([[lambda].sup.(n)]), n = 0, 1, ..., is bounded from above; namely, [[lambda].sup.(n).sub.k] < [[lambda].sup.*.sub.k] [for all]k = 1, ..., v, [[lambda].sup.*] being a fixed point of T.

Proof. We proceed again by induction. We first notice that

[[lambda].sup.(0).sub.k] = [[mu].sub.k] < [[mu].sub.k] [alpha]/[[alpha].sub.k] ([rho]; [[lambda].sup.*]) = [[lambda].sup.*.sub.k], k = 1, ..., v; (43)

the inequality follows as previously from [[alpha].sub.k]([rho]; [[lambda].sup.*]) < [alpha]([rho]; [[lambda].sup.*]). Suppose now that the property of boundedness has been checked off up to the nth element of the sequence. Then,

[mathematical expression not reproducible]; (44)

the inequality follows for the last time from the inductive hypothesis and from properties ([p.sub.2]) and ([p.sub.3]) of Proposition 2.

According to Propositions 6 and 7, the sequence converges. Now, let [??] = [lim.sub.n[right arrow][infinity]][[lambda].sup.(n)] be the limit of the sequence. Effortlessly, we prove that [??] is a fixed point of T. Indeed,

[mathematical expression not reproducible]. (45)

Note that passing the limit over n under the integral sign is certainly allowed for Gaussian integrals.

Proposition 8 (uniqueness of the fixed point). Let [lambda]' = T([lambda]') and [lambda]" = T([lambda]") be two fixed points of T, corresponding to the same choice of v, [rho], and [mu] [member of] D([[tau].sup.-1.sub.[rho]]). Then, it must be that [lambda]' = [lambda]".

Proof. According to the hypothesis, [lambda]' and [lambda]" fulfill the equations

[mathematical expression not reproducible]. (46)


[mathematical expression not reproducible], (47)

where J denotes the Jacobian matrix of [[tau].sub.[rho]] and is given by

[mathematical expression not reproducible], (48)

having set [[OMEGA].sub.kl] [equivalent to] (1/2)([[alpha].sub.kl]/[alpha] - [[alpha].sub.k][[alpha].sub.l]/[[alpha].sup.2]). It will be noted that [OMEGA] = [{[[OMEGA].sub.kl]}.sup.v.sub.k,l=1] is essentially the covariance matrix of the square components of X under spherical truncation (we have come across its matrix elements in (21)-(23)). As such, [OMEGA] is symmetric positive definite. Indeed,

[mathematical expression not reproducible]. (49)

On setting [Z.sub.k] = ([X.sup.2.sub.k] - E[[X.sup.2.sub.k] | X [member of] [B.sub.v]([rho])])/[square root of (2)][[lambda].sub.k], we can represent [OMEGA] as [OMEGA] = E[Z[Z.sup.T] | X [member of] [B.sub.v]([rho])]. If x [member of] [R.sup.v] is not the null vector, then [x.sup.T][OMEGA]x = E[[x.sup.T]Z[Z.sup.T]x | X [member of] [B.sub.v]([rho])] = E[[([x.sup.T]Z).sup.2] | X [member of] [B.sub.v]([rho])] > 0. Moreover, the eigenvalues of [OMEGA] fulfill the secular equation

0 = det([OMEGA] - [phi][I.sub.v]) = det [[[LAMBDA].sup.-1] ([OMEGA] - [phi][I.sub.v]) [LAMBDA]] = det ([[LAMBDA].sup.-1] [OMEGA][LAMBDA] - [phi][I.sub.v]) = det(J - [phi][I.sub.v]), (50)

whence it follows that J is positive definite as well (though it is not symmetric). Since the sum of positive definite matrices is positive definite, we conclude that [[integral].sup.1.sub.0 dt J([rho]; [lambda]" + t([lambda]' - [lambda]")) is positive definite too. As such, it is nonsingular. Therefore, from (47), we conclude that [lambda]' = [lambda]".

5. Numerical Computation of Gaussian Integrals over [B.sub.v]([rho])

Let us now see how to compute [[alpha].sub.klm...] with controlled precision. Most of the relevant work has been originally done by Ruben in [2], where the case of [alpha] is discussed. We extend Ruben's technique to Gaussian integrals containing powers of the integration variable. Specifically, it is shown in [2] that [alpha]([rho]; [lambda]) can be represented as a series of chi-square cumulative distribution functions:

[alpha] ([rho]; [lambda]) = [[infinity].summation over (m=0)] [c.sub.m] (s; [lambda]) [F.sub.v+2m] ([rho]/s). (51)

The scale factor s has the same physical dimension as [rho] and [lambda]. It is introduced in order to factorize the dependence of [alpha] upon [rho] and [lambda] at each order of the expansion. The series on the right-hand side of (51) converges uniformly on every finite interval of [rho]. The coefficients [c.sub.m] are given by

[c.sub.m] (s; [lambda]) = [1/m!] [[s.sup.v/2+m]/[[absolute value of ([LAMBDA])].sup.1/2]] [[GAMMA] (v/2 + m)]/[[GAMMA] (v/2)] M [[(-Q).sup.m]], m = 0, 1, ..., (52)

having defined Q(x) [equivalent to] [x.sup.T][[[LAMBDA].sup.-1] - [s.sup.- 1][I.sub.v]]x for x [member of] [R.sup.v] and M as the uniform average operator on the (v - 1)-sphere [partial derivative][B.sub.v](1) [equivalent to] {u [member of] [R.sup.v] : [u.sup.T]u = 1}; namely,

[mathematical expression not reproducible]. (53)

Unfortunately, (52) is not particularly convenient for numerical computations, since M[[(-Q).sup.m]] is only given in integral form. However, it is also shown in [2] that the coefficients [c.sub.m] can be extracted from the Taylor expansion (at [z.sub.0] = 0) of the generating function

[mathematical expression not reproducible]. (54)

This series converges uniformly for [absolute value of (z)] < [min.sub.i][[absolute value of (1 - s/[[lambda].sub.i])].sup.-1]. On evaluating the derivatives of [psi](z), it is then shown that [c.sub.m]'s fulfill the recursion:

[mathematical expression not reproducible]. (55)

Finally, the systematic error produced on considering only the lowest k terms of the chi-square series of (51) is estimated by

[mathematical expression not reproducible], (56)

with [eta] = [max.sub.i][absolute value of (1 - s/[[lambda].sub.i])].

Now, as mentioned, it is possible to extend the above expansion to all Gaussian integrals [[alpha].sub.klm...]. Here, we are interested only in [[alpha].sub.k] and [[alpha].sub.jk], since these are needed in order to implement the fixed point iteration and to compute the Jacobian matrix of [[tau].sub.[rho]]. The extension is provided by the following.

Theorem 9 (Ruben's expansions). The integrals [[alpha].sub.k] and [[alpha].sub.jk] admit the series representations:

[[alpha].sub.k] ([rho]; [lambda]) = [[infinity].summation over (m=0)] [c.sub.k;m] (s; [lambda]) [F.sub.v+2(m+1)] ([rho]/s), (57)

[[alpha].sub.jk] ([rho]; [lambda]) = [[infinity].summation over (m=0)] [c.sub.jk;m] (s; [lambda]) [F.sub.v+2(m+2)] ([rho]/s), (58)

with s being an arbitrary positive constant. The series coefficients are given, respectively, by

[mathematical expression not reproducible], (59)

[mathematical expression not reproducible], (60)

with [[delta].sub.jk] denoting the Kronecker symbol. The series on the right-hand side of (57)-(58) converge uniformly on every finite interval of [rho]. The functions

[mathematical expression not reproducible], (61)

[mathematical expression not reproducible], (62)

[mathematical expression not reproducible] (63)

are generating functions, respectively, for the coefficients [c.sub.k;m], [c.sub.kk;m] and [c.sub.jk;m] (j [not equal to] k); that is, they fulfill

[[psi].sub.k] (z) = [[infinity].summation over (m=0)][c.sub.k;m] (s; [lambda]) [z.sup.m], [[psi].sub.jk] (z) = [[infinity].summation over (m=0)][c.sub.jk;m] (s; [lambda]) [z.sub.m], (64)

for [absolute value of (z)] < [min.sub.i] [[absolute value of (1 - s/[[lambda].sub.i])].sup.-1]. Finally, the coefficients [c.sub.k;m], [c.sub.kk;m], and [c.sub.jk;m] (j [not equal to] k) can be obtained iteratively from the recursions

[mathematical expression not reproducible]; (65)

[mathematical expression not reproducible], (66)

where the auxiliary coefficients [e.sub.k;i] and [e.sub.jk;i] are defined by

[mathematical expression not reproducible], (67)

[mathematical expression not reproducible]. (68)

It is not difficult to further generalize this theorem, so as to provide a chi-square expansion for any Gaussian integral [[alpha].sub.klm...]. The proof follows closely the original one given by Ruben. We reproduce it in the appendix for [[alpha].sub.k], just to highlight the differences arising when the Gaussian integral contains powers of the integration variable.

Analogously to (56), it is possible to estimate the systematic error produced when considering only the lowest k terms of the chi-square series of [[alpha].sub.k] and [[alpha].sub.jk]. Specifically, we find that

[mathematical expression not reproducible]. (69)

In order to evaluate all Ruben series with controlled uncertainty, we first set (see once more [2] for an exhaustive discussion on how to choose s) s = 2[[lambda].sub.1][[lambda].sub.v]/([[lambda].sub.1] + [[lambda].sub.v]); then we choose a unique threshold [epsilon] representing the maximum tolerable systematic error; for example, [[epsilon].sub.dp] = 1.0 x [10.sup.-14] (roughly corresponding to double floating-point precision), for all [alpha], [[alpha].sub.k], and [[alpha].sub.jk], and finally for each [[alpha].sub.X] we compute the integer

and finally for each [[alpha].sub.X] we compute the integer

[mathematical expression not reproducible], (70)

providing the minimum number of chi-square terms, for which the upper bound [[Real part].sub.X;n] to the residual sum [[Real part].sub.X;n] lies below [epsilon]. Of course, this procedure overshoots the minimum number of terms really required for the R's to lie below [epsilon], since we actually operate on the R's instead of the R's. Nevertheless, the computational overhead is acceptable, as it will be shown in the next section. For the sake of completeness, it must be said that typically the values of kth for [alpha], [[alpha].sub.k], and [[alpha].sub.jk] with the same [epsilon] (and [rho], [lambda]) are not much different from each other.

To conclude, we notice that [] depends nontrivially upon [lambda]. By contrast, since [F.sub.v](x) is monotically increasing in x, we clearly see that kth is monotically increasing in [rho]. Now, should one evaluate [alpha] and the like for a given [lambda] at several values of [rho], say [[rho].sub.1] [less than or equal to] [[rho].sub.2] [less than or equal to] ... [less than or equal to] [[rho].sub.max], it is advisable to save computing resources and work out Ruben coefficients just once, up to the order [] corresponding to [[rho].sub.max], since []([[rho].sub.1]) [less than or equal to] ... [less than or equal to] [] ([[rho].sub.max]). We made use of this trick throughout our numerical experiences, as reported in the sequel.

6. Numerical Analysis of the Reconstruction Process

The fixed point (39) represents the simplest iterative scheme that can be used in order to reconstruct the solution [lambda] = [[tau].sup.-1.sub.[rho]] x [mu]. In the literature of numerical methods, this scheme is known as a nonlinear Gauss-Jacobi (GJ) iteration (see, e.g., [20]). Accordingly, we shall rewrite it as [[lambda].sup.(n+1).sub.GJ,k] = [T.sub.k]([[lambda].sup.(n).sub.GJ]). As we have seen, the sequence [[lambda].sup.(n).sub.GJ] converges with no exception as n [right arrow] [infinity], provided [mu] [member of] D([[tau].sup.- 1.sub.[rho]]). Given [[epsilon].sub.T] > 0, the number of steps [] needed for an approximate convergence with relative precision [[epsilon].sub.T], that is,

[mathematical expression not reproducible], (71)

depends not only upon [epsilon]T, but also on [rho] and [mu] (note that the stopping rule is well conditioned, since [[parallel][[lambda].sup.(n)][parallel].sub.[infinity]] > 0 [for all]n and also [lim.sub.n[right arrow][infinity]] [[parallel][[lambda].sup.(n)][parallel].sub.[infinity]] > 0). In order to characterize statistically the convergence rate of the reconstruction process, we must integrate out the fluctuations of [] due to changes of [mu]; that is, we must average [] by letting [mu] fluctuate across its own probability space. In this way, we obtain the quantity [[bar.n]] [equivalent to] [E.sub.[mu]][[] | [[epsilon].sub.T], [rho]], which better synthesizes the cost of the reconstruction for given [[epsilon].sub.T] and [rho]. It should be evident that carrying out this idea analytically is hard, for on the one hand [] depends upon [mu] nonlinearly and on the other hand [mu] has a complicated distribution, as we briefly explain below.

6.1. Choice of the Eigenvalue Ensemble. Since [lambda] is the eigenvalue spectrum of a full covariance matrix, it is reasonable to assume its distribution to be a Wishart [W.sub.v](p, [[SIGMA].sub.0]) for some scale matrix [[SIGMA].sub.0] and for some number of degrees of freedom p [greater than or equal to] v. In the sequel, we shall make the ideal assumption [[SIGMA].sub.0] = [p.sup.-1] x [I.sub.v], so that the probability measure of [lambda] is (see, e.g., [21])

[mathematical expression not reproducible]. (72)

Under this assumption, the probability measure of [mu] is obtained by performing the change of variable [lambda] = [[tau].sup.- 1.sub.[rho]] x [mu] in (72). Unfortunately, we have no analytic representation of [[tau].sup.-1.sub.[rho]]. Thus, we have neither an expression for the distribution of [mu]. However, [mu] can be extracted numerically as follows:

(i) generate randomly [SIGMA] ~ [W.sub.v](p, [p.sup.-1] x [I.sub.v]) by means of the Bartlett decomposition [22];

(ii) take the ordered eigenvalue spectrum [lambda] of [SIGMA];

(iii) obtain [mu] by applying the truncation operator [[tau].sub.[rho]] to [lambda].

Note that since [W.sub.v](p, [p.sup.-1] x [I.sub.v]) is only defined for p [greater than or equal to] v, we need to rescale p as v increases. The simplest choice is to keep the ratio p/v fixed. The larger this ratio, the closer [SIGMA] fluctuates around [I.sub.v] (recall that if [SIGMA] ~ [W.sub.v](p, [p.sup.-1] x [I.sub.v]), then E[[[SIGMA].sub.ij]] = [[delta].sub.ij] and var([[SIGMA].sub.ij]) = [p.sup.- 1] [1 + [[delta].sub.ij]]). In view of this, large values of p/v are to be avoided, since they reduce the probability of testing the fixed point iteration on eigenvalue spectra characterized by large condition numbers [n.sub.cond] [equivalent to] [[lambda].sub.v]/[[lambda].sub.1]. For this reason, we have set p = 2v in our numerical study.

Having specified an ensemble of matrices from which the eigenvalue spectra are extracted, we are now ready to perform numerical simulations. To begin with, we report in Figure 3 the marginal probability density function of the ordered eigenvalues [{[[lambda].sub.k]}.sup.v.sub.k=1] and their truncated counterparts [{[[mu].sub.k]}.sup.v.sub.k=1] for the Wishart ensemble [W.sub.10](20, [20.sup.-1] x [I.sub.10]) at [rho] = 1, as obtained numerically from a rather large sample of matrices ([equivalent][10.sup.6] units). It will be noted that (i) the effect of the truncation is severe on the largest eigenvalues, as a consequence of the analytic bounds of Corollary 2.1 and Proposition 2.2; (ii) while the skewness of the lowest truncated eigenvalues is negative, it becomes positive for the largest ones. This is due to a change of relative effectiveness of (29) (i) with respect to (29) (ii).

6.2. Choice of the Simulation Parameters. In order to explore the dependence of [[bar.n]] upon [rho], we need to choose one or more simulation points for the latter. Ideally, it is possible to identify three different regimes in our problem: [rho] [??] [[lambda].sub.1] (strong truncation regime), [[lambda].sub.1] [??] [rho] [??] [[lambda].sub.v] (crossover), and [rho] [??] [[lambda].sub.v] (weak truncation regime). We cover all of them with the following set of points:

[rho] [member of] {Mo ([[lambda].sub.1]), ..., Mo ([[lambda].sub.v])} [union] {[1/2] Mo ([[lambda].sub.1]), 2Mo ([[lambda].sub.v])}, (73)

where Mo(x) stands for the mode. In principle, it is possible to determine Mo([[lambda].sub.k]) with high accuracy by using analytic representations of the marginal probability densities of the ordered eigenvalues [24]. In practice, the latter become computationally demanding at increasingly large values of v: for instance, the determination of the probability density of [[lambda].sub.2] requires [(v!).sup.2] sums, which is unfeasible even at v ~ 10. Moreover, to our aims, it is sufficient to choose approximate values, provided that these lie not far from the exact ones. Accordingly, we have determined the eigenvalue modes numerically from samples of N [equivalent] [10.sup.6] Wishart matrices. Our estimates are reported in Table 1 for v = 3, ..., 10. They have been obtained from Grenander's estimator [23]:

[mathematical expression not reproducible], (74)

with properly chosen parameters r, s.

We are now in the position to investigate numerically how many terms in Ruben's expansions must be considered as [epsilon] is set to [[epsilon].sub.dp] = 1.0 x [10.sup.-14], for our choice of the eigenvalue ensemble [lambda] ~ [W.sub.v](2v, [(2v).sup.-1] x [I.sub.v]) and with [rho] set as in Table 1. As an example, we report in Figure 4 the discrete distributions of [] for the basic Gaussian integral [alpha] at v = 10, the largest dimension that we have simulated. As expected, we observe an increase of [] with [rho]. Nevertheless, we see that the number of Ruben's components to be taken into account for a double precision result keeps altogether modest even in the weak truncation regime, which proves the practical usefulness of the chi-square expansions.

6.3. Fixed Point Iteration at Work. The GJ iteration is too slow to be of practical interest. For instance, at v = 10, [rho] [equivalent] Mo([[lambda].sub.1]) and [[epsilon].sub.T] = 1.0 x [10.sup.-7] (corresponding to a reconstruction of [lambda] with single floating-point precision), it is rather easy to extract realizations of [mu] which require [] [equivalent] 15,000 to converge. An improvement of the GJ scheme is achieved via overrelaxation (GJOR); that is,

[[lambda].sup.(0).sub.GJOR,k] = [[mu].sub.k], [[lambda].sup.(n+1).sub.GJOR,k] = [[lambda].sup.(n).sub.GJOR,k] + [omega] [[T.sub.k] ([[lambda].sup.(n).sub.GJOR]) - [[lambda].sup.(n).sub.GJOR,k]], k = 1, ..., v. (75)

Evidently, at [omega] = 1, the GJOR scheme coincides with the standard GJ one. The optimal value [[omega].sub.opt] of the relaxation factor [omega] is not obvious even in the linear Jacobi scheme, where [[omega].sub.opt] depends upon the properties of the coefficient matrix of the system. For instance, if the latter is symmetric positive definite, it is demonstrated that the best choice is provided by [[omega].sub.opt] [equivalent to] 2[(1 + [square root of (1 - [[sigma].sup.2])]).sup.-1], [sigma] being the spectral radius of the Jacobi iteration matrix [25]. In our numerical tests with the GJOR scheme, we found empirically that the optimal value of [omega] at [rho] [less than or equal tol] [[lambda].sub.v] is close to the linear prediction, provided that [sigma] is replaced by [[parallel]J[parallel].sub.[infinity]], J being defined as in Section 3 (note that [[parallel]J[parallel].sub.[infinity]] < 1).

By contrast, the iteration diverges after few steps with increasing probability as [rho]/[[lambda].sub.v] [right arrow] [infinity] if [omega] is kept fixed at [omega] = [[omega].sub.opt]; in order to restore the convergence, [omega] must be lowered towards [omega] = 1 as such limit is taken.

To give an idea of the convergence rate of the GJOR scheme, we show in Figure 5(a) a joint box-plot of the distributions of [] at v = 10 and [[epsilon].sub.T] = 1.0 x [10.sup.-7]. From the plot we observe that the distribution of [] shifts rightwards as [rho] decreases: clearly, the reconstruction is faster if [rho] is in the weak truncation regime (where [mu] is closer to [lambda]), whereas it takes more iterations in the strong truncation regime. The dependence of [[bar.n]] upon [rho], systematically displayed in Figure 6, is compatible with a scaling law

log [[bar.n]] ([rho], v, [[epsilon].sub.T])= a(v, [[epsilon].sub.T]) - b(v, [[epsilon].sub.T])log [rho], (76)

apart from small corrections occurring at large [rho]. Equation (76) tells us that [[bar.n]] increases polynomially in 1/[rho] at fixed v. In order to estimate the parameters a and b in the strong truncation regime (where the algorithm becomes challenging), we performed jackknife fits to (76) of data points with [rho] [??] 1. Results are collected in Figure 5(b), showing that b is roughly constant, while a increases almost linearly in v. Thus, while the cost of the eigenvalue reconstruction is only polynomial in 1/[rho] at fixed v, it is exponential in v at fixed [rho]. The scaling law of the GJOR scheme is therefore better represented by [[bar.n]] = [Ce.sup.[kappa]v]/[[rho].sup.b], with C being a normalization constant independent of [rho] and v and [kappa] representing approximately the slope of a as a function of v. Although the GJOR scheme improves the GJ one, the iteration reveals to be still inefficient in a parameter subspace, which is critical for the applications.

6.4. Boosting the GJOR Scheme. A further improvement can be obtained by letting [omega] depend on the eigenvalue index in the GJOR scheme. Let us discuss how to work out such an adjustment. On commenting on Figure 3, we have already noticed that the largest eigenvalues are affected by the truncation to a larger extent than the smallest ones. Therefore, they must perform a longer run through the fixed point iteration, in order to converge to the untruncated values. This is a possible qualitative explanation for the slowing down of the algorithm as [rho] [right arrow] 0. In view of it, we expect to observe some acceleration of the convergence rate, if [omega] is replaced, for instance, by

[omega] [right arrow] [[omega].sub.k] [equivalent to] (1 + [beta] x k) [[omega].sub.opt], [beta] [greater than or equal to] 0, k = 1, ..., v. (77)

The choice [beta] = 0 corresponds obviously to the standard GJOR scheme. Any other choice yields [[omega].sub.k] > [[omega].sub.opt]. Therefore, the new scheme is also expected to display a higher rate of failures than the GJOR one at [rho] [much greater than] [[lambda].sub.v], for the reason explained in Section 5.3. The componentwise overrelaxation proposed in (77) is only meant to enhance the convergence speed in the strong truncation regime and in the crossover, where the improvement is actually needed.

In order to confirm this picture, we have explored systematically the effect of [beta] on [[bar.n]] by simulating the reconstruction process at v = 3, ..., 10, with [beta] varying from 0 to 2 in steps of 1/5. First of all, we have observed that the rate of failures at large [rho] is fairly reduced if the first 30 / 50 iterations are run with [[omega].sub.k] = [[omega].sub.opt], and only afterwards [beta] is switched on. Having minimized the failures, we have checked that for each value of [beta], the scaling law assumed in (76) is effectively fulfilled. Then, we have computed jackknife estimates of the scaling parameters a and b. These are plotted in Figure 7 as functions of v. Each trajectory (represented by a dashed curve) corresponds to a given value of [beta]. Those with darker markers refer to smaller values of [beta] and the other way round. From the plots we notice that

(i) all the trajectories with [beta] > 0 lie below the one with [beta] = 0;

(ii) the trajectories of a display a clear increasing trend with v; yet their slope lessens as [beta] increases. By contrast, the trajectories of b develop a mild increasing trend with v as [beta] increases, though this is not strictly monotonic;

(iii) the trajectories of both a and b seem to converge to a limit trajectory as [beta] increases; we observe a saturation phenomenon, which thwarts the benefit of increasing [beta] beyond a certain threshold close to [[beta].sub.max] [equivalent] 2.

We add that pushing [beta] beyond [[beta].sub.max] is counterproductive, as the rate of failures becomes increasingly relevant in the crossover and eventually also in the strong truncation regime. By contrast, if [beta] [less than or equal tol] [[beta].sub.max] the rate of failures keeps very low for essentially all simulated values of [rho].

Our numerical results signal a strong reduction of the slowing down of the convergence rate. Indeed, (i) means qualitatively that C and b are reduced as [beta] increases. (ii) means that [kappa] is reduced as [beta] increases (this is the most important effect, as [kappa] is mainly responsible for the exponential slowing down with v). The appearance of a slope in the trajectories of b as [beta] increases indicates that a mild exponential slowing down is also developed at denominator of the scaling law [[bar.n]] = [Ce.sup.[kappa]v]/[[rho].sup.b], but the value of b is anyway smaller than at [beta]=0. Finally, (iii) means that choosing [beta] > [[beta].sub.max] has a minor impact on the performance of the algorithm. In Figure 8, we report a plot of the parameter [kappa] (obtained from least-squares fits of data to a linear model a = [a.sub.0] + [kappa] x v) as a function of [beta]. We see that [kappa]([beta] = 0)/[kappa]([beta] = 2) [equivalent] 4. This quantifies the maximum exponential speedup of the convergence rate, which can be achieved by our proposal. When [beta] is close to [[beta].sub.max], [[bar.n]] amounts to few hundreds at v = 10 and [rho] [equivalent] [[lambda].sub.1]/2.

7. On the Ill-Posedness of the Reconstruction in Sample Space

So far we have discussed the covariance reconstruction under the assumption that [mu]=[[tau].sub.[rho]] x [lambda] represents the exact truncated counterpart of some [lambda] [member of] [R.sup.v] and we have looked at the algorithmic properties of the iteration schemes which operatively define [[tau].sup.-1.sub.[rho]]. Such analysis is essential in order to characterize [[tau].sup.-1.sub.[rho]] mathematically; yet it is not sufficient in real situations, specifically when [mu] is perturbed by statistical noise.

In this section, we examine the difficulties arising when performing the covariance reconstruction in sample space. We first recall that, according to Hadamard [26], a mathematical problem is well posed provided that the following conditions are fulfilled:

([H.sub.1]) there exists always a solution to the problem;

([H.sub.2]) the solution is unique;

([H.sub.3]) the solution depends smoothly on the input data.

Inverse problems are often characterized by violation of one or more of them; see, for instance, [13]. In such cases, the standard practice consists in regularizing the inverse operator, that is, in replacing it by a stable approximation. With regard to our problem, the reader will recognize that ([H.sub.1]) is violated (and the problem becomes ill-posed) as soon as the space of the input data is allowed to be a superset of D([[tau].sup.-1.sub.[rho]]): once clarifying how [mu] is concretely estimated in the applications (Sections 7.1 and 7.2), we propose a perturbative regularization of [[tau].sup.-1.sub.[rho]], which improves effectively the fulfillment of ([H.sub.1]) (Section 7.3). By contrast, Proposition 8 guarantees that whenever a solution exists, it is also unique; thus ([H.sub.2]) is never of concern. Finally, the fulfillment of ([H.sub.3]) depends on how the statistical noise on [mu] is nonlinearly inflated by the action of [[tau].sup.-1.sub.[rho]]. For the sake of conciseness, in the present paper, we just sketch the main ideas underlying the perturbative regularization of [[tau].sup.-1.sub.[rho]], whereas a technical implementation of it and a discussion of (H3) are deferred to a separate paper [27].

7.1. Definition of the Sample Truncated Covariance Matrix. The examples of Section 2 assume that (i) spherical truncations are operated on a representative sample [P.sub.N] = [{[x.sup.(k)]}.sup.N.sub.k=1] of X ~ [N.sub.v](0, [SIGMA]) with finite size N, (ii) [rho] is known exactly, and (iii) the input budget for the covariance reconstruction is given by the subset

[Q.sub.M] = {x [member of] [P.sub.N] : [[parallel]x[parallel].sup.2] < [rho]}, with [absolute value of ([Q.sub.M])] = M [less than or equal to] N. (78)

As usual in the analysis of stochastic variables in sample space, we assume that the observations [x.sup.(k)] are realizations of i.i.d. stochastic variables [X.sup.(k)] ~ [N.sub.v](0, [SIGMA]), k = 1, ..., N. Thus, M is itself a stochastic variable in sample space, where it reads

[mathematical expression not reproducible], (79)

with [mathematical expression not reproducible] denoting the characteristic function of [B.sub.v]([rho]) and [mathematical expression not reproducible] being just a shortcut for its extended counterpart. It is easily recognized that M ~ B(N, [alpha]) is a binomial variate. If we indeed denote by E the sample expectation operator (i.e., the integral with respect to the product measure of the joint variables [{X(k)}.sup.N.sub.k=1]), then a standard calculation yields

[mathematical expression not reproducible]. (80)

Hence, we see that the relative dispersion of M is O([N.sup.-1/2]). Now, the simplest way to measure [SIGMA] and [G.sub.B], respectively, from the sets [P.sub.N] and [Q.sub.M] is via the classical estimators

[mathematical expression not reproducible], (81)

[mathematical expression not reproducible]. (82)

We define the sample estimates [??] and [??], respectively, of [lambda] and [mu] as the eigenvalue spectra of [??] and [[??].sub.B]. By symmetry arguments we see that [[??].sub.i] is unbiased. Indeed, it holds

E [[[??].sub.i]] = [N.summation over (k=1)]E [[X.sup.(k).sub.i] [I.sub.k]/[[summation].sup.N.sub.s=1] [I.sub.s]]. (83)

The right-hand side of (83) makes only sense if we conventionally define the integrand to be zero in the integration subdomain {[X.sup.(k)] [not member of] [B.sub.v]([rho]), [for all]k} or equivalently if we interpret E[[[??].sub.i]] as the conditional one E[[[??].sub.i] | M > 0] (the event M > 0 occurs a.s. only as N [right arrow] [infinity]). Since the sample measure is even under [X.sup.(k)] [right arrow] - [X.sup.(k)] while the integrand is odd, we immediately conclude that bias[[[??].sub.i]] = 0.

7.2. Bias of the Sample Truncated Covariance Matrix. The situation gets somewhat less trivial with [[??].sub.B]: the normalization factor [(M - 1).sup.-1], which has been chosen in analogy with (81), is not sufficient to remove completely the bias of [[??].sub.B] at finite N, though we aim at showing here that the residual bias is exponentially small and asymptotically vanishing. In order to see this, we observe that

[mathematical expression not reproducible], (84)

with Ediag denoting the sample expectation corresponding to a multinormal measure with diagonal covariance matrix [LAMBDA] = diag([lambda]) = [R.sup.T][SIGMA]R, conditioned to M > 1. Having diagonalized the product measure, we observe that the integrand on the right-hand side is odd for l [not equal to] r and even for l = r under the joint change of variables [X.sup.(k).sub.l] [right arrow] - [X.sup.(k).sub.l] for k = 1, ..., N, similarly to what we did in Section 2. As a consequence, it holds

[mathematical expression not reproducible], (85)

whence we infer that the matrix E[[??]B]is diagonalized by the same matrix R as [SIGMA]. From (85) we also conclude that

bias [[[??].sub.B]] = R diag (w) [R.sup.T], [w.sub.i] [equivalent to] [N.summation over (k=1)][E.sub.diag] [[([X.sup.(k).sub.i - [[??].sub.i]).sup.2] [I.sub.k]/[[summation].sup.N.sub.s=1 [I.sub.s] - 1] - [[mu].sub.i], (86)

i = 1, ..., v. It should be observed that in general [w.sub.i] [not equal to] bias[[[??].sub.i]]since the computation of [[??].sub.i] requires the diagonalization of [??]B, which is in general performed by a diagonalizing matrix [??] = R. Nevertheless, if w vanishes then bias[[[??].sub.B]] vanishes too. Now, we observe that [w.sub.i] splits into three contributions:

[mathematical expression not reproducible], (87)

which can be exactly calculated and expressed in terms of [[mu].sub.i], [alpha], and N. For instance,

[mathematical expression not reproducible]. (88)

Analogously, we have

[mathematical expression not reproducible]. (89)

Hence, it follows that

[w.sub.i] = -[[mu].sub.i] [1 + [alpha] (N - 1)][(1 - [alpha]).sup.N-1]. (90)

Since [alpha] > 0, we see that [lim.sub.N[right arrow][infinity]][w.sub.i] = 0. Thus, we conclude that [[??].sub.B] is asymptotically unbiased.

A discussion of the variance of the sample truncated covariance matrix is beyond the scope of the present paper.

We just observe that, apart from the above calculation, studying the sample properties of the truncated spectrum is made hard by the fact that eigenvalues and eigenvectors of a diagonalizable matrix are intimately related from their very definition; thus such study would require a careful consideration of the distribution of the sample diagonalizing matrix [mathematical expression not reproducible].

7.3. Perturbative Regularization of [[tau].sup.-1.sub.[rho]]. When [mu] is critically close to the internal boundary of D([[tau].sup.-1.sub.[rho]]), a sample estimate [??] may fall outside of it due to statistical fluctuations. In that case the iterative procedure described in the previous sections diverges. On the quantitative side, the ill-posedness of the reconstruction problem is measured by the failure probability

[] ([rho], [SIGMA], N) = P[[??] [not member of] D ([[tau].sup.-1.sub.[rho]]) | [X.sup.(k)] ~ [N.sub.v] (0; [SIGMA]), k = 1, ..., N], (91)

which is a highly nontrivial function of [rho], [SIGMA], and N. An illustrative example of it is reported in Figure 9(a), which refers to a specific case with v = 4 and [SIGMA] = diag(0.1, 0.3, 0.8, 2.2). The plot suggests that the iterative procedure becomes severely illposed in the regime of strong truncation.

In order to regularize the problem, we propose to go back to (19) and consider it from a different perspective. Specifically, we move from the observation that a simplified framework occurs in the special circumstance when the eigenvalue spectra are fully degenerate, which is essentially equivalent to the setup of [1]. If [[mu].sub.1] = ... = [[mu].sub.v] [equivalent to] [??], by symmetry arguments it follows that [[lambda].sub.1] = ... =[[lambda].sub.v] [equivalent to] [??] and the other way round. Equation (19) reduces in this limit to

[mathematical expression not reproducible]. (92)

It can be easily checked that the function [T.sub.[rho]([??])] is monotonically increasing in [??]. In addition, we have

(i) [mathematical expression not reproducible],

(ii) [mathematical expression not reproducible]; (93)

thus (92) can be surely (numerically) inverted provided that 0 < [??] < [rho]/(v + 2). We can regard (92) as an approximation to the original problem (19). When [mu] is not degenerate, we must define [??] in terms of the components of [mu]. One possibility is to average them, that is, to choose

[??] = [1/v] [v.summation over (i=1)] [[mu].sub.i]. (94)

Subject to this, we expect [??] to lie somewhere between [[lambda].sub.1] and [[lambda].sub.v]. Equation (92) can be thought of as the lowest order approximation of a perturbative expansion of (19) around the point [[lambda].sub.T] = {[??], ..., [??]}. If the condition number of [SIGMA] is not extremely large, such an expansion is expected to quickly converge, so that a few perturbative corrections to [[lambda].sub.T] should be sufficient to guarantee a good level of approximation.

As mentioned above, a technical implementation of the perturbative approach and a thorough discussion of its properties are deferred to a separate paper [27]. Here, we limit ourselves to observing that the definition domain of perturbation theory is ultimately set by its lowest order approximation, since corrections to (92) are all algebraically built in terms of it, with no additional constraints. Following (94), the domain of [T.sup.-1.sub.[rho]] comes to be defined as

D ([T.sup.-1.sub.[rho]]) = {[mu] [member of] [R.sup.v.sub.+] : [N.summation over (i=1)] [[mu].sub.i] [less than or equal to] [rho]v/[v + 2]}, (95)

and it is clear that D([[tau].sup.-1.sub.[rho]]) [subset] D([T.sup.- 1.sub.[rho]]) (it is sufficient to sum term by term all the inequalities contributing to (36)). In Figure 10, we show the set difference D([T.sup.-1.sub.[rho]]) \ D([[tau].sup.- 1.sub.[rho]]) in v = 2 and v = 3 dimensions. When [mu] [member of] D([[tau].sup.-1.sub.[rho]]) but its estimate [??] [not member of] D([[tau].sup.-1.sub.[rho]]), it may well occur [??] [member of] D([T.sup.-1.sub.[rho]]); that is, the set difference acts as an absorbing shield of the statistical noise. Therefore, if we define the failure probability of the perturbative reconstruction as

[] ([rho], [SIGMA], N) = P[[??] [not member of] D ([T.sup.- 1.sub.[rho]]) | [X.sup.(k)] ~ [N.sub.v] (0; [SIGMA]), k = 1, ..., N], (96)

we expect the inequality []([rho], [SIGMA], N) [much less than] [] ([rho], [SIGMA], N) to generously hold. An example is given in Figure 9(b): we see that [] becomes lower than [] by orders of magnitude as soon as [rho] and N are not exceedingly small. In this sense, the operator [T.sup.-1.sub.[rho]] can be regarded as the lowest order approximation of a regularizing operator for [[tau].sup.-1.sub.[rho]].

8. Conclusions

In this paper we have studied how to reconstruct the covariance matrix [SIGMA] of a normal multivariate X ~ [N.sub.v](0, [SIGMA]) from the matrix [G.sub.B] of the spherically truncated second moments, describing the covariances among the components of X when the probability density is cut off outside a centered Euclidean ball. We have shown that [SIGMA] and [G.sub.B] share the same eigenvectors. Therefore, the problem amounts to relating the eigenvalues of [SIGMA] to those of [G.sub.B]. Such relation entails the inversion of a system of nonlinear integral equations, which admits unfortunately no closed-form solution. Having found a necessary condition for the invertibility of the system, we have shown that the eigenvalue reconstruction can be achieved numerically via a converging fixed point iteration. In order to prove the convergence, we rely ultimately upon some probability inequalities, known in the literature as square correlation inequalities, which have been recently proved in [17].

In order to explore the convergence rate of the fixed point iteration, we have implemented some variations of the nonlinear Gauss-Jacobi scheme. Specifically, we have found that overrelaxing the basic iteration enhances the convergence rate by a moderate factor. However, the overrelaxed algorithm still slows down exponentially in the number of eigenvalues and polynomially in the truncation radius of the Euclidean ball. We have shown that a significant reduction of the slowing down can be achieved in the regime of strong truncation by adapting the relaxation parameter to the eigenvalue that is naturally associated with, so as to boost the higher components of the spectrum.

We have also discussed how the iterative procedure works when the eigenvalue reconstruction is performed on sample estimates of the truncated covariance spectrum. Specifically, we have shown that the statistical fluctuations make the problem ill-posed. We have sketched a possible way out based on perturbation theory, which is thoroughly discussed in a separate paper [27].

A concrete implementation of the proposed approach requires the computation of a set of multivariate Gaussian integrals over the Euclidean ball. For this, we have extended to the case of interest a technique, originally proposed by Ruben for representing the probability content of quadratic forms of normal variables as a series of chi-square distributions. In the paper, we have shown the practical feasibility of the series expansion for the integrals involved in our computations.


Proof of Theorem 9. As already mentioned in Section 4, the proof follows in the tracks of the original one of [2]. We detail the relevant steps for [[alpha].sub.k], while for [[alpha].sub.jk] we only explain why it is necessary to distinguish between equal or different indices and the consequences for either case.

In order to prove (57), we first express [[alpha].sub.k] in spherical coordinates; that is, we perform the change of variable x = ru, being r = [parallel]x[parallel] and u [member of] [partial derivative][B.sub.v](1) (recall that [d.sup.v]x = [r.sup.v-1]dr du, with du embodying the angular part of the spherical Jacobian and the differentials of v - 1 angles); then we insert a factor of 1 = exp([r.sup.2]/2s)exp(-[r.sup.2]/2s) under the integral sign. Hence, [[alpha].sub.k] reads

[mathematical expression not reproducible]. (A.1)

The next step consists in expanding the inner exponential in Taylor series (in his original proof, Ruben considers a more general setup, with the center of the Euclidean ball shifted by a vector b [member of] [R.sup.v] from the center of the distribution. In that case, the Gaussian exponential looks different and must be expanded in series of Hermite polynomials. Here, we work in a simplified setup, where the Hermite expansion reduces to Taylor's); namely,

exp(-[Q (u) [r.sup.2]]/2) = [[infinity].summation over (m=0)] 1/m! [r.sup.2m]/[2.sup.m][(-Q).sup.m]. (A.2)

This series converges uniformly in u. We review the estimate just for the sake of completeness:

[mathematical expression not reproducible], (A.3)

where [q.sub.0] = [max.sub.i][absolute value of (1/s - 1/[[lambda].sub.i])]. It follows that we can integrate the series term by term. With the help of the uniform average operator introduced in (53), [[alpha].sub.k] is recast to

[mathematical expression not reproducible]. (A.4)

The presence of an additional factor of [u.sup.2.sub.k] in the angular average is harmless, since [absolute value of ([u.sup.2.sub.k])] < 1. We finally notice that the radial integral can be expressed in terms of a cumulative chi-square distribution function on replacing r [right arrow] [square root of (rs)]; namely,

[[integral].sup.[square root of ([rho])].sub.0] dr [r.sup.v+2m+1] exp(- [r.sup.2]/2s) = [2.sup.v/2+m][s.sup.v/2+m+1] [GAMMA] (v/2 + m + 1) [F.sub.v+2(m+1)] ([rho]/s). (A.5)

Inserting (A.5) into (A.4) results in Ruben's representation of [[alpha].sub.k]. This completes the first part of the proof.

As a next step, we wish to demonstrate that the function [[psi].sub.k] of (61) is the generating function of the coefficients [c.sub.k;m]. To this aim, we first recall the identities

[mathematical expression not reproducible], (A.6)

valid for a > 0. On setting [a.sub.i] = [1 - (1 - s/[[lambda].sub.i])z], we see that [[psi].sub.k] can be represented in the integral form

[mathematical expression not reproducible], (A.7)

provided [absolute value of (z)] < [min.sub.i][[absolute value of (1 - s/[[lambda].sub.i])].sup.-1]. As previously done, we introduce spherical coordinates x = ru and expand exp{-(1/2)zsQ(x)} = exp{-(1/2)zs[r.sup.2]Q(u)} in Taylor series.

By the same argument as above, the series converges uniformly in u (the factor of zs does not depend on u), thus allowing term-by-term integration. Accordingly, we have

[mathematical expression not reproducible]. (A.8)

We see that the right-hand side of (A.8) looks similar to (A.4), the only relevant differences being the presence of the factor of [z.sup.m] under the sum sign and the upper limit of the radial integral. With some algebra, we arrive at

[[psi].sub.k] (z) = [[infinity].summation over (m=0)] [z.sup.m] {[2/m!] [s/[[lambda].sub.k] [s.sup.v/2+m]/[[absolute value of ([LAMBDA])].sup.1/2] [[GAMMA] (v/2 + m + 1)]/[[GAMMA] (v/2)] x M [[(-Q).sup.m] [u.sup.2.sub.k]}. (A.9)

The series coefficients are recognized to be precisely those of (59).

In the last part of the proof, we derive the recursion fulfilled by the coefficients [c.sub.k;m]. To this aim, the mth derivative of [[psi].sub.k] has to be evaluated at z = 0 and then identified with m![c.sub.k;m]. The key observation is that differentiating [[psi].sub.k] reproduces [[psi].sub.k] itself; that is to say,

[[psi]'.sub.k] (z) = [[PSI].sub.k] (z) [[psi].sub.k] (z), (A.10)


[[PSI].sub.k] (z) = 1/2 [v.summation over (i=1)] [e.sub.k;i] (1 - s/[[lambda].sub.i]) [[1 - (1 - s/[[lambda].sub.i]) z].sup.-1], (A.11)

and the auxiliary coefficient [e.sub.k;i] being defined as in (67). Equation (A.10) lies at the origin of the recursion. Indeed, from (A.10), it follows that that [[psi]".sub.k] is a function of [[psi]'.sub.k] and [[psi].sub.k]; namely, [[psi]".sub.k] = [[PSI]'.sub.k][[psi].sub.k] + [[PSI].sub.k][[psi].sub.'k]. Proceeding analogously yields the general formula

[mathematical expression not reproducible]. (A.12)

At z = 0, this reads

m![c.sub.k;m] = [m-1.summation over (r=0)] (m-1)!/(m - r - 1)!r! [[PSI].sup.(m-r-1).sub.k] (0) r![c.sub.k;r]. (A.13)

The last step consists in proving that

[[PSI].sup.(m).sub.k] (0) = 1/2 m![g.sub.k;m+1], (A.14)

with [g.sub.k;m] defined as in (65). This can be done precisely as explained in [2].

Having reiterated Ruben's proof explicitly in a specific case, it is now easy to see how the theorem is extended to any other Gaussian integral. First of all, from (A.1), we infer that each additional subscript in [[alpha].sub.klm ...] enhances the power of the radial coordinate under the integral sign by 2 units. This entails a shift in the number of degrees of freedom of the chi-square distributions in Ruben's expansion, amounting to twice the number of subscripts. For instance, since [[alpha].sub.jk] has two subscripts, its Ruben's expansion starts by [F.sub.v+4], independently of whether j = k or j [not equal to] k. In second place, we observe that in order to correctly identify the generating functions of Ruben's coefficients for a higher-order integral [[alpha].sub.klm ...], we need to take into account the multiplicities of the indices k, l, m, .... As an example, consider the case of [[psi].sub.jk] (j [not equal to] k) and [[psi].sub.kk]. By going once more through the argument presented in (A.7), we see that (A.6) are sufficient to show that (63) is the generating function of [[alpha].sub.jk]. By contrast, in order to repeat the proof for the case of [[psi].sub.kk], we need an additional integral identity; namely,

[a.sup.-5/2] = [1/3] [(2[pi]).sup.-1/2] [[integral].sup.+[infinity].sub.- [infinity]] dx [x.sup.4] exp (-a/2 [x.sup.2]), (A.15)

valid once more for a > 0. Hence, we infer that [[psi].sub.kk] must depend upon [[lambda].sub.k] via a factor of [[1 - (1 - s/[[lambda].sub.k])z].sup.-5/2], whereas [[psi].sub.jk] (j [not equal to] k) must depend on [[lambda].sub.j] and [[lambda].sub.k] via factors of, respectively, [[1 - (1 - s/[[lambda].sub.j])z].sup.-3/2] and [[1 - (1 - s/[[lambda].sub.k])z].sup.-3/2]. The different exponents are ultimately responsible for the specific values taken by the auxiliary coefficients [e.sub.kk;i] and [e.sub.jk;i] of (68).

To conclude, we observe that the estimates of the residuals [R.sub.k;m] and [R.sub.jk;m], presented in Section 4 without an explicit proof, do not require any further technical insight than already provided by [2] plus our considerations. We leave them to the reader, since they can be obtained once more in the tracks of the original derivation of [R.sub.m].

Competing Interests

The authors declare that they have no competing interests.


The authors are grateful to A. Reale for encouraging them throughout all stages of this work and to G. Bianchi for technical support at ISTAT. They also thank R. Mukerjee and S. H. Ong for promptly informing them about their proof of (22) and (23). The computing resources used for their numerical study and the related technical support at ENEA have been provided by the CRESCO/ENEAGRID High Performance Computing infrastructure and its staff [28]. CRESCO (Computational Research Centre for Complex Systems) is funded by ENEA and by Italian and European research programmes.


[1] G. M. Tallis, "Elliptical and radial truncation in normal populations," The Annals of Mathematical Statistics, vol. 34, no. 3, pp. 940-944, 1963.

[2] H. Ruben, "Probability content of regions under spherical normal distributions. IV. The distribution of homogeneous and non-homogeneous quadratic functions of normal variables," The Annals of Mathematical Statistics, vol. 33, pp. 542-570, 1962.

[3] J. Aitchison, The Statistical Analysis of Compositional Data, Monographs on Statistics and Applied Probability, Chapman and Hall, London, UK, 1986.

[4] V. Pawlowsky-Glahn and A. Buccianti, Compositional Data Analysis: Theory and Applications, John Wiley & Sons, 2011.

[5] J. J. Egozcue, V. Pawlowsky-Glahn, G. Mateu-Figueras, and C. Barcelo-Vidal, "Isometric logratio transformations for compositional data analysis," Mathematical Geology, vol. 35, no. 3, pp. 279-300, 2003.

[6] J. Aitchison and S. M. Shen, "Logistic-normal distributions: some properties and uses," Biometrika, vol. 67, no. 2, pp. 261-272, 1980.

[7] V. Pawlowsky-Glahn and J. J. Egozcue, "Geometric approach to statistical analysis on the simplex," Stochastic Environmental Research and Risk Assessment, vol. 15, no. 5, pp. 384-398, 2001.

[8] P. Filzmoser and K. Hron, "Outlier detection for compositional data using robust methods," Mathematical Geosciences, vol. 40, no. 3, pp. 233-248, 2008.

[9] A. C. Atkinson, "Fast very robust methods for the detection of multiple outliers," Journal of the American Statistical Association, vol. 89, no. 428, pp. 1329-1339, 1994.

[10] M. Riani, A. C. Atkinson, and A. Cerioli, "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society B, vol. 71, part 2, pp. 447-466, 2009.

[11] W. C. Guenther, "An easy method for obtaining percentage points of order statistics," Technometrics, vol. 19, no. 3, pp. 319-321, 1977.

[12] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems. Mathematics and Its Applications, vol. 375, Springer, Berlin, Germany, 1996.

[13] L. Cavalier, "Inverse problems in statistics," in Inverse Problems and High-Dimensional Estimation: Stats in the Chateau Summer School, August 31-September 4, 2009, P. Alquier, E. Gautier, and G. Stoltz, Eds., vol. 203 of Lecture Notes in Statistics, pp. 3-96, Springer, Berlin, Germany, 2011.

[14] F. Palombi and S. Toti, "A note on the variance of the square components of a normal multivariate within a Euclidean ball," Journal of Multivariate Analysis, vol. 122, pp. 355-376, 2013.

[15] M. Anttila, K. Ball, and I. Perissinaki, "The central limit problem for convex bodies," Transactions of the American Mathematical Society, vol. 355, no. 12, pp. 4723-4735, 2003.

[16] J. O. Wojtaszczyk, "The square negative correlation property for generalized Orlicz balls," in Proceedings of the Geometric Aspects of Functional Analysis, Israel Seminar, pp. 305-313, 2004-2005.

[17] R. Mukerjee and S. H. Ong, "Variance and covariance inequalities for truncated joint normal distribution via monotone likelihood ratio and log-concavity," Journal of Multivariate Analysis, vol. 139, pp. 1-6, 2015.

[18] F. G. Friedlander and M. S. Joshi, Introduction to the Theory of Distributions, Cambridge University Press, Cambridge, UK, 2nd edition, 1998.

[19] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover Publications, New York, 1964., New York, NY, USA, 1964.

[20] K. L. Judd, Numerical Methods in Economics, The MIT Press, Cambridge, Mass, USA, 1998.

[21] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 2nd edition, 1984.

[22] M. S. Bartlett, "XX.--on the theory of statistical regression," Proceedings of the Royal Society of Edinburgh, vol. 53, pp. 260-283, 1934.

[23] U. Grenander, "Some direct estimates of the mode," The Annals of Mathematical Statistics, vol. 36, no. 1, pp. 131-138, 1965.

[24] A. Zanella, M. Chiani, and M. Z. Win, "On the marginal distribution of the eigenvalues of wishart matrices," IEEE Transactions on Communications, vol. 57, no. 4, pp. 1050-1060, 2009.

[25] D. M. Young, Iterative Solution of Large Linear Systems, Computer Science and Applied Mathematics, Academic Press, 1971.

[26] J. Hadamard, "Sur les problemes aux derives partielles et leur signification physique," Princeton University Bulletin, vol. 13, pp. 49-52, 1902.

[27] F. Palombi and S. Toti, "A perturbative approach to the reconstruction of the eigenvalue spectrum of a normal covariance matrix from a spherically truncated counterpart," https://arxiv .org/abs/1207.1256.

[28] G. Ponti, F. Palombi, D. Abate et al., "The role of medium size facilities in the HPC ecosystem: the case of the new CRESCO4 cluster integrated in the ENEAGRID infrastructure," in Proceedings of the International Conference on High Performance Computing and Simulation (HPCS '14), 6903807, pp. 1030-1033, July 2014.

Filippo Palombi, (1,2) Simona Toti, (1) and Romina Filippini (1)

(1) Istituto Nazionale di Statistica (ISTAT), Via Cesare Balbo 16, 00184 Rome, Italy

(2) Italian Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), Via Enrico Fermi 45, 00044 Frascati, Italy

Correspondence should be addressed to Filippo Palombi;

Received 7 July 2016; Revised 7 October 2016; Accepted 12 October 2016; Published 10 January 2017

Academic Editor: Ramon M. Rodriguez-Dagnino

Caption: Figure 1: A classical particle beam with elliptical transversal profile is cut off upon entering a circular coaxial pipeline.

Caption: Figure 2: (a) Numerical reconstruction of D([[tau].sup.-1.sub.[rho]]) in v = 2 dimensions. (b) Numerical reconstruction of D([[tau].sup.-1.sub.[rho]]) in v = 3 dimensions.

Caption: Figure 3: Monte Carlo simulation of the probability density function of the ordered eigenvalues [[lambda].sub.k] (even rows) and their truncated counterparts [[mu].sub.k] at [rho] = 1 (odd rows) for the Wishart ensemble [W.sub.10](20, [20.sup.-1] x [I.sub.10]). The last two plots (bottom right) display the distribution of the sum of eigenvalues.

Caption: Figure 4: Monte Carlo simulation of the probability mass function of the parameter kth for the Gaussian probability content [alpha]. The histograms refer to the eigenvalue ensemble [lambda] of [SIGMA] ~ [W.sub.10](20, [20.sup.-1] x [I.sub.10]), with [rho] chosen as in Table 1 and [epsilon] = 1.0 x [10.sub.-14].

Caption: Figure 6: Log-log plots of [[bar.n]] versus [rho] in the GJOR scheme at [[epsilon].sub.T] = 1.0 x [10.sub.-7]. The parameter [rho] has been chosen as in Table 1. The (red) dashed line in each plot represents our best jackknife linear fit to (76) of data points with [rho] [less than or equal tol] 1.

Caption: Figure 7: Scaling parameters a and b of the modified GJOR scheme as functions of v at [[epsilon].sub.T] = 1.0 x [10.sup.-7], with [beta] varying in the range 0/2 in steps of 1/5. Each trajectory (represented by a dashed curve) refers to a different value of [beta]. Those with darker markers correspond to smaller values of [beta] and the other way round.

Caption: Figure 8: The parameter [kappa] as a function of [beta]. Estimates of [kappa] are obtained from least-squares fits of data to a linear model a = [a.sub.0] + [kappa]x v.

Caption: Figure 9: (a) Numerical reconstruction of the failure probability of the iterative procedure as V = 4 and [SIGMA] = diag(0.1, 0.3, 0.8, 2.2), for several values of [rho] and for N = 200, 250, ..., 1000. (b) Failure probability of the perturbative regularization with the same parameters.

Caption: Figure 10: (a) Set difference D([T.sup.-1.sub.[rho]]) D([[tau].sup.-1.sub.[rho]) in v = 2 dimensions. (b) Set difference D([T.sup.-1.sub.[rho]]) \ D([[tau].sup.-1.sub.[rho]]) in v = 3 dimensions.
Table 1: Numerical estimates of the mode of the ordered eigenvalues
{[[lambda].sub.1] ..., [[lambda].sub.v]} of [SIGMA] ~ [W.sub.v](2v,
[(2v).sup.-1] x [I.sub.V]) with v = 3, ..., 10. The estimates have
been obtained from Grenander's mode estimator [23].

                          v = 3    v = 4    v = 5    v = 6

[??]([[lambda].sub.1])    0.1568   0.1487   0.1435   0.1383
[??]([[lambda].sub.2])    0.6724   0.4921   0.4017   0.3424
[??]([[lambda].sub.3])    1.6671   1.0112   0.7528   0.6071
[??]([[lambda].sub.4])      --     1.8507   1.2401   0.9621
[??]([[lambda].sub.5])      --       --     2.0150   1.4434
[??]([[lambda].sub.6])      --       --       --     2.1356
[??]([[lambda].sub.7])      --       --       --       --
[??]([[lambda].sub.8])      --       --       --       --
[??]([[lambda].sub.9])      --       --       --       --
[??]([[lambda].sub.10])     --       --       --       --

                          v = 7    v = 8    v = 9    v = 10

[??]([[lambda].sub.1])    0.1344   0.1310   0.1269   0.1258
[??]([[lambda].sub.2])    0.3039   0.2745   0.2554   0.2399
[??]([[lambda].sub.3])    0.5138   0.4543   0.4048   0.3693
[??]([[lambda].sub.4])    0.7854   0.6684   0.5858   0.5288
[??]([[lambda].sub.5])    1.1269   0.9263   0.7956   0.7032
[??]([[lambda].sub.6])    1.5789   1.2559   1.0527   0.9111
[??]([[lambda].sub.7])    2.2190   1.6764   1.3673   1.1603
[??]([[lambda].sub.8])      --     2.2763   1.7687   1.4624
[??]([[lambda].sub.9])      --       --     2.3210   1.8473
[??]([[lambda].sub.10])     --       --       --     2.3775

Figure 5: (a) Box-plot of [] in the GJOR scheme at v = 10,
with [[epsilon].sub.T] = 1.0 x [10.sup.-7] and [rho] chosen as in
Table 1. The distributions have been reconstructed from a sample of
N [equivalent] [10.sup.3] eigenvalue spectra extracted from
[W.sub.10](20, [20.sup.-1] x [I.sub.10]). The whiskers extend to the
most extreme data point within (3/2)(75%-25%) data range. (b) Numerical
estimates of the scaling parameters a and b of the GJOR scheme, as
obtained from jackknife fits to (76) of data points with [rho] [less
than or equal tol] 1 and [[epsilon].sub.T] = 1.0 x [10.sup.-7]. We
quote in  parentheses the jackknife error.


v     [??](jk. err.)    [??](jk. err.)

3         4.93(3)          0.940(1)
4         5.25(2)          0.940(1)
5         5.44(2)          0.948(1)
6         5.65(1)          0.953(1)
7         5.85(1)          0.948(1)
8         6.02(1)          0.951(1)
9         6.18(1)          0.946(1)
10        6.31(1)          0.948(1)
COPYRIGHT 2017 Hindawi Limited
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Printer friendly Cite/link Email Feedback
Title Annotation:Research Article
Author:Palombi, Filippo; Toti, Simona; Filippini, Romina
Publication:Journal of Probability and Statistics
Article Type:Report
Geographic Code:1USA
Date:Jan 1, 2017
Previous Article:Probability of Dying in Each of the Competing Risks under Bimorbid Condition.
Next Article:Polyhedral Star-Shaped Distributions.

Terms of use | Privacy policy | Copyright © 2021 Farlex, Inc. | Feedback | For webmasters |