# Direct Determination of Smoothing Parameter for Penalized Spline Regression

1. Introduction

Penalized spline methods are a well-known, efficient technique for nonparametric smoothing. Penalized splines were suggested by O'Sullivan [1] and Eilers and Marx [2]. O'Sullivan [1] used a cubic B-spline function with a penalty given by the integrated squared second derivative of the B-spline function, whereas Eilers and Marx [2] used a cubic B-spline function with a difference penalty on the spline coefficients. The Eilers and Marx estimator is computationally efficient compared to smoothing splines and O'Sullivan's estimator since it removes the integration from the penalty. Hence this paper focuses on the penalized spline estimator of Eilers and Marx [2]. The penalized spline method is efficient for both univariate regression and multiple regression, such as the additive model (see Marx and Eilers [3]). General properties, uses, and the flexibility of penalized splines are described in Ruppert et al. [4].

When using penalized splines, the determination of the smoothing parameter is very important since it controls the trade-off between the goodness of fit and the smoothness of the fitted curve. The classical approach is a grid search, which selects the smoothing parameter by minimizing a criterion over a set of candidate values. Criteria for grid searches include cross-validation, generalized cross-validation (GCV), Mallows' [C.sub.p], and so forth. Although a grid search generally finds one optimal smoothing parameter, a poor curve may result when none of the candidates is good. This tendency is especially striking in additive models since the number of smoothing parameters equals the number of covariates. Several smoothing parameter selection methods using grid search criteria have been developed by many authors, such as Krivobokova [5], Reiss and Ogden [6], Wood [7], Wood [8], and Wood [9]. On the other hand, the mixed model representation of spline smoothing has also been studied (see Lin and Zhang [10], Wand [11], and Ruppert et al. [4]). In mixed models, a grid search is not necessary to obtain the final fitted curve. The smoothing parameter in the mixed model can be written as the ratio of the variance of the random coefficients to that of the error. By estimating these unknown variances using a maximum likelihood (ML) method or a restricted maximum likelihood (REML) method, the final fitted curves are obtained, yielding the estimated best linear unbiased predictor (EBLUP). Therefore the EBLUP does not require a grid search. However, the fitted curve tends to oversmooth in theory, and numerical stability is not guaranteed when a cubic spline is used (see Section 3). The Bayesian approach to selecting the smoothing parameter has been studied by Fahrmeir et al. [12], Fahrmeir and Kneib [13], and Heinzl et al. [14]. Kauermann [15] compared several smoothing parameter selection methods.

In this paper, we propose a new method for determining the smoothing parameter using the asymptotic properties of penalized splines. Throughout the remainder of this paper, we refer to this new method as the direct method. Before outlining the direct method, we briefly review asymptotic studies of penalized splines. First, Hall and Opsomer [16] showed the consistency of the penalized spline estimator in a white noise representation. Subsequently, Li and Ruppert [17], Claeskens et al. [18], Kauermann et al. [19], and Wang et al. [20] developed the asymptotics for the penalized spline estimator in univariate regression. Yoshida and Naito [21] and Yoshida and Naito [22] studied the asymptotics for penalized splines in additive regression models and generalized additive models, respectively. Xiao et al. [23] suggested a new penalized spline estimator and developed its asymptotic properties in bivariate regression. Thus, the development of the asymptotic theory of penalized splines is relatively recent. In addition, smoothing parameter selection methods based on asymptotic properties have not yet been studied. This motivates us to establish such methods.

The direct method determines the smoothing parameter by minimizing the mean integrated squared error (MISE) of the penalized spline estimator. In general, the MISE of a nonparametric estimator decomposes into the integrated squared bias and the integrated variance of the estimator. The penalized spline estimator is no exception, and hence the direct method is stated using the expressions for the asymptotic bias and variance of the penalized spline estimator, which have been derived by Claeskens et al. [18], Kauermann et al. [19], and Yoshida and Naito [22]. From their results, we see that the leading asymptotic order of the variance of the penalized spline estimator depends only on the sample size and the number of knots, not on the smoothing parameter. However, the second term of the asymptotic variance does contain the smoothing parameter, and the variance becomes small as the smoothing parameter increases. On the other hand, the squared bias of the penalized spline estimator increases as the smoothing parameter grows. Therefore the minimizer of the MISE of the penalized spline estimator can be regarded as an optimal smoothing parameter. Since the MISE is asymptotically convex with respect to the smoothing parameter, its global minimum can be found. A similar approach is well developed for bandwidth selection in kernel regression (see Ruppert et al. [24], Wand and Jones [25], etc.). The present paper first focuses on univariate regression and then extends the direct method to additive models. In both models, the mathematical and numerical properties of the direct method are studied. In additive models, we need to select as many smoothing parameters as there are explanatory variables, so the computational cost of a grid search becomes large. We expect the computational cost of the direct method to be dramatically smaller than that of a grid search.

The structure of this paper is as follows. In Section 2, we introduce a penalized spline estimator in a univariate regression model. Section 3 provides the direct method and related properties. Section 4 extends the direct method to the additive model. In Section 5, we confirm the performance of the direct method using a numerical study. We provide a discussion on the outlook and further studies in Section 6. The proofs of our theorems are provided in the appendix.

2. Penalized Spline Estimator

Consider the regression problem with n observations,

[Y.sub.i] = f([x.sub.i]) + [[epsilon].sub.i], i = 1, ..., n, (1)

where [Y.sub.i] is the response variable, [x.sub.i] is the explanatory variable, f is the true regression function, and [[epsilon].sub.i] is the random error, assumed to be independently distributed with mean 0 and variance [[sigma].sup.2]. Throughout the paper we assume the explanatory variable [x.sub.i] [member of] [a, b] (i = 1, ..., n) is not a random variable, so that the expectation of [Y.sub.i] can be expressed as E[[Y.sub.i] | [x.sub.i]] = f([x.sub.i]). The support of the explanatory variable could be relaxed to the whole real line R, but in order to simplify the placement of knots in what follows, the support is assumed to be compact. We aim to estimate f via a nonparametric penalized spline method. We consider the knots a = [[kappa].sub.0] < [[kappa].sub.1] < *** < [[kappa].sub.K] < [[kappa].sub.K+1] = b and, for k = -p, ..., K - 1, let [B.sup.[p].sub.k](x) be the pth degree B-spline basis function defined as

[B.sup.[0].sub.k](x) = I([[kappa].sub.k] [less than or equal to] x < [[kappa].sub.k+1]),

[B.sup.[q].sub.k](x) = (x - [[kappa].sub.k])/([[kappa].sub.k+q] - [[kappa].sub.k]) [B.sup.[q-1].sub.k](x) + ([[kappa].sub.k+q+1] - x)/([[kappa].sub.k+q+1] - [[kappa].sub.k+1]) [B.sup.[q-1].sub.k+1](x), q = 1, ..., p, (2)

associated with the above knots and the additional knots [[kappa].sub.-p] = [[kappa].sub.-p+1] = *** = [[kappa].sub.0] and [[kappa].sub.K+1] = *** = [[kappa].sub.K+p+1]. The B-spline function [B.sup.[p].sub.k](x) is a piecewise pth degree polynomial on the interval [[[kappa].sub.k], [[kappa].sub.k+p+1]]. Details of B-spline basis functions are given in de Boor [26]. For simplicity, we write [B.sub.k](x) = [B.sup.[p].sub.k](x) since we do not vary p in what follows.
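To make the recursion concrete, the following is a minimal numpy sketch evaluating the pth degree B-spline basis on equidistant clamped knots; the function name and interface are ours, for illustration only.

```python
import numpy as np

def bspline_design(x, a, b, K, p):
    """Evaluate the K + p B-spline basis functions of degree p at x."""
    # Clamped knot sequence: p repeated knots at each boundary plus
    # K + 1 equidistant knots on [a, b].
    t = np.concatenate([np.full(p, a),
                        np.linspace(a, b, K + 1),
                        np.full(p, b)])
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Degree-0 bases: indicators of [t_k, t_{k+1}); put x == b into the
    # last nonempty interval so the basis still sums to 1 at the right end.
    B = np.zeros((x.size, len(t) - 1))
    for k in range(len(t) - 1):
        B[:, k] = ((t[k] <= x) & (x < t[k + 1])).astype(float)
    B[x == b, :] = 0.0
    B[x == b, len(t) - p - 2] = 1.0
    # Cox-de Boor recursion up to degree p, with the 0/0 := 0 convention
    # for the repeated boundary knots.
    for q in range(1, p + 1):
        Bq = np.zeros((x.size, len(t) - 1 - q))
        for k in range(len(t) - 1 - q):
            d1, d2 = t[k + q] - t[k], t[k + q + 1] - t[k + 1]
            left = (x - t[k]) / d1 * B[:, k] if d1 > 0 else 0.0
            right = (t[k + q + 1] - x) / d2 * B[:, k + 1] if d2 > 0 else 0.0
            Bq[:, k] = left + right
        B = Bq
    return B  # shape (len(x), K + p)
```

The returned columns correspond to [B.sub.-p](x), ..., [B.sub.K-1](x); each row is nonnegative and sums to one on [a, b].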

We use the linear combination of {[B.sub.k](x) | k = -p, ..., K - 1} and the unknown parameters [b.sub.k](k = -p, ..., K - 1) to approximate the regression function and consider the B-spline regression problem,

[Y.sub.i] = s([x.sub.i]) + [[epsilon].sub.i], i = 1, ..., n, (3)

where

s (x) = [K-1.summation over (k=-p)][B.sub.k](x) [b.sub.k]. (4)

We then estimate the parameter vector b = [([b.sub.-p] *** [b.sub.K-1]).sup.T] in s(x) rather than estimating f directly.

The penalized spline estimator [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of b is defined as the minimizer of

[n.summation over (i=1)][{[Y.sub.i] - s([x.sub.i])}.sup.2] + [lambda][K-1.summation over (k=m-p)][{[[DELTA].sup.m]([b.sub.k])}.sup.2], (5)

where [lambda] is the smoothing parameter and [DELTA] is the backward difference operator defined as [DELTA][b.sub.k] = [b.sub.k] - [b.sub.k-1] and

[[DELTA].sup.m] ([b.sub.k]) = [[DELTA].sup.m-1] ([DELTA][b.sub.k]) = [m.summation over (j=0)][(-1).sup.m-j] m [C.sub.j][b.sub.k-m+j]. (6)

Let [D.sub.m] = [([d.sup.(m).sub.ij]).sub.ij] be a (K + p - m) x (K + p) matrix, where [d.sup.(m).sub.ij] = [(-1).sup.[absolute value of i-j]] m[C.sub.[absolute value of i-j]] for i [less than or equal to] j [less than or equal to] i + m and 0 otherwise. Using the notation Y = [([Y.sub.1] *** [Y.sub.n]).sup.T] and Z = [([B.sub.-p+j-1]([x.sub.i])).sub.ij], (5) can then be expressed as

[(Y - Zb).sup.T] (Y - Zb) + [lambda][b.sup.T][D.sup.T.sub.m][D.sub.m]b. (7)
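With this structure, [D.sub.m] can be built by applying m-fold differencing to an identity matrix; a small numpy sketch (the helper name is ours):

```python
import numpy as np

def diff_matrix(n_coef, m):
    """(n_coef - m) x n_coef matrix whose rows apply the mth order
    backward difference Delta^m to a coefficient vector b."""
    # np.diff of the identity along axis 0 differences successive rows,
    # so applying it m times yields exactly the matrix D_m of (6)-(7)
    # (up to an irrelevant sign per row, which cancels in D_m^T D_m).
    return np.diff(np.eye(n_coef), n=m, axis=0)
```

For example, `diff_matrix(K + p, 2) @ b` returns the second differences of b, and `b @ D.T @ D @ b` equals the penalty term in (7).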

The minimum of (7) is obtained when

[??] = [([Z.sup.T]Z + [lambda][D.sup.T.sub.m][D.sub.m]).sup.-1][Z.sup.T]Y. (8)

The penalized spline estimator [??](x) of f(x) for x [member of] [a,b] is defined as

[??](x) = B[(x).sup.T][??], (9)

where B(x) = [([B.sub.-p] (x) *** [B.sub.K-1](x)).sup.T].

If [lambda] = 0, [??](x) reduces to the regression spline estimator, that is, the spline estimator obtained via the least squares method. The regression spline estimator leads to an oscillatory fit if the number of knots K is large, but the determination of K and the location of the knots are very difficult problems. The advantage of penalized spline smoothing is that a good smoothing parameter yields a curve that balances fit and smoothness without having to choose the number and location of knots precisely. In the present paper, we use equidistant knots [[kappa].sub.k] = a + k(b - a)/K and focus on the determination of the smoothing parameter. Alternatively, the quantiles of the data points {[x.sub.1], ..., [x.sub.n]} are often used as knot locations (see Ruppert [27]). However, it is known that the penalized spline estimator is hardly affected by the location of the knots if K is not too small. Therefore we do not discuss knot location and suggest the direct method for determining the smoothing parameter in the next section.
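Putting (3)-(9) together, here is a self-contained numpy sketch of the whole fit for p = 1, whose basis functions on equidistant knots are simple "hat" functions. The true curve is F1 from Section 5; the smoothing parameter is fixed at an arbitrary value for illustration rather than chosen by the direct method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, m, lam = 200, 20, 2, 1.0         # lam: placeholder smoothing value
x = np.sort(rng.uniform(0.0, 1.0, n))
f = lambda t: np.cos(np.pi * (t - 0.3))  # true curve F1 from Section 5
y = f(x) + rng.normal(0.0, 0.5, n)

# Linear (p = 1) B-splines on equidistant knots kappa_k = k/K are the
# K + 1 hat functions, so the design matrix Z is easy to build directly.
kappa = np.linspace(0.0, 1.0, K + 1)
Z = np.maximum(0.0, 1.0 - K * np.abs(x[:, None] - kappa[None, :]))

# Second order difference penalty and the ridge-type solution (8).
D = np.diff(np.eye(K + 1), n=m, axis=0)
b_hat = np.linalg.solve(Z.T @ Z + lam * D.T @ D, Z.T @ y)

# Fitted curve (9) on a grid.
grid = np.linspace(0.0, 1.0, 101)
Bg = np.maximum(0.0, 1.0 - K * np.abs(grid[:, None] - kappa[None, :]))
fit = Bg @ b_hat
```

Increasing `lam` pulls the fit toward a linear trend (the null space of the second order difference penalty), while `lam = 0` recovers the regression spline fit.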

3. Direct Determination of Smoothing Parameter

In this section, we provide the direct method for determining the smoothing parameter without a grid search. This direct method is given theoretical justification by asymptotic theory of the penalized spline estimator. To investigate the asymptotic property of the penalized spline estimator, we assume that f [member of] [C.sup.p+1], K = o([n.sup.1/2]), and [lambda] = O(n/[K.sup.1-m]).

For convenience we first give some notation. Let [G.sub.n] = [n.sup.-1][Z.sup.T]Z and [[LAMBDA].sub.n] = [G.sub.n] + ([lambda]/n)[D.sup.T.sub.m][D.sub.m]. Let b* be a best [L.sub.[infinity]] approximation to the true function f. This means that b* satisfies

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (10)

where

[b.sub.a](x) = -[f.sup.(p+1)](x)/(p + 1)! [K.summation over (k=1)] I([[kappa].sub.k-1] [less than or equal to] x < [[kappa].sub.k]) [Br.sub.p+1]((x - [[kappa].sub.k-1])/[K.sup.-1]), (11)

I(a < x < b) is the indicator function of an interval (a, b), and [Br.sub.p](x) is the pth Bernoulli polynomial (see Zhou et al. [28]). It can be easily shown that [b.sub.a] (x) = O(1) as K [right arrow] [infinity].

The penalized spline estimator can be written as

[??](x) = B[(x).sup.T][([Z.sup.T]Z).sup.-1][Z.sup.T]Y - ([lambda]/n) B[(x).sup.T][[LAMBDA].sup.-1.sub.n][D.sup.T.sub.m][D.sub.m][G.sup.-1.sub.n][(Z/n).sup.T]Y. (12)

The first term on the right hand side of (12) is equal to the regression spline estimator, denoted by [[??].sub.rs](x). The asymptotics for the regression spline estimator have been developed by Zhou et al. [28] and can be expressed as

E[[[??].sub.rs](x)] = f(x) + [K.sup.-(p+1)][b.sub.a](x) + o([K.sup.-(p+1)]),

V[[[??].sub.rs](x)] = [[sigma].sup.2]/n B[(x).sup.T][G.sup.-1.sub.n]B(x){1 + o(1)} = O(K/n). (13)

From Theorem 2(a) of Claeskens et al. [18], we have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (14)

where

C(x | n, K, [lambda]) = 2[[sigma].sup.2]/n B[(x).sup.T][[LAMBDA].sup.-1.sub.n][D.sup.T.sub.m][D.sub.m][G.sup.-1.sub.n]B(x) (15)

is the covariance of [[??].sub.rs](x) and the second term on the right hand side of (12). The variance of the second term of (12) can be shown to be negligible (see the appendix). The following theorem shows that [lambda] controls the trade-off between the squared bias and the variance of the penalized spline estimator.

Theorem 1. The covariance C(x | n, K, [lambda])/K in (15) is positive. Furthermore, as n [right arrow] [infinity], C(x | n, K, [lambda]) = C(x | n,K, 0)(1 + o(1)) and C(x | n,K,0)/K = O(K/n).

From the asymptotic forms of E[[??](x)] and V[[??](x)] and Theorem 1, we see that, for small [lambda], the bias of [??](x) is small and the variance becomes large. On the other hand, for large [lambda], the bias of [??](x) increases and the variance decreases. From Theorem 1, the MISE of [??](x) can be expressed as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (16)

where [r.sub.1n](K) and [r.sub.2n](K, [lambda]) are of negligible order compared to the variance of the regression spline term and the squared bias arising from the second term on the right hand side of (12), respectively. Indeed, [r.sub.1n](K) = o(K/n) and [r.sub.2n](K, [lambda]) = o([[lambda].sup.2][K.sup.2(1-m)]/[n.sup.2]). The MISE of [??](x) is asymptotically quadratic in [lambda], so a global minimum exists. Let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (17)

and let [[lambda].sub.opt] be the minimizer of MISE([lambda]). We suggest using [[lambda].sub.opt] as the optimal smoothing parameter,

[[lambda].sub.opt] = n/2 [[integral].sup.b.sub.a][D.sub.2n](x | [f.sup.(p+1)], b*) dx/[[integral].sup.b.sub.a][D.sub.1n](x | b*) dx, (18)

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (19)

However, MISE([lambda]) and [[lambda].sub.opt] contain an unknown function and parameters, and hence these must be estimated. We construct the estimator of [[lambda].sub.opt] by using consistent estimators of [f.sup.(p+1)] and b*. In principle, the penalized spline estimator and its derivative could serve as pilot estimators of [f.sup.(p+1)] and b*, but they would require another smoothing parameter [[lambda].sub.0], which would itself have to be chosen appropriately. Therefore we use the regression spline estimator as the pilot estimator of [f.sup.(p+1)](x) and b*. First we compute [??] = [([Z.sup.T]Z).sup.-1][Z.sup.T]Y. Next we construct the pilot estimator of [f.sup.(p+1)](x) by using the (p+2)th degree B-spline basis. Let [B.sup.[p]](x) = [([B.sup.[p].sub.-p](x) *** [B.sup.[p].sub.K-1](x)).sup.T]. Using the fundamental property of B-spline functions, [f.sup.(m)](x) can be written asymptotically as [[??].sup.(m)](x) = [B.sup.[p-m]][(x).sup.T][D.sub.m]b*. Hence the regression spline estimator [[??].sup.(p+1)] can be constructed as [[??].sup.(p+1)](x) = [B.sup.[1]][(x).sup.T][D.sub.p+1][b.sup.(p+2)], where [b.sup.(p+2)] = [[(([Z.sup.[p+2]]).sup.T][Z.sup.[p+2]]).sup.-1][([Z.sup.[p+2]]).sup.T]Y and [Z.sup.[p+2]] = [([B.sup.[p+2].sub.-p+j-1]([x.sub.i])).sub.ij]. Since the regression spline estimator tends to be oscillatory for higher-degree splines, fewer knots are used to construct [[??].sup.(p+1)](x). The variance [[sigma].sup.2] included in C(x | n, K, 0) is estimated via

[[??].sup.2] = 1/(n - (K + p)) [n.summation over (i=1)][{[Y.sub.i] - B[([x.sub.i]).sup.T][??]}.sup.2]. (20)
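A direct transcription of (20), assuming the design matrix Z is in hand (the helper name is ours):

```python
import numpy as np

def sigma2_hat(Z, y):
    """Estimate sigma^2 as in (20): residual sum of squares of the
    unpenalized regression spline fit over n - (K + p) degrees of freedom."""
    n, q = Z.shape                      # q = K + p basis functions
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ b
    return resid @ resid / (n - q)
```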

Using the above pilot estimators,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (21)

with some finite grid points [{[z.sub.j]}.sup.J.sub.1] on [a, b].

Consequently the final penalized spline estimator is defined as

[??](x) = B[(x).sup.T][[??].sub.opt], (22)

where

[[??].sub.opt] = [([Z.sup.T]Z + [[??].sub.opt][D.sup.T.sub.m][D.sub.m]).sup.-1][Z.sup.T]Y. (23)

It is known that the optimal order of K for the penalized spline estimator is the same as that for the regression spline estimator, K = O([n.sup.1/(2p+3)]) (see Kauermann et al. [19] and Zhou et al. [28]). Using this, we show the asymptotic property of [[lambda].sub.opt] in the following theorem.

Theorem 2. Let f [member of] [C.sup.p+1]. Suppose that K = o([n.sup.1/2]) and [lambda] = o(n/[K.sup.1-m]). Then [[lambda].sub.opt] given in (18) exists, and [[lambda].sub.opt] = O(n[K.sup.m-p-2] + [K.sup.2m]) as n [right arrow] [infinity]. Furthermore, K = O([n.sup.1/(2p+3)]) leads to the optimal order

[[lambda].sub.opt] = O([n.sup.(p+m+1)/(2p+3)]) + O([n.sup.2m/(2p+3)]) (24)

and the rate of convergence of MISE of [??](x) becomes O([n.sup.-2(p+1)/(2p+3)]).

The proof of Theorem 2 is given in the appendix. At the end of this section, we give a few remarks.

Remark 3. The asymptotic orders of the squared bias and variance of the penalized spline estimator are O([K.sup.-2(p+1)]) + O([[lambda].sup.2][K.sup.2(1-m)]/[n.sup.2]) and O(K/n), respectively. Therefore, under K = O([n.sup.1/(2p+3)]) and [lambda] = O([n.sup.(p+m+1)/(2p+3)]), the optimal rate of convergence of the MISE of the penalized spline estimator is O([n.sup.-2(p+1)/(2p+3)]). From Theorem 2, we see that the asymptotic order of [[lambda].sub.opt] yields this optimal rate of convergence.

Remark 4. O'Sullivan [1] used [mu][[integral].sup.b.sub.a][{[s.sup.(m)](x)}.sup.2]dx as the penalty term, where [mu] is the smoothing parameter. When equidistant knots are used, the penalty [[integral].sup.b.sub.a][{[s.sup.(m)](x)}.sup.2]dx can be expressed as [K.sup.2m][b.sup.T][D.sup.T.sub.m]R[D.sub.m]b, where R = [([[integral].sup.b.sub.a][B.sup.[p-m].sub.i](x)[B.sup.[p-m].sub.j](x)dx).sub.ij]. The penalty [b.sup.T][D.sup.T.sub.m][D.sub.m]b of Eilers and Marx [2] can be seen as a simplified version of [K.sup.2m][b.sup.T][D.sup.T.sub.m]R[D.sub.m]b obtained by replacing R with [K.sup.-1]I and setting [lambda] = [mu][K.sup.2m-1], where I is the identity matrix. Thus the order m of the difference matrix [D.sub.m] controls the smoothness of the pth degree B-spline function s(x), and hence m should satisfy m [less than or equal to] p for theoretical justification, although the penalized spline estimator can also be calculated for m > p. In practice, (p, m) = (3, 2) is often used.

Remark 5. Penalized spline regression is often considered in its mixed model representation (see Ruppert et al. [4]). In this framework, we use the pth degree truncated spline model

s(x) = [[beta].sub.0] + [[beta].sub.1]x + *** + [[beta].sub.p][x.sup.p] + [K.summation over (k=1)][u.sub.k][(x - [[kappa].sub.k]).sup.p.sub.+], (25)

where [(x - a).sub.+] = max{x - a, 0} and the [beta]'s are unknown parameters. Each [u.sub.k] is independently distributed as [u.sub.k] ~ N(0, [[sigma].sup.2.sub.u]), where [[sigma].sup.2.sub.u] is an unknown variance parameter. The penalized spline fit of f(x) is defined as the estimated BLUP (see Robinson [29]). The smoothing parameter in the ordinary spline regression model corresponds to [[sigma].sup.2]/[[sigma].sup.2.sub.u] in the spline mixed model. Since [[sigma].sup.2] and [[sigma].sup.2.sub.u] are estimated by the ML or REML method, we do not need to choose the smoothing parameter via a grid search. It is known that the estimated BLUP fit corresponds to the penalized spline estimator (9) with m = p + 1 (see Kauermann et al. [19]). Hence the estimated BLUP tends to underfit in theory (see Remark 4).
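The truncated power basis of (25) is easy to construct directly; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def truncated_basis(x, knots, p):
    """Columns: 1, x, ..., x^p followed by (x - kappa_k)_+^p per knot."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, p + 1, increasing=True)          # 1, x, ..., x^p
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** p
    return np.hstack([poly, trunc])
```

In the mixed model representation, the polynomial columns carry the fixed effects [beta] and the truncated columns carry the random effects [u.sub.k].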

Remark 6. From Lyapunov's theorem, the asymptotic normality of the penalized spline estimator [??](x) with [[??].sub.opt] can be derived under the same assumption as Theorem 2 and some additional mild conditions. Although the proof is omitted, it is straightforward since [[??].sub.opt] is the consistent estimator of [lambda]opt and the asymptotic order of [[lambda].sub.opt] satisfies Lyapunov's condition.

4. Extension to Additive Models

We extend the direct method to the regression model with multidimensional explanatory variables; in particular, we consider additive models in this section. For the dataset {([Y.sub.i], [x.sub.i]) : i = 1, ..., n} with one-dimensional response [Y.sub.i] and d-dimensional explanatory variable [x.sub.i] = ([x.sub.i1], ..., [x.sub.id]), the additive model relates [Y.sub.i] to [x.sub.i] via unknown regression functions [f.sub.j] (j = 1, ..., d) and the mean parameter [mu] such that

[Y.sub.i] = [mu] + [f.sub.1]([x.sub.i1]) + *** + [f.sub.d]([x.sub.id]) + [[epsilon].sub.i]. (26)

We assume that [x.sub.ij] is located on an interval [[a.sub.j], [b.sub.j]] and [f.sub.1], ..., [f.sub.d] are normalized as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (27)

to ensure the identifiability of [f.sub.j]. Then the intercept [mu] is typically estimated via

[??] = 1/n [n.summation over (i=1)][Y.sub.i]. (28)

Hence we replace [Y.sub.i] with [Y.sub.i] - [??] in (26) and set [mu] = 0, redefining the additive model as

[Y.sub.i] = [f.sub.1]([x.sub.i1]) + *** + [f.sub.d]([x.sub.id]) + [[epsilon].sub.i], i = 1, ..., n, (29)

where each [Y.sub.i] is centered. We aim to estimate [f.sub.j] via a penalized spline method. Let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (30)

be the B-spline model, where [B.sub.j,k](x) is the pth degree B-spline basis function with knots [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and the [b.sub.j,k] are unknown parameters. We consider the B-spline additive models

[Y.sub.i] = [s.sub.1] ([x.sub.i1]) + *** + [s.sub.d]([x.sub.id]) + [[epsilon].sub.i], i = 1, ..., n, (31)

and estimate [s.sub.j] and [b.sub.j,k]. For j = 1, ..., d, the penalized spline estimator [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (32)

where [[lambda].sub.j] are the smoothing parameters and [D.sub.jm] is the mth order difference matrix with size ([K.sub.j] + p - m) x ([K.sub.j] + p) for j = 1, ..., d. By using [[??].sub.j], the penalized spline estimator of [f.sub.j]([x.sub.j]) is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (33)

The asymptotics for [[??].sub.j]([x.sub.j]) have been studied by Yoshida and Naito [22] who derived the asymptotic bias and variance to be

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (34)

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [b.sup.*.sub.j] is the best [L.sub.[infinity]] approximation of [f.sub.j],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (35)

The above asymptotic bias and variance of [[??].sub.j]([x.sub.j]) are similar to those of the penalized spline estimator in univariate regression with data {([Y.sub.i], [x.sub.ij]) : i = 1, ..., n}. Furthermore, the asymptotic normality of [[[[??].sub.1]([x.sub.1]) *** [[??].sub.d]([x.sub.d])].sup.T] has been shown by Yoshida and Naito [22]. From their paper, we find that [[??].sub.i]([x.sub.i]) and [[??].sub.j]([x.sub.j]) are asymptotically independent for i [not equal to] j. This provides theoretical justification for selecting each [[lambda].sub.j] by minimizing the MISE of [[??].sub.j] alone. Similarly to the discussion in Section 3, the minimizer [[lambda].sub.j,opt] of the MISE of [[??].sub.j]([x.sub.j]) can be obtained for j = 1, ..., d,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (36)

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (37)

Since [f.sub.j], [b.sup.*.sub.j], and [[sigma].sup.2] are unknown, they must be estimated. The pilot estimators of [f.sub.j], [b.sup.*.sub.j], and [[sigma].sup.2] are constructed by the regression spline method. Using the pilot estimators [[??].sub.j], [[??].sub.j] of [f.sub.j], [b.sup.*.sub.j] and the estimator of [[sigma].sup.2], we construct the estimator of [[lambda].sub.j,opt]:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (38)

where [{[z.sub.r]}.sup.R.sub.1] is some finite grid point sequence on [[a.sub.j], [b.sub.j]]. We obtain for j = 1, ..., d, the penalized spline estimator [[??].sub.j]([x.sub.j]) of [f.sub.j]([x.sub.j]),

[[??].sub.j]([x.sub.j]) = [B.sub.j][([x.sub.j]).sup.T][[??].sub.j,opt], (39)

where [[??].sub.j,opt] is the penalized spline estimator of [b.sub.j] computed with [[??].sub.j,opt]. From Theorem 2 and the proof of Theorem 3.4 of Yoshida and Naito [22], the asymptotic normality of [[[[??].sub.1]([x.sub.1]) *** [[??].sub.d]([x.sub.d])].sup.T] using ([[lambda].sub.1,opt], ..., [[lambda].sub.d,opt]) can be shown.

Remark 7. Since the true regression functions are normalized, the estimator [[??].sub.j] should be also centered as

[[??].sub.j]([x.sub.j]) - 1/n [n.summation over (i=1)][[??].sub.j]([x.sub.ij]). (40)

Remark 8. The penalized spline estimators of [b.sub.1], ..., [b.sub.d] can be obtained using a backfitting algorithm (Hastie and Tibshirani [30]). The backfitting algorithm for penalized splines in additive regression is detailed in Marx and Eilers [3] and Yoshida and Naito [21].
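The backfitting loop of Remark 8 can be sketched in numpy, assuming the per-covariate design matrices, penalty matrices, and smoothing parameters are already in hand (names and interface are ours, not from the paper):

```python
import numpy as np

def backfit(Z_list, D_list, lams, y, n_iter=20):
    """Backfitting for the additive penalized spline model (31)-(32).

    Z_list[j]: n x (K_j + p) B-spline design matrix for covariate j,
    D_list[j]: its mth order difference matrix, lams[j]: its smoothing
    parameter.  Returns the list of coefficient vectors b_j.
    """
    d = len(Z_list)
    b = [np.zeros(Z.shape[1]) for Z in Z_list]
    for _ in range(n_iter):
        for j in range(d):
            # Partial residuals: remove the current fits of all other terms.
            r = y - sum(Z_list[k] @ b[k] for k in range(d) if k != j)
            # Penalized least squares update for component j, as in (32).
            A = Z_list[j].T @ Z_list[j] + lams[j] * D_list[j].T @ D_list[j]
            b[j] = np.linalg.solve(A, Z_list[j].T @ r)
    return b
```

In practice y is centered beforehand and each fitted component is recentered as in Remark 7 to keep the components identifiable.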

Remark 9. Although we focus on the nonparametric additive regression in this paper, the direct method can be also applied to the generalized additive models. However we omit this discussion because the procedure is similar to the case of the additive models discussed in this section.

Remark 10. The direct method is quite computationally efficient compared to the grid search method in additive models. In a grid search, we prepare candidate values for each [[lambda].sub.j]. Let [M.sub.j] be the set of all candidate grid values of [[lambda].sub.j] for j = 1, ..., d. Then we need to run the backfitting algorithm [absolute value of [M.sub.1]] x *** x [absolute value of [M.sub.d]] times. On the other hand, for the direct method it suffices to run the backfitting algorithm only twice: once for the pilot estimator and once for the final penalized spline estimator. Thus, compared with the conventional grid search method, the direct method can drastically reduce computation time.

5. Numerical Study

In this section, we investigate the finite sample performance of the proposed direct method in a Monte Carlo simulation. Let us first consider the univariate regression model (1) for the data {([Y.sub.i], [x.sub.i]) : i = 1, ..., n}. We use three types of true regression function: f(x) = cos([pi](x - 0.3)), f(x) = [phi]((x - 0.8)/0.05)/[square root of 0.05] - [phi]((x - 0.2)/0.04)/[square root of 0.04], and f(x) = 2[x.sup.3] + 3 sin(2[pi][(x - 0.8).sup.3]) + 3 exp[-[(x - 0.5).sup.2]/0.1], labeled F1, F2, and F3, respectively. Here [phi](x) is the density function of the standard normal distribution. The explanatory [x.sub.i] and error [[epsilon].sub.i] are independently generated from the uniform distribution on [0, 1] and N(0, [(0.5).sup.2]), respectively. We estimate each true regression function via the penalized spline method, using the linear and cubic B-spline bases with equidistant knots and the second order difference penalty. We set K = 5[n.sup.2/5] equidistant knots, and the smoothing parameter is determined by the direct method. The penalized spline estimators with the linear spline and the cubic spline are denoted by L-Direct and C-Direct, respectively. For comparison with L-Direct, the same studies are also implemented for the penalized spline estimator with a linear spline and the smoothing parameter selected via GCV or the restricted maximum likelihood (REML) method in the mixed model representation, and for the local polynomial estimator with normal kernel and plug-in bandwidth (see Ruppert et al. [24]). In GCV, we set the candidate values of [lambda] to {i/10 : i = 0, 1, ..., 99}. These three estimators are denoted by L-GCV, L-REML, and local linear, respectively. Furthermore, we compare C-Direct with C-GCV and C-REML, the penalized spline estimators with the cubic spline and the smoothing parameter determined by GCV and REML. Let

sMISE = 1/J [J.summation over (j=1)][1/R [R.summation over (r=1)][{[[??].sub.r]([z.sub.j]) - f([z.sub.j])}.sup.2]] (41)

be the sample MISE of any estimator [??](x) of f(x), where [[??].sub.r](z) is [??](z) in the rth replication and [z.sub.j] = j/J. We calculate the sample MISE of the penalized spline estimator with the direct method, GCV, and REML, and of the local linear estimator. In this simulation, we use J = 100 and R = 1000, and we simulate n = 50 and n = 200.
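The sample MISE (41) as code, assuming the R replicated fits have been stored row-wise (names are ours):

```python
import numpy as np

def sample_mise(fhat_reps, f_true, J):
    """fhat_reps: (R, J) array; row r holds the rth replicated fit
    evaluated at the grid z_j = j/J.  Returns the sample MISE of (41)."""
    z = np.arange(1, J + 1) / J
    return np.mean((fhat_reps - f_true(z)[None, :]) ** 2)
```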

The sMISE of all estimators for each model and n are given in Table 1. The penalized spline estimator using the direct method shows good performance in each setting. Compared with the other smoothing parameter selection methods, the direct method is a little better than GCV as a whole, although for n = 200, C-GCV is better than C-Direct for F1 and F3, with a very small difference. We see that the sMISE of C-Direct is smaller than that of local linear, whereas local linear behaves better than L-Direct in some cases. For F2, C-Direct has the smallest sMISE of all estimators for both n = 50 and n = 200. Although performance clearly depends on the situation in which the data are generated, we believe the proposed method is an efficient one.

Next, the difference between [[??].sub.opt] and [[lambda].sub.opt] is investigated empirically. Let [[??].sub.opt,r] be the [[??].sub.opt] in the rth replication for r = 1, ..., 1000. Then we calculate the sample MSE of [[??].sub.opt]:

sMSE = 1/R [R.summation over (r=1)][{[[??].sub.opt,r] - [[lambda].sub.opt]}.sup.2] (42)

for F1, F2, and F3 and n = 50 and 200. To construct [[lambda].sub.opt] for F1, F2, and F3, we use the true [f.sup.(4)](x) and [[sigma].sup.2] and an approximate b*. Here the approximate b* is the sample average of [??] over 1000 replications with n = 200.

Table 2 reports the sMSE of [[??].sub.opt] for each true function. For comparison, the sMSE of the smoothing parameters obtained via GCV and REML are also calculated. The sMSE of L-Direct and C-Direct is small even for n = 50 for all true functions, so the accuracy of [[??].sub.opt] appears to be guaranteed, indicating that the pilot estimator constructed via the least squares method is adequate. The sMSE with the direct method is smaller than that with GCV and REML. This result is not surprising since GCV and REML do not target [[lambda].sub.opt]. Together with Table 1, however, it appears that the sample MSE of the smoothing parameter is reflected in the sample MISE of the estimator.

The proposed direct method was derived by minimizing the MISE. On the other hand, GCV and REML are obtained in the context of minimizing the prediction squared error (PSE) and the prediction error, respectively. Hence we compare the sample PSE of the penalized spline estimator with the direct method, GCV, and REML, and of the local linear estimator. Since the prediction error is almost the same as the MISE (see Section 4 of Ruppert et al. [4]), we omit the comparison of the prediction error. Let

sPSE = 1/n [n.summation over (i=1)][1/R [R.summation over (r=1)][{[y.sub.ir] - [[??].sub.r]([x.sub.i])}.sup.2]] (43)

be the sample PSE for any estimator [??](x), where [y.sub.ir] (r = 1, ..., R) is independently generated from [Y.sub.i] ~ N(f([x.sub.i]), [(0.5).sup.2]) for all i.

Table 3 reports the modified sPSE, [absolute value of sPSE - [(0.5).sup.2]], of all estimators for each model and n. In Remark 11, we discuss the modified sPSE. From the results, we can confirm that the direct method has good predictability. GCV can be regarded as an estimator of the sPSE, so in some cases the sPSE with GCV is smaller than that with the direct method; the cause seems to be the accuracy of the variance estimates (see Remark 11). However, the difference is very small.

We also report the computational time (in seconds) taken to obtain [??](x) for F1, p = 3, and n = 200. The fits with the direct method, GCV, and REML took 0.04 s, 1.22 s, and 0.34 s, respectively. Although the difference is small, the direct method was faster than GCV and REML. Next we examine the behavior of the penalized spline estimator with the direct method in the additive model. For the data {([Y.sub.i], [x.sub.i1], [x.sub.i2], [x.sub.i3]) : i = 1, ..., n}, we assume the additive model with true functions [f.sub.1]([x.sub.1]) = sin(2[pi][x.sub.1]), [f.sub.2]([x.sub.2]) = [phi](([x.sub.2] - 0.5)/0.2), and [f.sub.3]([x.sub.3]) = 0.4[phi](([x.sub.3] - 0.1)/0.2) + 0.6[phi](([x.sub.3] - 0.8)/0.2). The errors are generated as in the first simulation. The design points ([x.sub.i1], [x.sub.i2], [x.sub.i3]) are generated by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (44)

where [z.sub.ij] (i = 1, ..., n, j = 1, 2, 3) are generated independently from the uniform distribution on [0, 1]. In this simulation, we adopt [rho] = 0 and [rho] = 0.5. We then center the true functions so that

\int_{0}^{1} f_{j}(z)\,dz = 0, \quad j = 1, 2, 3 (45)

holds for each [rho]. We construct the penalized spline estimator via the backfitting algorithm with the cubic spline and the second-order difference penalty. The number of equidistant knots is [K.sub.j] = 5[n.sup.2/5], and the smoothing parameters are determined by the direct method. The pilot estimator used to construct [[??].sub.j,opt] is the regression spline estimator with a fifth-degree spline and [K.sub.j] = [n.sup.2/5]. We calculate the sample MISE of each [[??].sub.j] for n = 200 and 1000 Monte Carlo iterations, and then the sample MISE of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. For comparison with the direct method, we also conduct the same simulation with GCV and REML. In GCV, we set the candidate values of [[lambda].sub.j] to {i/10 : i = 0, 1, ..., 9} for j = 1, 2, 3. Table 4 summarizes the sample MISE of [[??].sub.j] (j = 1, 2, 3), denoted by MISEj, for the direct method, GCV, and REML. The penalized spline estimator with the direct method performs comparably to that with GCV in both the uncorrelated and correlated design cases. For [rho] = 0, the behaviors of MISE1, MISE2, and MISE3 with the direct method are similar, whereas for GCV, MISE1 is slightly larger than MISE2 and MISE3. Thus the direct method yields an efficient estimate for every covariate. On the whole, the direct method is better than REML. From the above, we believe that the direct method is preferable in practice.
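The backfitting fit described above can be sketched as follows. This is a simplified stand-in, not the paper's implementation: it uses a truncated-linear basis and a ridge-type penalty in place of the cubic B-spline basis with the second-order difference penalty, and a fixed smoothing parameter instead of the direct method.

```python
import numpy as np

def tp_basis(x, knots):
    """Truncated-linear basis [1, x, (x - k)_+]: a simplified stand-in
    for the paper's cubic B-spline basis (assumption for brevity)."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def pen_smooth(x, r, lam, knots):
    """Ridge-penalized fit of partial residuals r; the identity penalty
    here stands in for the difference penalty D_m' D_m."""
    B = tp_basis(x, knots)
    P = np.eye(B.shape[1]); P[:2, :2] = 0.0   # leave 1 and x unpenalized
    beta = np.linalg.solve(B.T @ B + lam * P, B.T @ r)
    return B @ beta

def backfit(X, y, lam, knots, n_iter=20):
    """Backfitting for y = a + f_1(x_1) + ... + f_d(x_d) + eps."""
    n, d = X.shape
    a = y.mean()                               # intercept
    F = np.zeros((n, d))                       # component fits
    for _ in range(n_iter):
        for j in range(d):
            r = y - a - F.sum(axis=1) + F[:, j]   # partial residuals
            fj = pen_smooth(X[:, j], r, lam, knots)
            F[:, j] = fj - fj.mean()              # center each component
    return a, F

# demo on synthetic additive data (assumed setup)
rng = np.random.default_rng(2)
n = 300
X = rng.uniform(0.0, 1.0, (n, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 2.0 * (X[:, 1] - 0.5) \
    + 0.1 * rng.standard_normal(n)
a, F = backfit(X, y, lam=0.1, knots=np.linspace(0.1, 0.9, 9))
resid = y - a - F.sum(axis=1)
print(np.var(resid), np.var(y))    # residual variance far below var(y)
```

The centering step inside the loop enforces the identifiability constraint (45) on each fitted component.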

To confirm the consistency of the direct method, the sample MSE of [[??].sub.j,opt] is calculated in the same manner as in the univariate regression case. For comparison, the sample MSE of GCV and REML is also obtained. Here the true [[lambda].sub.j,opt] (j = 1, 2, 3) is defined in the same way as in the univariate case. Table 5 shows the results for [f.sub.1], [f.sub.2], and [f.sub.3] and for [rho] = 0 and 0.5. We see from the results that the direct method behaves well: its sample MSE is smaller than that of GCV and REML for all j. The same tendency is also observed for the correlated design [rho] = 0.5.

Table 6 shows the sPSE of [??]([x.sub.i]) = [[??].sub.1]([x.sub.i1]) + [[??].sub.2]([x.sub.i2]) + [[??].sub.3]([x.sub.i3]) with each smoothing parameter selection method. We see from the results that the efficiency of the direct method is also guaranteed in terms of prediction accuracy.

Finally we report the computational time (in seconds) required to construct the penalized spline estimator with each method. The computational times with the direct method, GCV, and REML are 11.87 s, 126.43 s, and 43.5 s, respectively, so the direct method is more efficient than the other methods in terms of computation (see Remark 11). All computations in the simulation were carried out in the software R on a computer with a 3.40 GHz CPU and 24.0 GB of memory. Although these are only a few examples, the direct method can be regarded as a good method for selecting the smoothing parameter.

Remark 11. We calculate [absolute value of sPSE - [(0.5).sup.2]] as the criterion to assess the prediction squared error. The ordinary PSE is defined as

PSE = \frac{1}{n}\sum_{i=1}^{n} E[\{Y_{i} - \hat{f}(x_{i})\}^{2}]

= \sigma^{2} + \frac{1}{n}\sum_{i=1}^{n} E[\{\hat{f}(x_{i}) - f(x_{i})\}^{2}], (46)

where the [Y.sub.i]'s are test data. The second term of the PSE is similar to the MISE, so the sample PSE evaluates both the variance part and the MISE part. To isolate the difference in the MISE part between the direct method and the other methods, we calculated [absolute value of sPSE - [[sigma].sup.2]] in this section.
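A small Monte Carlo check of the decomposition (46), using a fixed, deliberately biased stand-in for the estimator (an assumption for illustration): subtracting [sigma]^2 from the sample PSE isolates the MISE-like part.

```python
import numpy as np

rng = np.random.default_rng(3)
n, R, sigma = 100, 20000, 0.5
x = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * x)      # stand-in true function (assumption)
fhat = f + 0.1                 # fixed "estimator" with known bias 0.1

# independent test responses Y_ir ~ N(f(x_i), sigma^2)
y = f + sigma * rng.standard_normal((R, n))
spse_val = np.mean((y - fhat) ** 2)

mise_part = np.mean((fhat - f) ** 2)   # exactly 0.01 here
print(spse_val, sigma**2 + mise_part)  # nearly equal, as (46) predicts
```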

6. Discussion

In this paper, we proposed a new direct method for determining the smoothing parameter of a penalized spline estimator in a regression problem. The direct method is based on minimizing the MISE of the penalized spline estimator. We studied its asymptotic properties: the asymptotic normality of the penalized spline estimator using [[??].sub.opt] is theoretically guaranteed when a consistent estimator is used as the pilot estimator to obtain [[??].sub.opt]. In the numerical studies, for the additive model, the computational cost of the direct method is dramatically reduced compared to grid search methods such as GCV. Furthermore, we found that the performance of the direct method is better than, or at least similar to, that of the other methods.

The direct method can be extended to other regression models, such as varying-coefficient models, Cox proportional hazards models, and single-index models, provided the asymptotic bias and variance of the penalized spline estimator can be derived. It is not limited to mean regression; it can also be applied to quantile regression. Indeed, Yoshida [31] has presented the asymptotic bias and variance of the penalized spline estimator in univariate quantile regression. Improving the direct method for various situations and datasets is also important; in particular, developing a determination of a locally adaptive [lambda] is an interesting avenue for further research.

http://dx.doi.org/10.1155/2014/203469

Appendix

We describe the technical details. For the matrix A = [([a.sub.ij]).sub.ij], [parallel]A[[parallel].sub.[infinity]] = [max.sub.ij]{[absolute value of [a.sub.ij]]}. First, to prove Theorems 1 and 2, we introduce the fundamental property of penalized splines in the following lemma.
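Note that the norm defined above is the elementwise max norm, not the induced infinity matrix norm; a quick numerical illustration of the distinction:

```python
import numpy as np

def maxnorm(A):
    """Elementwise max norm ||A||_inf = max_ij |a_ij| as defined in the
    Appendix (note: NOT the induced infinity matrix norm)."""
    return np.abs(np.asarray(A)).max()

A = np.array([[1.0, -4.0], [2.0, 3.0]])
print(maxnorm(A))   # 4.0; the induced inf-norm (max row sum) would be 5.0
```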

Lemma A.1. Let A = [([a.sub.ij]).sub.ij] be a (K + p) square matrix. Suppose that K = o([n.sup.1/2]), [lambda] = o(n[K.sup.1-m]), and [[parallel]A[parallel].sub.[infinity]] = O(1). Then [[parallel]A[G.sup.-1.sub.n][parallel].sub.[infinity]] = O(K) and [[parallel]A[[LAMBDA].sup.-1.sub.n][parallel].sub.[infinity]] = O(K).

The proof of Lemma A.1 is given in Zhou et al. [28] and Claeskens et al. [18]. Repeated use of Lemma A.1 yields that the asymptotic order of the variance of the second term of (12) is O([[lambda].sup.2][(K/n).sup.3]). Indeed, this variance can be calculated as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (A.1)

When K = o([n.sup.1/2]) and [lambda] = O(n/[K.sup.1-m]), [[lambda].sup.2][(K/n).sup.3] = o([lambda][(K/n).sup.2]) holds.

Proof of Theorem 1. We write

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (A.2)

Since

\Lambda_{n}^{-1} - G_{n}^{-1} = \Lambda_{n}^{-1}\{I - \Lambda_{n} G_{n}^{-1}\} = \frac{\lambda}{n}\Lambda_{n}^{-1} D_{m}^{T} D_{m} G_{n}^{-1}, (A.3)

we have [[parallel][[LAMBDA].sup.-1.sub.n] - [G.sup.-1.sub.n][parallel].sub.[infinity]] = O([lambda][K.sup.2]/n), and hence the second term of the right-hand side of (A.2) can be shown to be O([(K/n).sup.2]([lambda]K/n)). From Lemma A.1, it is easy to derive that C(x | n, K, 0) = O(K(K/n)). Consequently we have
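The identity (A.3) can be checked numerically. The sketch below assumes the convention [[LAMBDA].sub.n] = [G.sub.n] + ([lambda]/n)[D.sup.T.sub.m][D.sub.m], under which the final factorization carries a minus sign; only the asymptotic order matters for the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
q, m, lam, n = 8, 2, 0.7, 100          # q = K + p basis dimension (toy sizes)
M = rng.standard_normal((q, q))
G = M @ M.T + q * np.eye(q)            # stand-in for the SPD Gram matrix G_n
D = np.diff(np.eye(q), n=m, axis=0)    # m-th order difference matrix D_m
L = G + (lam / n) * (D.T @ D)          # Lambda_n under the assumed convention

lhs = np.linalg.inv(L) - np.linalg.inv(G)
rhs = np.linalg.inv(L) @ (np.eye(q) - L @ np.linalg.inv(G))
assert np.allclose(lhs, rhs)           # first equality in (A.3)
# with Lambda_n = G_n + (lam/n) D'D the factorization carries a minus:
rhs2 = -(lam / n) * np.linalg.inv(L) @ D.T @ D @ np.linalg.inv(G)
assert np.allclose(lhs, rhs2)
print("identity (A.3) verified on a toy example")
```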

\frac{C(x \mid n,K,\lambda)}{K} = \frac{C(x \mid n,K,0)}{K} + O\left(\left(\frac{K}{n}\right)^{2}\frac{\lambda}{n}\right) = \frac{C(x \mid n,K,0)}{K} + o\left(\frac{K}{n}\right) (A.4)

and this leads to Theorem 1.

Proof of Theorem 2. It is sufficient to prove that [[integral].sup.b.sub.a][D.sub.1n](x | b*)dx = O([K.sup.2(1-m)]) and

[[integral].sup.b.sub.a][D.sub.2n](x | [f.sup.(p+1)], b*)dx = O([K.sup.-(p+m)]) + O([K.sup.2]/n). (A.5)

Since

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (A.6)

we have

[[integral].sup.b.sub.a][D.sub.1n](x | b*) dx = O([K.sup.2(1-m)]) (A.7)

by using Lemma A.1 and the fact that [[parallel][D.sub.m]b*[parallel].sub.[infinity]] = O([K.sup.-m])(see Yoshida [31]).

From the properties of the B-spline functions, for a (K + p) square matrix A, there exists C > 0 such that

\int_{a}^{b} B(x)^{T} A B(x)\,dx \le \|A\|_{\infty} \int_{a}^{b} B(x)^{T} B(x)\,dx \le C\|A\|_{\infty}. (A.8)

In other words, the asymptotic order of [[integral].sup.b.sub.a] B[(x).sup.T]AB(x)dx and that of [[parallel]A[parallel].sub.[infinity]] are the same. Let [A.sub.n] = [G.sup.-1.sub.n] [D.sup.T.sub.m][D.sub.m][G.sup.-1.sub.n].

Then, since [[parallel][A.sub.n][parallel].sub.[infinity]] = O([K.sup.2]), we obtain

[[integral].sup.b.sub.a] C (x | n,K,0)dx = O([K.sup.2]/n). (A.9)

Furthermore, because \sup_{x \in [a,b]}\{b_{a}(x)\} = O(1), we have

K^{-(p+1)}\left|\int_{a}^{b} b_{a}(x) B(x)^{T} G_{n}^{-1} D_{m}^{T} D_{m} b^{*}\,dx\right| = O(K^{-(p+m)}), (A.10)

and hence it can be shown that

\int_{a}^{b} D_{2n}(x \mid f^{(p+1)}, b^{*})\,dx = O(K^{-(p+m)}) + O(n^{-1}K^{2}). (A.11)

Together with (A.7) and (A.11), [[lambda].sub.opt] = O(n[K.sup.m-p-2] + [K.sup.2m]) is obtained. The rate of convergence of the penalized spline estimator with [[lambda].sub.opt] is then O([n.sup.-2(p+2)/(2p+3)]) when K = O([n.sup.1/(2p+3)]), as detailed in Yoshida and Naito [22].

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The author would like to thank the two anonymous referees for their careful reading and comments, which improved the paper.

References

[1] F. O'Sullivan, "A statistical perspective on ill-posed inverse problems," Statistical Science, vol. 1, no. 4, pp. 502-527, 1986.

[2] P. H. C. Eilers and B. D. Marx, "Flexible smoothing with B-splines and penalties," Statistical Science, vol. 11, no. 2, pp. 89-121, 1996, With comments and a rejoinder by the authors.

[3] B. D. Marx and P. H. C. Eilers, "Direct generalized additive modeling with penalized likelihood," Computational Statistics and Data Analysis, vol. 28, no. 2, pp. 193-209, 1998.

[4] D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, UK, 2003.

[5] T. Krivobokova, "Smoothing parameter selection in two frameworks for penalized splines," Journal of the Royal Statistical Society B, vol. 75, no. 4, pp. 725-741, 2013.

[6] P. T. Reiss and R. T. Ogden, "Smoothing parameter selection for a class of semiparametric linear models," Journal of the Royal Statistical Society B, vol. 71, no. 2, pp. 505-523, 2009.

[7] S. N. Wood, "Modelling and smoothing parameter estimation with multiple quadratic penalties," Journal of the Royal Statistical Society B, vol. 62, no. 2, pp. 413-428, 2000.

[8] S. N. Wood, "Stable and efficient multiple smoothing parameter estimation for generalized additive models," Journal of the American Statistical Association, vol. 99, no. 467, pp. 673-686, 2004.

[9] S. N. Wood, "Fast stable direct fitting and smoothness selection for generalized additive models," Journal of the Royal Statistical Society B, vol. 70, no. 3, pp. 495-518, 2008.

[10] X. Lin and D. Zhang, "Inference in generalized additive mixed models by using smoothing splines," Journal of the Royal Statistical Society B, vol. 61, no. 2, pp. 381-400, 1999.

[11] M. P. Wand, "Smoothing and mixed models," Computational Statistics, vol. 18, no. 2, pp. 223-249, 2003.

[12] L. Fahrmeir, T. Kneib, and S. Lang, "Penalized structured additive regression for space-time data: a Bayesian perspective," Statistica Sinica, vol. 14, no. 3, pp. 731-761, 2004.

[13] L. Fahrmeir and T. Kneib, "Propriety of posteriors in structured additive regression models: theory and empirical evidence," Journal of Statistical Planning and Inference, vol. 139, no. 3, pp. 843-859, 2009.

[14] F. Heinzl, L. Fahrmeir, and T. Kneib, "Additive mixed models with Dirichlet process mixture and P-spline priors," Advances in Statistical Analysis, vol. 96, no. 1, pp. 47-68, 2012.

[15] G. Kauermann, "A note on smoothing parameter selection for penalized spline smoothing," Journal of Statistical Planning and Inference, vol. 127, no. 1-2, pp. 53-69, 2005.

[16] P. Hall and J. D. Opsomer, "Theory for penalised spline regression," Biometrika, vol. 92, no. 1, pp. 105-118, 2005.

[17] Y. Li and D. Ruppert, "On the asymptotics of penalized splines," Biometrika, vol. 95, no. 2, pp. 415-436, 2008.

[18] G. Claeskens, T. Krivobokova, and J. D. Opsomer, "Asymptotic properties of penalized spline estimators," Biometrika, vol. 96, no. 3, pp. 529-544, 2009.

[19] G. Kauermann, T. Krivobokova, and L. Fahrmeir, "Some asymptotic results on generalized penalized spline smoothing," Journal of the Royal Statistical Society B, vol. 71, no. 2, pp. 487-503, 2009.

[20] X. Wang, J. Shen, and D. Ruppert, "On the asymptotics of penalized spline smoothing," Electronic Journal of Statistics, vol. 5, pp. 1-17, 2011.

[21] T. Yoshida and K. Naito, "Asymptotics for penalized additive B-spline regression," Journal of the Japan Statistical Society, vol. 42, no. 1, pp. 81-107, 2012.

[22] T. Yoshida and K. Naito, "Asymptotics for penalized splines in generalized additive models," Journal of Nonparametric Statistics, vol. 26, no. 2, pp. 269-289, 2014.

[23] L. Xiao, Y. Li, and D. Ruppert, "Fast bivariate P-splines: the sandwich smoother," Journal of the Royal Statistical Society B, vol. 75, no. 3, pp. 577-599, 2013.

[24] D. Ruppert, S. J. Sheather, and M. P. Wand, "An effective bandwidth selector for local least squares regression," Journal of the American Statistical Association, vol. 90, no. 432, pp. 1257-1270, 1995.

[25] M. P. Wand and M. C. Jones, Kernel Smoothing, Chapman & Hall, London, UK, 1995.

[26] C. de Boor, A Practical Guide to Splines, Springer, New York, NY, USA, 2001.

[27] D. Ruppert, "Selecting the number of knots for penalized splines," Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 735-757, 2002.

[28] S. Zhou, X. Shen, and D. A. Wolfe, "Local asymptotics for regression splines and confidence regions," The Annals of Statistics, vol. 26, no. 5, pp. 1760-1782, 1998.

[29] G. K. Robinson, "That BLUP is a good thing: the estimation of random effects," Statistical Science, vol. 6, no. 1, pp. 15-32, 1991.

[30] T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, Chapman & Hall, London, UK, 1990.

[31] T. Yoshida, "Asymptotics for penalized spline estimators in quantile regression," Communications in Statistics-Theory and Methods, 2013.

Takuma Yoshida

Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-8580, Japan

Correspondence should be addressed to Takuma Yoshida; yoshida@sci.kagoshima-u.ac.jp

Received 7 January 2014; Revised 31 March 2014; Accepted 31 March 2014; Published 22 April 2014

Academic Editor: Dejian Lai

TABLE 1: Results of sample MISE for n = 50 and n = 200. All entries for MISE are [10.sup.2] times their actual values.

True              F1                F2                F3
Sample size    n = 50  n = 200   n = 50  n = 200   n = 50  n = 200
L-Direct        0.526   0.322     3.684   1.070     1.160   0.377
L-GCV           0.726   0.621     5.811   1.082     1.183   0.379
L-REML          2.270   1.544     9.966   3.520     1.283   0.873
Local linear    0.751   0.220     4.274   0.901     1.689   0.503
C-Direct        0.401   0.148     3.027   0.656     1.044   0.326
C-GCV           0.514   0.137     3.526   0.666     1.043   0.290
C-REML          1.326   0.732     8.246   4.213     3.241   0.835

TABLE 2: Results of sample MSE of the smoothing parameter obtained by the direct method, GCV, and REML for n = 50 and n = 200. All entries for MSE are [10.sup.2] times their actual values.

True              F1                F2                F3
Sample size    n = 50  n = 200   n = 50  n = 200   n = 50  n = 200
L-Direct        0.331   0.193     0.113   0.043     0.025   0.037
L-GCV           0.842   0.342     0.445   0.101     0.072   0.043
L-REML          1.070   1.231     0.842   0.437     0.143   0.268
C-Direct        0.262   0.014     0.082   0.045     0.053   0.014
C-GCV           0.452   0.123     0.252   0.092     0.085   0.135
C-REML          0.855   0.224     0.426   0.152     0.093   0.463

TABLE 3: Results of sample PSE for n = 50 and n = 200. All entries for PSE are [10.sup.2] times their actual values.

True              F1                F2                F3
Sample size    n = 50  n = 200   n = 50  n = 200   n = 50  n = 200
L-Direct        0.424   0.052     0.341   0.070     0.360   0.117
L-GCV           0.331   0.084     0.381   0.089     0.333   0.139
L-REML          0.642   0.124     0.831   1.002     0.733   0.173
Local linear    0.751   0.220     0.474   0.901     0.489   0.143
C-Direct        0.341   0.48      0.237   0.63      0.424   0.086
C-GCV           0.314   0.43      0.326   0.76      0.533   0.089
C-REML          0.863   1.672     0.456   1.21      1.043   0.125

TABLE 4: Results of sample MISE of the penalized spline estimator. For the direct method, GCV, and REML, MISE1, MISE2, and MISE3 are the sample MISE of [[??].sub.1], [[??].sub.2], and [[??].sub.3], respectively. All entries for MISE are [10.sup.2] times their actual values.

              [rho] = 0                [rho] = 0.5
Method    MISE1  MISE2  MISE3     MISE1  MISE2  MISE3
Direct    0.281  0.209  0.205     0.916  0.795  0.972
GCV       0.687  0.237  0.390     0.929  0.857  1.112
REML      1.204  0.825  0.621     2.104  1.832  2.263

TABLE 5: Results of sample MSE of the selected smoothing parameter via the direct method, GCV, and REML for [rho] = 0 and [rho] = 0.5. All entries for MSE are [10.sup.2] times their actual values.

              [rho] = 0                          [rho] = 0.5
True      [f.sub.1]  [f.sub.2]  [f.sub.3]   [f.sub.1]  [f.sub.2]  [f.sub.3]
Direct      0.124      0.042      0.082       0.312      0.105      0.428
GCV         1.315      0.485      0.213       0.547      0.352      0.741
REML        2.531      1.237      2.656       1.043      0.846      1.588

TABLE 6: Results of sample PSE of the penalized spline estimator (n = 200). All entries for PSE are [10.sup.2] times their actual values.

          [rho] = 0   [rho] = 0.5
Direct      0.281       0.429
GCV         0.387       0.467
REML        1.204       0.925
