
Method for Solving LASSO Problem Based on Multidimensional Weight.

1. Introduction

Data mining has come into its own in the era of big data, and how to mine useful information from massive data with statistical models has attracted much attention in academia [1,2]. In a linear model, model error usually results from missing key variables. At the start of modeling, the more variables (attributes) are chosen, the smaller the model error generally is. During modeling, however, we need to find the attribute set that best explains the response, that is, to improve the prediction precision and accuracy of the model through variable selection [3]. Linear regression analysis is the most widely used of all statistical techniques [4]; its accuracy depends mainly on the selection of variables and the values of the regression coefficients [5]. LASSO is an estimation method that can simplify the index set. In 1996, inspired by ridge regression (Frank and Friedman, 1993) [6] and the Nonnegative Garrote (Breiman, 1995) [7], Tibshirani proposed a new method of variable selection. The idea is to minimize the sum of squared residuals subject to the constraint that the sum of the absolute values of the regression coefficients $\beta$ is less than a constant, which amounts to constructing a penalty function that shrinks the coefficients [8]. As a shrinkage estimator, the LASSO has higher detection accuracy and better consistency of parameter convergence. Efron et al. (2002) proposed the LARS algorithm to support the solution of LASSO [9], and in 2004 they proposed the improved-LARS algorithm, which eliminates coefficients $\beta$ of opposite sign and solves LASSO better [10]. The improved-LARS algorithm regresses stepwise; along each path it keeps the correlation between the current residual and all the selected variables the same. It also satisfies the requirement of the LASSO solution that the coefficients keep the same sign as the current approach direction, and it ensures optimal results at low algorithmic complexity.

Zou (2006) introduced the adaptive LASSO, which uses different tuning parameters for different regression coefficients. He suggests minimizing the following objective function [11]:

$$\sum_{j=1}^{n} \left(y_j - x_j^{T}\beta\right)^{2} + n \sum_{i=1}^{p} \lambda_i \lvert\beta_i\rvert \tag{1}$$
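As a small illustration, objective (1) can be evaluated directly for a candidate coefficient vector. The sketch below assumes NumPy; the function name `adaptive_lasso_objective` and the toy data are illustrative and do not come from the paper.

```python
import numpy as np

def adaptive_lasso_objective(X, y, beta, lam):
    """Value of objective (1): squared residuals plus n * sum_i lam_i * |beta_i|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    lam = np.asarray(lam, dtype=float)
    n = X.shape[0]                      # number of observations
    residual = y - X @ beta             # y_j - x_j^T beta for every row j
    return float(residual @ residual + n * np.sum(lam * np.abs(beta)))

# Toy data (hypothetical): beta fits y exactly, so only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])
lam = np.array([0.1, 0.2])
value = adaptive_lasso_objective(X, y, beta, lam)
```

Because each coefficient has its own $\lambda_i$, a larger $\lambda_i$ penalizes the corresponding coefficient more heavily than the others.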

Keerthi and Shevade (2007) proposed a fast tracking algorithm for LASSO/LARS [12]; it approximates the logistic regression loss by a piecewise quadratic function.

Charbonnier et al. (2010) assume an internal structure that describes classes of connectivity between the variables [13]. They present the weighted-LASSO method to infer the parameters of a first-order vector autoregressive model that describes time course expression data generated by directed gene-to-gene regulation networks.

Because the LASSO method minimizes the sum of squared residuals, it is sensitive to outliers; the least absolute deviation (LAD) estimator is an alternative to the OLS estimate. Jung (2011) proposed a robust LASSO estimator that is not sensitive to outliers, heavy-tailed errors, or leverage points [14].

Bergersen et al. (2011) found that a variable $j$ with a large weight $w_j$ is subject to a larger penalty and is therefore less likely to be included in the model, and vice versa [15]. They proposed using weighted-LASSO with integrated relevant external information on the covariates to guide the selection towards more stable results.

Arslan (2012) found that, compared with the LAD-LASSO method, the weighted LAD-LASSO (WLAD-LASSO) method will resist the heavy-tailed errors and outliers in explanatory variables [16].

The LASSO problem is a convex minimization problem, and the forward-backward splitting method is important for solving it. Salzo and Villa (2012) proposed an accelerated version to improve the convergence of the method [17].

Zhou et al. (2013) proposed an alternative selection procedure based on the kernelized LARS-LASSO method [18]. By formulating the RBF neural network as a linear-in-the-parameters model, they derived an $l_1$-constrained objective function for training the network.

Zhao et al. (2015) added two tuning parameters, $\lambda$ and $j_0$, to the wavelet-based weighted-LASSO methods. The tuning parameter $\lambda$ controls the model sparsity, and the choice of $j_0$ controls the optimal level of wavelet decomposition for the functional data. They improved wavelet-based LASSO by adding a prescreening step prior to the model fitting or, alternatively, by using a weighted version of wavelet-based LASSO [19].

Salama et al. (2016) proposed a new LASSO algorithm, the minimum variance distortionless response (MVDR) LARS-LASSO [20], which solves the direction-of-arrival (DOA) problem in the compressed sensing (CS) framework.

In light of the superior performance achieved in [10] for solving the LASSO problem, this paper extends that idea to LARS with multidimensional weight. Our main contributions are as follows:

(i) In solving LASSO, each attribute in the evaluation population has a different relative importance to the overall evaluation. This covers two points: not all attributes influence the regression results, and each individual in the regression model has a different weight. When the improved-LARS algorithm calculates the equiangular vector, we distinguish the effects of the different attribute variables by considering the joint correlation between the regression variables and the surplus variables.

(ii) We evaluate the proposed method experimentally on the Pima Indians Diabetes Data with two sets of evaluation indices.

In Section 2, we briefly introduce the LASSO problem and the improved-LARS algorithm, including theory and definitions. In Section 3, we put forward the LARS algorithm based on the multidimensional weighting model, which calculates the direction and variables from the weighted variables and accelerates the approximation in promising directions. In Section 4, we introduce the data set and the evaluation indicators used to verify the algorithm and discuss the experimental results. Section 5 gives the summary and outlook of this paper.

2. LASSO Problem and Improved-LARS Algorithm

2.1. The Definition of LASSO. Suppose that there are multidimensional variables $X_j \in \mathbb{R}^n$, $j = 1, 2, \ldots, m$, and a response $Y \in \mathbb{R}^n$. Each observation of the $X_j$ has a corresponding $y_i$. The regression coefficient $\beta$ is estimated so that the sum of squared residuals is minimal subject to $\lVert\beta\rVert_{l_1} \le t$. The LASSO linear regression model is defined by

$$y = X\beta + e. \tag{2}$$

Here $\beta$ is the $j$-dimensional column vector of parameters to be estimated. The error vector $e$ satisfies $E(e) = 0$ and $\mathrm{Var}(e) = \sigma^2$. Suppose a sparse model $E(y \mid x) = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_j x_j$, in which most of the regression coefficients $\beta_1, \beta_2, \ldots, \beta_j$ are 0. From the observed data, variable selection identifies which coefficients are zero and estimates the remaining nonzero parameters; that is, it looks for the parameters that build a sparse model. The problem we need to solve, in matrix form, is defined by

[mathematical expression not reproducible], (3)

where $t$ is the threshold on the sum of the absolute values of the regression coefficients, and $l_1$ and $l_2$ are the two norms involved: the $l_1$ norm in the constraint and the $l_2$ norm in the residual.
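Since expression (3) did not survive extraction, the block below writes out the standard constrained form of the LASSO from [8], which matches the verbal description in Section 2.1. It is given as a reading aid under that assumption, not as a recovery of the authors' exact typesetting.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\; \lVert y - X\beta \rVert_{l_2}^{2}
\quad \text{subject to} \quad
\lVert \beta \rVert_{l_1} \;=\; \sum_{j} \lvert \beta_j \rvert \;\le\; t .
```

A smaller $t$ forces more coefficients to exactly zero; for large enough $t$ the constraint is inactive and the solution coincides with ordinary least squares.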

2.2. The Improved-LARS Algorithm. The improved-LARS algorithm, which is based on the Forward Selection algorithm and the Forward Gradient algorithm, solves the LASSO problem well. It has an appropriate forward step, lower complexity, and makes better use of the correlation information. Figure 1 shows the basic steps of the algorithm.

(i) The improved-LARS algorithm continually calculates the correlation between the $X_j$ and $Y$ and finds the individual $x_K$ most correlated with the response. It takes the largest possible step in the direction of this individual, using $x_K$ to approximate $y_i$.

(ii) This continues until some other individual, say $x_P$, has the same correlation with the current residual, [mathematical expression not reproducible]. Improved-LARS then proceeds in the equiangular direction $x_U$ (the direction between the two predictors $x_P$ and $x_K$).

(iii) When a third individual $x_T$ earns its way into the "most correlated" set, improved-LARS proceeds equiangularly between $x_K$, $x_P$, and $x_T$, that is, along the "least angle direction," until a fourth individual enters, and so forth. In high dimensions, the equiangular direction is the one that makes equal angles with every selected vector.

(iv) The procedure continues until the residual error is less than a threshold or all the variables are involved in the approach; then the algorithm stops.

In Figure 1, a two-dimensional example, the algorithm starts with both coefficients equal to zero. It first finds the individual more correlated with the response, [mathematical expression not reproducible], and approximates $y$ along the $x_1$ direction until the residual between $\beta_1 x_1$ and $y$ has the same correlation with $x_1$ and $x_2$. The approximating direction then changes to the equiangular direction between $x_1$ and $x_2$.
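Steps (i) to (iv) can be sketched in a few lines of NumPy. The code below is a minimal version of plain LARS as described in [10], under the assumptions that the columns of `X` are centered with unit norm and `y` is centered. It omits both the LASSO sign check and the weighting introduced later in this paper; the names `lars_path`, `max_steps`, and `tol` are illustrative.

```python
import numpy as np

def lars_path(X, y, max_steps=None, tol=1e-12):
    """Plain LARS; returns the coefficient vector after every step."""
    n, m = X.shape
    max_steps = m if max_steps is None else max_steps
    mu = np.zeros(n)                       # current fit
    beta = np.zeros(m)
    active = []                            # indices of selected predictors
    path = [beta.copy()]
    for _ in range(max_steps):
        c = X.T @ (y - mu)                 # current correlations
        C = np.max(np.abs(c))
        if C < tol:
            break
        if not active:                     # step (i): most correlated predictor
            active.append(int(np.argmax(np.abs(c))))
        signs = np.sign(c[active])
        XA = X[:, active] * signs          # signed active columns
        G = XA.T @ XA
        Ginv1 = np.linalg.solve(G, np.ones(len(active)))
        A = 1.0 / np.sqrt(Ginv1.sum())
        w = A * Ginv1                      # equiangular weights
        u = XA @ w                         # equiangular unit vector
        a = X.T @ u
        gamma, j_new = C / A, None         # full step if nothing else can enter
        for j in range(m):                 # steps (ii)-(iii): next predictor to tie
            if j in active:
                continue
            for num, den in ((C - c[j], A - a[j]), (C + c[j], A + a[j])):
                if den > tol and 0 < num / den < gamma:
                    gamma, j_new = num / den, j
        mu = mu + gamma * u
        beta[active] += gamma * signs * w
        path.append(beta.copy())
        if j_new is None:                  # step (iv): all variables are in
            break
        active.append(j_new)
    return np.array(path)
```

When every predictor has entered, the last step runs all the way to the least-squares fit, so the final row of the returned path equals the ordinary least-squares coefficients for a full-rank design.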

3. LARS Based on Multidimensional Weight

3.1. Algorithm Analysis. In improved-LARS stepwise regression, the angle regression treats all selected variables as equally important. However, each individual $X_j$ has a different weight in the regression model, because the indicators have different relative importance in the overall evaluation. We therefore take the correlation between an individual $X_j$ and the surplus variables into consideration and use it, together with the correlation between $X_j$ and $y_i$, as the condition for selecting the approximating individual.

In Figure 3, we originally choose $x_1$ as the first approximating variable because $r_{x_1 y} > r_{x_2 y}$. We take the contribution rate of the predictor $X_j$ to the whole system as an additional approximating condition; the new correlation is

[mathematical expression not reproducible], (4)

where [mathematical expression not reproducible] is the contribution rate, whose calculation is described in detail later, and $u$ and $v$ are constants.

The additional condition inevitably widens the range of values of the judgment condition. To keep the system stable, we limit the product to $[v, u]$.

After the transformation, the possibility that $X_j$ is chosen may increase. For example, Figure 2 shows that the transformed $x_2$ gets $\theta_2$ closer to $y$, reaching $x'_2$, and the transformed $x_1$ gets $\theta_1$ closer to $y$, reaching $x'_1$. The possibility that $X_j$ is selected may also decrease: the transformed $x_1$ gets $\theta_2$ farther away from $y$, reaching $x''_1$.

Relative to Figure 1, the new correlation significantly changes the approximation process. Figure 3 shows the predictor direction when the multidimensional weight is added: the possibility of each of the two predictors being selected changes. The process starts with $x'_2$; on reaching $\beta'_1 x'_2$, the correlation of $x'_1$ with the residual $e = y - \beta'_1 x'_2$ is the same as that of $x'_2$, so the approximation direction changes to the equiangular direction of $x'_1$ and $x'_2$. It then moves forward by $\beta'_2(x'_1 + x'_2)$, and the approximation process is complete. Because the approximating path changes, the regression coefficients $\beta_1$ and $\beta_2$ also change; the new $\beta$ is calculated by the improved regression method.

Applying the aforementioned process to a multidimensional high-order system, the collected $m$ feature indicators and $n$ objects are expressed as

[mathematical expression not reproducible] (5)


$$X = [X_1 \; \cdots \; X_j \; \cdots \; X_m], \quad j = 1, 2, \ldots, m. \tag{6}$$

The collected result response is

[mathematical expression not reproducible]. (7)

There are many methods for calculating [mathematical expression not reproducible] in the regression process. Without loss of generality, the calculation method should be objective (not set by hand), relative, quantitative, and independent. We use part_PCA, Independence Weight, and CRITIC to control the regression process and verify the superiority of the algorithm.

3.1.1. Part of Principal Components Analysis. PCA uses an orthogonal transformation for dimension reduction in statistics [21, 22]. It transforms the data into a new coordinate system in which the largest variance of the data projection lies on the first coordinate, called the 1st principal component, the second largest lies on the second coordinate, called the 2nd principal component, and so on. PCA keeps the low-order principal components of the data set and ignores the high-order ones after the transform; it identifies the dominating factors and keeps the top principal components whose overall information utilization rate is higher than 85%. Inspired by the principle of PCA, we preserve the values of all the components simultaneously and identify the variance contribution of each component. The part_PCA algorithm steps are as follows.

Sample Standardization

$$Z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, m, \tag{8}$$

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad s_j^{2} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^{2}}{n - 1}. \tag{9}$$

The correlation coefficient matrix $R$ gives the information utilization of each feature:

$$R = [r_{ij}]_{m \times m} = \frac{Z^{T} Z}{n - 1}, \tag{10}$$

$$r_{ij} = \frac{\sum_{k=1}^{n} z_{ki} z_{kj}}{n - 1}, \quad i, j = 1, 2, \ldots, m. \tag{11}$$
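Equations (8) to (11) translate directly into NumPy, as sketched below. The last function, `variance_contributions`, is an assumption: it returns the usual PCA variance contribution rates (eigenvalues of $R$ divided by their sum), because the text does not spell out how part_PCA turns these rates into attribute weights. All function names are illustrative.

```python
import numpy as np

def standardize(X):
    """Z-score each column with the sample standard deviation, as in (8)-(9)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)            # divisor n - 1, matching (9)
    return (X - mean) / std

def correlation_matrix(X):
    """R = Z^T Z / (n - 1), as in (10)-(11)."""
    Z = standardize(X)
    n = Z.shape[0]
    return Z.T @ Z / (n - 1)

def variance_contributions(X):
    """Assumed follow-up step: share of total variance carried by each component."""
    eigvals = np.linalg.eigvalsh(correlation_matrix(X))[::-1]   # descending order
    return eigvals / eigvals.sum()
```

Because the columns are standardized with the same $n - 1$ divisor used in (10), `correlation_matrix` agrees with the ordinary Pearson correlation matrix of the raw data.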

3.1.2. Independence Weight. We compute the multiple correlation coefficient of each individual by multiple regression [23, 24] and rank the individuals by it: the greater the multiple correlation coefficient, the more redundant the information, the smaller the information utilization, and the smaller the weight should be. The calculation steps are

[mathematical expression not reproducible]. (12)

[??] is the matrix formed by the columns of $X$ other than $X_j$.

$$\bar{X} = \operatorname{mean}(X). \tag{13}$$

Because $R$ is negatively related to the weight, we take the reciprocal of the multiple correlation coefficient as the score and obtain the weighting coefficients through normalization:

$$R = \left[\frac{1}{R'_1}, \ldots, \frac{1}{R'_j}, \ldots, \frac{1}{R'_m}\right], \quad j = 1, 2, \ldots, m. \tag{14}$$
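A minimal sketch of the Independence Weight idea follows. Since expression (12) is not reproduced, the code assumes the textbook definition of the multiple correlation coefficient: the correlation between a column and its least-squares fit on the remaining columns plus an intercept. The name `independence_weights` is illustrative, and normalizing to a sum of one is an assumption about the "normalized processing" step.

```python
import numpy as np

def independence_weights(X):
    """Weights proportional to 1 / R'_j, where R'_j is the multiple correlation of column j."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    R = np.empty(m)
    for j in range(m):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        fitted = others @ coef
        # Multiple correlation = correlation between the column and its fit.
        R[j] = np.corrcoef(target, fitted)[0, 1]
    scores = 1.0 / R                       # reciprocal, as in (14)
    return scores / scores.sum()           # assumed normalization: weights sum to one
```

A column that is nearly a combination of the others gets $R'_j$ close to 1 and therefore a small weight, while a column carrying independent information gets a large one. The sketch does not guard against $R'_j = 0$.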

3.1.3. CRITIC. Building on Independence Weight, CRITIC is an objective weighting method proposed by Diakoulaki et al. [25]. It determines the objective weight of each individual by evaluating the contrast and the conflict between indicators [26, 27]. The standard deviation measures the spread of the schemes on the same index and thus expresses the contrast intensity. The conflict between indicators is based on the correlation between them.

The quantified conflict between the $j$th indicator and the other indicators is

$$\sum_{t=1}^{m} (1 - r_{tj}). \tag{15}$$

$r_{tj}$ is the correlation coefficient between $X_t$ and $X_j$:

$$r_{tj} = \frac{\sum (X_t - \bar{X}_t)(X_j - \bar{X}_j)}{\sqrt{\sum (X_t - \bar{X}_t)^{2} \sum (X_j - \bar{X}_j)^{2}}}. \tag{16}$$

$C_j$ is the amount of information that the $j$th indicator contains:

$$C_j = \sigma_j \sum_{t=1}^{m} (1 - r_{tj}), \quad j = 1, 2, \ldots, m. \tag{17}$$

The greater $C_j$ is, the greater the amount of information and the greater the relative importance of that indicator. The objective weight of the $j$th indicator is

$$R_j = \frac{C_j}{\sum_{t=1}^{m} C_t}, \quad j = 1, 2, \ldots, m. \tag{18}$$
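Equations (15) to (18) amount to a few array operations. The sketch below assumes NumPy and applies the formulas to the data as given; the original CRITIC method usually rescales each indicator first, and whether the authors do so here is not stated. The name `critic_weights` is illustrative.

```python
import numpy as np

def critic_weights(X):
    """CRITIC weights: contrast (standard deviation) times conflict (sum of 1 - r)."""
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0, ddof=1)              # contrast intensity of each indicator
    r = np.corrcoef(X, rowvar=False)           # r_tj, as in (16)
    conflict = (1.0 - r).sum(axis=0)           # (15), summed over t for each j
    info = sigma * conflict                    # C_j, as in (17)
    return info / info.sum()                   # R_j, as in (18)

# Two uncorrelated indicators (hypothetical data): weights follow the standard deviations.
X_demo = np.array([[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]])
w_demo = critic_weights(X_demo)
```

With uncorrelated indicators the conflict terms are equal, so the weights differ only through the standard deviations; correlated indicators share information and are weighted down.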

3.2. Algorithm Steps. To obtain a numerically stable solution, $X$ and $y$ in (3) are standardized in preprocessing so that the intercept $\alpha$ can be omitted: $\sum_{i} x_{ij}/N = 0$, $\lVert X \rVert_{l_1} = 1$, and $\bar{Y} = 0$.
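The preprocessing can be sketched as follows. The text states the column normalization with an $l_1$ norm, whereas the LARS paper [10] scales columns to unit Euclidean length; the sketch therefore exposes the norm as a parameter and defaults to the Euclidean norm, which is an assumption. The names `preprocess` and `norm_order` are illustrative.

```python
import numpy as np

def preprocess(X, y, norm_order=2):
    """Center the columns of X and y, then scale each column of X to unit norm."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                          # column means become zero
    norms = np.linalg.norm(Xc, ord=norm_order, axis=0)
    return Xc / norms, y - y.mean()                  # centered y needs no intercept
```

Passing `norm_order=1` follows the normalization literally as written in the text.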

For $A = \{j_1, \ldots, j_l, \ldots, j_k\} \subseteq \{1, 2, \ldots, m\}$,

define the matrix

$$X_A = [s_{j_1} x_{j_1}, \ldots, s_{j_l} x_{j_l}, \ldots, s_{j_k} x_{j_k}] \in \mathbb{R}^{n \times k}. \tag{19}$$

$X_A$ consists of the columns of $X$ selected by $A$, each signed so that it points in the same direction as the current $Y$, where the signs are

[mathematical expression not reproducible], (20)

The equiangular vector $u_A$ is as follows:

[mathematical expression not reproducible]. (21)

$u_A$ is the unit vector making equal angles, less than $90^{\circ}$, with the columns of $X_A$. $1_A$ is a vector of 1's of length $|A|$, the size of $A$. $w_A$ is the equiangular contribution of each attribute $X_i$ selected in $X_A$. $w_A$ is processed by weighting to change the approach direction and the selection of approach variables.

We can now further describe the improved-LARS algorithm based on multidimensional weight. We begin at $u_0 = 0$ and build up new $u$ by steps. Suppose that [??]$_A$ is the current estimate and $c$ is the vector of current correlations between the predictors and the response:

$$c = X^{T}(y - [??]_A) \quad \text{or} \quad c_j = \langle x_j,\; y - [??]_A \rangle. \tag{22}$$

The active set $A$ is the set of indices of the covariates with the greatest absolute current correlation, where $C = \max_j\{\lvert c_j \rvert\}$; the $u_A$ corresponding to $A$ is the approximating direction. Let

[mathematical expression not reproducible]. (23)

The step length along the $u_A$ direction is

[mathematical expression not reproducible]. (24)

"$\min^{+}$" indicates that the minimum is taken only over the positive components for each choice of $j$ in this approximating process. Each predictor in $A$ increases by $\gamma w_A$; we add weight to control the approach direction.

Part_PCA, Independence Weight, and CRITIC are each added to the improved-LARS algorithm in turn, which gives three parallel algorithms for calculating the weight of each attribute:

$$R_{j(\mathrm{part\_PCA})} = \mathrm{part\_PCA}(X(1 : \mathrm{row}(w), :)), \tag{25}$$

$$R_{j(\mathrm{IW})} = \mathrm{IW}(X(1 : \mathrm{row}(w), :)), \tag{26}$$

$$R_{j(\mathrm{CRITIC})} = \mathrm{CRITIC}(X(1 : \mathrm{row}(w), :)). \tag{27}$$

The centralized weight $R_j$ carries the impact of the weight from any of the three methods onto the approach direction:

$$R_j = \mathrm{centralization}(R_{j(\mathrm{part\_PCA/IW/CRITIC})}), \quad w' = w \times R_j, \quad \beta_A = \beta_A + \gamma w'. \tag{28}$$

$R_j$ is the weight matrix of matching dimension obtained from $R_{j(*)}$. The approach direction estimate is as follows:

[mathematical expression not reproducible]. (29)
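The weighting step in (28) can be sketched as an elementwise rescaling of the equiangular weights. The text does not define `centralization`, so the sketch assumes it rescales the attribute weights so that they average to one, which leaves the direction unchanged when all weights are equal. Both helper names, `centralize` and `weighted_direction`, are hypothetical.

```python
import numpy as np

def centralize(weights):
    """Assumed meaning of 'centralization': rescale so the weights average to one."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.mean()

def weighted_direction(w_active, attribute_weights, active):
    """w' = w * R_j from (28), using only the weights of the active attributes."""
    rate = centralize(attribute_weights)[list(active)]
    return np.asarray(w_active, dtype=float) * rate

# Hypothetical numbers: two active attributes (indices 0 and 2) out of three.
w_prime = weighted_direction([0.6, 0.8], [0.2, 0.3, 0.5], active=[0, 2])
```

Under this assumption, attributes with above-average weight move faster along the approach direction and attributes with below-average weight move more slowly.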

Then the new active set is

$$A_{+} = A \cup \{j'\}. \tag{30}$$

$j'$ is the index $j$ that attains the minimum in (24). To conform to the requirement of the LASSO solution that each coefficient keeps the same sign as the current approach direction, the step size at which the first sign change occurs is

[mathematical expression not reproducible]. (31)

When [??] < $\gamma$, a sign change occurs, and $j'$ is removed from $A$; the algorithm then enters the next approximation step. Using $A_{+}$ instead of $A$, repeat the above steps until the residual error is less than a threshold or all the variables are involved in the approach. The pseudocode is shown in Pseudocode 1.
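The sign check can be illustrated with the rule from the LARS paper [10]: an active coefficient $\beta_j$ moving in direction $d_j$ reaches zero at step $-\beta_j / d_j$, and the smallest positive such step limits the move. Expression (31) is not reproduced, so treating it as this rule is an assumption; the name `lasso_step_limit` is illustrative.

```python
import numpy as np

def lasso_step_limit(beta_active, direction):
    """Smallest positive step at which an active coefficient hits zero, and its position."""
    beta_active = np.asarray(beta_active, dtype=float)
    direction = np.asarray(direction, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        steps = -beta_active / direction
        # Keep only finite, strictly positive crossing points.
        steps = np.where(np.isfinite(steps) & (steps > 0), steps, np.inf)
    idx = int(np.argmin(steps))
    if np.isinf(steps[idx]):
        return np.inf, None                # no coefficient changes sign
    return float(steps[idx]), idx

limit, position = lasso_step_limit([0.5, -0.2, 0.3], [-1.0, 0.1, 0.2])
```

If the returned limit is smaller than the LARS step length $\gamma$, the step is cut short at the limit and the corresponding variable leaves the active set.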

This improved algorithm adds calculation steps for the weighting analysis, so the calculation time increases. The approach mechanism of each variable in the statistical model stays the same, however, so the space complexity is consistent with that of the original algorithm.

4. Experiment and Result Analysis

4.1. Introducing Data Set. Because LASSO compresses the regression coefficients, the experimental data set should be sparse and should have one dependent variable that is easy to distinguish. As an example, we take the Pima Indians Diabetes Data Set provided by the Applied Physics Laboratory of the Johns Hopkins University [28]. The data set contains 768 records of diabetes-negative and diabetes-positive samples, each with 8 attribute variables and one classification value.

In the classification value, "1" means that the diabetes test result is positive and "0" means that it is negative. We verify the algorithm using the 8 attribute variables as predictors and the classification value as the response. The goal of this test is to improve accuracy compared with the original LARS algorithm.

Pseudocode 1

Input. Response y dependent on predictor X, error tolerance [epsilon]
Output. LASSO solution [beta] for (3)

(1) begin
(2) data preprocessing: Normalization(X, y)
(3) initialization: u = 0, [??] = y - u
(4) c = X^T [??]
(5) C = max_j {|c_j|}
(6) [??] = arg max_j {|c_j|}, A = {[??]}
(7) calculate weight: R_weight = part_PCA(X) or
    R_weight = IW(X) or R_weight = CRITIC(X)
(8) centralization of R_weight
(9) while ||[??]||_l2 >= [epsilon] and |A| <= m do
(10)    in LARS cycle
             rate = R_weight(1 : row(w), :)
             w' = w * rate
(11) end cycle
(12) return [beta]

4.2. Verification Condition. We use the ROC curve to present the results so that the performance of the proposed method can be evaluated more intuitively. Whether a participant's diabetes result is positive or negative is a binary classification problem, and the testing results fall into the following four types:

TP (true positive), the testing results are positive and are positive actually

FP (false positive), the testing results are positive but are negative actually

TN (true negative), the testing results are negative and are negative actually

FN (false negative), the testing results are negative but are positive actually

From these four basic statistics in ROC space, we take the following three measures as the inspection standards:

ACC (accuracy), ACC = (TP + TN)/(P + N)

TPR (true positive rate), TPR = TP/P = TP/(TP + FN)

NPV (negative predictive value), NPV = TN/(TN + FN)

NPV is the proportion of correct negative detections: among the people who tested negative, the share who actually are negative. ACC is the proportion of correct results among all positive and negative cases. TPR, also called the hit rate, is the proportion of the actually positive population that tested positive. ACC, TPR, and NPV tell us whether the result is better or worse than that of LARS.
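The three measures follow directly from the four counts defined above. The sketch below assumes NumPy and labels coded as 1 for positive and 0 for negative, as in the data set; the function name `roc_metrics` and the toy labels are illustrative.

```python
import numpy as np

def roc_metrics(y_true, y_pred):
    """ACC, TPR, and NPV from binary labels (1 = positive, 0 = negative)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))        # tested positive, actually positive
    tn = int(np.sum(~y_true & ~y_pred))      # tested negative, actually negative
    fp = int(np.sum(~y_true & y_pred))       # tested positive, actually negative
    fn = int(np.sum(y_true & ~y_pred))       # tested negative, actually positive
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn) if tp + fn else float("nan"),
        "NPV": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Toy labels (hypothetical): 3 TP, 3 TN, 1 FP, 1 FN.
metrics = roc_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1])
```

TPR divides by the actually positive cases (TP + FN), whereas NPV divides by the cases that tested negative (TN + FN), so the two can move independently.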

Another criterion we use to judge the result is the SSR (Sum of Squared Residuals). The point where the SSR curve levels off corresponds to the optimal regression coefficients of the predictor variables.

4.3. Experimental Result. The threshold $t$ increases from 0 to 1 in steps of 0.01. We draw the curves of ACC, TPR, and NPV, with the negative or positive result as the dependent variable and the 8 attribute variables as the independent variables.

Figure 4 shows the accuracy and the overall optimal value of the three inspection standards after each cycle of LASSO on the Pima Indians Diabetes data.

Figure 4 shows that NPV improves in every case when weighting is added to the LASSO solution: by 5.16% with part_PCA, by 5.58% with Independence Weight, and by 5.1% with CRITIC. TPR improves by 13% with part_PCA, 14% with Independence Weight, and 13% with CRITIC, because these methods change the approach direction of the algorithm. ACC improves by 0.32% with part_PCA and with Independence Weight; CRITIC has no effect on ACC, which keeps its original value.

Therefore, the LASSO solutions obtained with Independence Weight are the best, followed by those with CRITIC and part_PCA. The improved algorithm significantly increases NPV and TPR while ensuring that ACC is not reduced. Adding the weighting changes the approach direction and thereby improves the accuracy of the LASSO solutions.

Figure 5 shows the SSR (Sum of Squared Residuals) of the response and the equiangular direction after each cycle of LASSO on the Pima Indians Diabetes data. The general trend of the SSR remains the same, and the residual at the optimal coefficients remains almost the same, when the different weights are added; adding the weighting therefore does not remove the advantages of LASSO. If the optimal regression coefficients of the predictors increase significantly, the SSR changes too. Adding part_PCA increases the residual of the LASSO solution by 0.051, whereas adding CRITIC or Independence Weight reduces it by 0.021. The results show that the final regression result is closer to the real response when CRITIC or Independence Weight is added.

Taking the three inspection standards together, the approach with multidimensional weight mostly lowers the threshold value of the optimal solution (except with CRITIC). The sum of the absolute values of the system's regression coefficients is then bounded by a smaller threshold, so the algorithm meets the requirements in a more extreme threshold range.

Table 1 shows the original regression coefficients $\beta$ and the coefficients $\beta_{part\_PCA}$, $\beta_{IW}$, and $\beta_{CRITIC}$ obtained by adding part_PCA, Independence Weight, and CRITIC, respectively.

Table 2 shows the difference and innovation of the multidimensional weight LARS: part_PCA, Independence Weight, and CRITIC, combined with improved-LARS, act on $\beta$, changing the regression track and giving more accurate results.

5. Conclusions

In this paper, a method that considers both the variable selection and the approach direction of the LARS algorithm is used to solve LASSO. We propose a LARS algorithm based on multidimensional weight that improves the accuracy of LASSO's solutions while keeping the advantages of LASSO's parameter estimation: stable regression coefficients, a reduced number of parameters, and good consistency of parameter convergence. We verify the effectiveness of the algorithm on the Pima Indians Diabetes Data Set. The precision of the calculated weights suffers as the dimension of the individuals grows, so later studies need to further optimize the embedded weighting algorithm in order to improve the accuracy and precision with which the regression algorithm chooses approach variables and directions under weighting.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This study was supported by the National Natural Science Foundation of China (61170192, 41271292), Chinese Postdoctoral Science Foundation (2015M580765), the Fundamental Research Funds for the Central Universities (XDJK2014C039, XDJK2016C045), Doctoral Fund of Southwestern University (swu1114033), and the Project of Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1403106).


References

[1] A. John and J. A. Rice, Mathematical Statistics and Data Analysis, Mathematical statistical physics Elsevier, 2006.

[2] E. P. G. Box, S. J. Hunter, and W. G. Hunter, "Statistics for experimenters: an introduction to design, data analysis, and model building," Statistics for Experimenters an Introduction to Design, vol. 73, no. 10, article S229, 2014.

[3] Z. L. Ke, The Application of LASSO and Other Related Methods in Multiple Linear Regression Model, Jiaotong University, Beijing, China, 2011.


[5] S. A. van de Geer, "High-dimensional generalized linear models and the lasso," The Annals of Statistics, vol. 36, no. 2, pp. 614-645, 2008.

[6] L. E. Frank and J. H. Friedman, "A statistical view of some chemometrics regression tools," Technometrics, vol. 35, no. 2, pp. 109-135, 1993.

[7] L. Breiman, "Better subset regression using the nonnegative garrote," Technometrics, vol. 37, no. 4, pp. 373-384, 1995.

[8] R. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267-288, 1996.

[9] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Tech. Rep., Stanford University, Stanford, Calif, USA, 2002.

[10] B. Efron and R. Tibshirani, "Least angle regression," Mathematics, vol. 32, no. 2, pp. 407-499, 2004.

[11] H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418-1429, 2006.

[12] S. S. Keerthi and S. Shevade, "A fast tracking algorithm for generalized LARS/LASSO," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1826-1830, 2007.

[13] C. Charbonnier, J. Chiquet, and C. Ambroise, "Weighted-LASSO for structured network inference from time course data," Statistical Applications in Genetics & Molecular Biology, vol. 9, no. 1, article 15, 2010.

[14] K. M. Jung, "Weighted least absolute deviation LASSO estimator," Communications of the Korean Statistical Society, vol. 18, no. 6, pp. 733-739, 2011.

[15] C. L. Bergersen, K. I. Glad, and H. Lyng, "Weighted LASSO with data integration," Statistical Applications in Genetics & Molecular Biology, vol. 10, no. 1, pp. 1-29, 2011.

[16] O. Arslan, "Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression," Computational Statistics & Data Analysis, vol. 56, no. 6, pp. 1952-1965, 2012.

[17] S. Salzo and S. Villa, "Inexact and accelerated proximal point algorithms," Journal of Convex Analysis, vol. 19, no. 4, pp. 1167-1192, 2012.

[18] Q. Zhou, S. Song, C. Wu, and G. Huang, "Kernelized LARS-LASSO for constructing radial basis function neural networks," Neural Computing and Applications, vol. 23, no. 7-8, pp. 1969-1976, 2013.

[19] Y. Zhao, H. Chen, and R. T. Ogden, "Wavelet-based weighted LASSO and screening approaches in functional linear regression," Journal of Computational & Graphical Statistics, vol. 24, no. 3, 2015.

[20] A. A. Salama, M. O. Ahmad, and M. N. Swamy, "Underdetermined DOA estimation using MVDR-weighted LASSO," Sensors, vol. 16, no. 9, p. 1549, 2016.

[21] I. T. Jolliffe, Principal Component Analysis, vol. 87, Springer, Berlin, Germany, 1986.

[22] P. Zhang, Research of Comprehensive Evaluation Based on Principal Component Analysis, Nanjing University of Science and Technology, 2004.

[23] H. Cai and W. L. Shen, The Weight of Comprehensive Benefit Evaluation in Hospital (2)--Independence Weight, Chinese Hospital Statistics, 1997.

[24] J. He, E. S. Gao, and L. Chaohua, "The study of the weight coefficient and standardized method of the comprehensive evaluation," Journal of Public Health in China, vol. 17, no. 11, pp. 1048-1050, 2001.

[25] D. Diakoulaki, G. Mavrotas, and L. Papayannakis, "Determining objective weights in multiple criteria problems: the CRITIC method," Computers & Operations Research, vol. 22, no. 7, pp. 763-770, 1995.

[26] U. E. Choo, B. Schoner, and W. C. Wedley, "Interpretation of criteria weights in multicriteria decision making," Computers and Industrial Engineering, vol. 37, no. 3, pp. 527-541, 1999.

[27] B. Srdjevic, Y. D. P. Medeiros, and A. S. Faria, "An objective multi-criteria evaluation of water management scenarios," Water Resources Management, vol. 18, no. 1, pp. 35-54, 2004.

[28] V. Sigillito, UCI Machine Learning Repository, The Johns Hopkins University, Applied Physics Laboratory, School of Information and Computer Science, 1990.

Chen ChunRong, Chen ShanXiong, Chen Lin, and Zhu YuChen

College of Computer & Information Science, Southwest University, Chongqing, China

Correspondence should be addressed to Chen ShanXiong;

Received 15 November 2016; Revised 12 February 2017; Accepted 21 March 2017; Published 4 May 2017

Academic Editor: Farouk Yalaoui

Caption: Figure 1: Basic steps of the LARS.

Caption: Figure 2: The correlation of X and Y change when adding the multidimensional weight.

Caption: Figure 3: The predictor direction when adding the multidimensional weight.

Caption: Figure 4: Curves about three kinds of inspection standard.

Caption: Figure 5: The curve about the SSR of response and equiangular direction.

Table 1: Regression coefficient.

                      $\beta_1$   $\beta_2$   $\beta_3$   $\beta_4$   $\beta_5$   $\beta_6$   $\beta_7$   $\beta_8$

$\beta$               1.7211      3.5409      -0.4862     -0.3665     -0.6504     2.2141      0.7071      -0.1810
$\beta_{part\_PCA}$   1.5300      3.5311      -0.4269     -0.3018     -0.6181     2.0409      0.6727      -0.0567
$\beta_{IW}$          1.5050      3.5294      -0.4253     -0.2863     -0.6072     2.0066      0.6651      -0.0356
$\beta_{CRITIC}$      1.6449      3.4482      -0.5019     -0.3705     -0.6302     2.1668      0.5471      -0.1597

Table 2: Principles of w in different algorithm.

Algorithm                          Comparison of w

Improved-LARS                      $w = w_A = A_A G_A^{-1} 1_A$
Weight LARS, part_PCA              w = w * centralization(part_PCA(X))
Weight LARS, Independence Weight   w = w * centralization(IW(X))
Weight LARS, CRITIC                w = w * centralization(CRITIC(X))

COPYRIGHT 2017 Hindawi Limited
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Research Article; least absolute shrinkage and selection operator
Author:ChunRong, Chen; ShanXiong, Chen; Lin, Chen; YuChen, Zhu
Publication:Advances in Artificial Intelligence
Article Type:Report
Date:Jan 1, 2017