
Nonlinear Regression with High-Dimensional Space Mapping for Blood Component Spectral Quantitative Analysis.

1. Introduction

The concentration of a component in human blood may serve as an indicator of certain diseases, and its fast and accurate determination is essential to early diagnosis. For instance, the serum uric acid (UA) level can be used as an indicator for the detection of diseases related to purine metabolism [1-3] and leukemia pneumonia [4-6]. Various analytical methods have been developed for the determination of UA, including electrochemical and chemiluminescence methods [7, 8], high-performance liquid chromatography [9, 10], and spectroscopic quantitative analysis [11, 12]. Because spectroscopic quantitative analysis requires only a small sample that is easy to prepare, it has attracted increasing attention for blood analysis [13, 14].

In spectroscopic quantitative analysis, when radiation hits a sample, part of the incident radiation may be absorbed, and the relative contribution of the absorption spectrum depends on the chemical composition and physical parameters of the sample. A spectrometer is used to collect a continuous absorption spectrum, and the concentration of the component can then be predicted by a regression algorithm [15-17]. Partial least squares (PLS) regression and support vector regression (SVR) models have been applied [18-20]. PLS focuses on finding the wavelengths that are most closely related to the concentration. SVR is based on the structural risk minimization (SRM) principle and the Vapnik-Chervonenkis (VC) theory [21, 22]; using the SRM principle instead of traditional empirical risk minimization (ERM) gives the model good generalization ability. However, the assumptions of the widely used linear regression models may not hold in practice because of restrictions on the spectral data [23, 24]. Spectral data collected by a low-precision system typically exhibit nonlinearity. Moreover, a high concentration of a blood component may lie beyond the linear range of the optical determination [25], so samples with higher concentrations have to be diluted to meet the linearity requirement. The kernel method can be introduced to overcome this restriction [26]. With the kernel method, the input vectors are mapped into a higher-dimensional feature space, which turns a nonlinear problem into a linear or approximately linear one. A kernel can be any function that satisfies Mercer's condition.

Kernel-based regression methods have been reported for spectral quantitative determination. Balabin and Lomakina [27] compared SVR with a Gaussian kernel against four other nonlinear models. Malegori et al. [28] compared SVR with a Gaussian kernel and a PLS model for fruit quality evaluation. Alves and Poppi [29] evaluated SVR with Gaussian, polynomial, sigmoid, and linear kernels, together with PLS, for biodiesel content determination. As the best-known kernel, the Gaussian kernel is adopted by most researchers (three more traditional kernels are discussed by Alves and Poppi). Different kernels describe different high-dimensional feature space mappings, which affects regression performance [30, 31]. More kernels therefore need to be examined to evaluate high-dimensional space mapping. Moreover, PLS can be extended to nonlinear regression and compared with SVR using the same kernel.

This paper discusses nonlinear regression with high-dimensional space mapping for blood component spectral quantitative analysis. Kernels are combined with PLS and SVR to realize nonlinear regression in the original input space. The kernel extension of PLS and SVR is obtained by replacing the dot product of elements with the kernel. Eight kernels are used to examine the influence of different space mappings on blood component spectral quantitative analysis. Each kernel and its parameters are assessed to build the optimal nonlinear regression model. A dataset obtained from spectral measurement of uric acid concentration is used to evaluate the effectiveness of the proposed method. The experimental results are analyzed, and the mean squared error of prediction (MSEP) is used to compare the predictive capability of the various models.

This article is organized as follows. The methods are introduced in Section 2. The experimental process is explained in Section 3. The results are analyzed in Section 4. Finally, Section 5 concludes the paper.

2. The Methods

2.1. PLS. PLS has an advantage over ordinary multiple linear regression because it accounts for collinearity in the predictor variables. It assumes uncorrelated latent variables that are linear combinations of the original input data. PLS relies on a decomposition of the input variable matrix based on covariance criteria: it finds factors (latent variables) that describe the input variables and are correlated with the output variables. For PLS, the concentration of the blood component ($Y$) is calculated by the linear equation $Y = XB + F$, where $X$ is the input matrix of wavelength signals, $B$ is a matrix of regression coefficients, and $F$ is a bias vector. Matrix $B$ has the form

$$B = X^{T} U \left(T^{T} X X^{T} U\right)^{-1} T^{T} Y, \tag{1}$$

where the columns of $T$ and $U$ are the latent vectors, which are linear combinations of the input and output variables.

2.2. SVR. For SVR, a linear regression is performed between the matrix of wavelength signals $X$ and the corresponding blood component concentration $Y$: $Y = \omega X + B$, where $\omega$ is the matrix of weight coefficients and $B$ is a bias vector. According to the Lagrange multiplier method and the Karush-Kuhn-Tucker (KKT) conditions,

$$\omega = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) x_i, \tag{2}$$

where $x_i$ is an input variable (one sample) of matrix $X$, $\alpha_i$ and $\alpha_i^{*}$ are the corresponding Lagrange coefficients, and $l$ is the number of samples. The linear regression equation for an input $x_j$ can then be written as

$$Y = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) x_i^{T} x_j + B. \tag{3}$$
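The equivalence between the weight form and the dot-product form of the SVR prediction can be checked numerically. The following is a minimal sketch in plain Python; the sample vectors, dual coefficients, and bias are made-up illustrative numbers (they are not fitted values from this paper), and `svr_predict_dual` is a hypothetical helper name. The `kernel` argument defaults to the plain dot product, which is the only place a kernel would later be substituted.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svr_predict_dual(x_new, samples, beta, bias, kernel=dot):
    """Evaluate sum_i beta_i * kernel(x_i, x_new) + bias, with beta_i = alpha_i - alpha_i^*."""
    return sum(b * kernel(x_i, x_new) for b, x_i in zip(beta, samples)) + bias


# Hypothetical calibration samples and dual coefficients (illustrative only).
samples = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
beta = [0.4, -0.7, 0.1]
bias = 0.25

# Weight vector as a linear combination of the samples, as in equation (2).
omega = [sum(b * x[k] for b, x in zip(beta, samples)) for k in range(2)]

x_new = [2.0, 1.0]
primal = dot(omega, x_new) + bias                      # prediction from the weights
dual = svr_predict_dual(x_new, samples, beta, bias)    # prediction from dot products only
print(primal, dual)
```

Because the dual form touches the data only through dot products, swapping `kernel` for another function changes the model without changing the rest of the computation.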

2.3. High-Dimensional Mapping. The regression ability of a linear model can be enhanced by mapping the input data into a high-dimensional space. With the kernel method, the algorithm makes predictions in the high-dimensional feature space without an explicit mapping from the original space. A kernel is a function of two elements in the original space that is regarded as their dot product in the feature space. A kernel extension of a linear algorithm is therefore obtained by replacing each dot product of elements with the kernel. The kernel extensions of PLS and SVR are introduced below.

Kernel PLS is a nonlinear extension of PLS. A nonlinear mapping $\Phi : x \in \mathbb{R}^{N} \rightarrow \Phi(x) \in F$ is used to transform the original data into a feature space $F$. Constructing a linear PLS regression in this feature space yields a nonlinear PLS regression for the original input data. The kernel Gram matrix $K$ is calculated as $K = \Phi \Phi^{T}$. The component concentration regression model becomes

$$\hat{Y} = \Phi_v B = \Phi_v \Phi^{T} U \left(T^{T} \Phi \Phi^{T} U\right)^{-1} T^{T} Y = K_v U \left(T^{T} K U\right)^{-1} T^{T} Y, \tag{4}$$

where $\hat{Y}$ and $Y$ are the output variables of the validation set and the calibration set, respectively, $\Phi_v$ is the matrix of the feature space mapping of the validation variables, the latent vectors $T$ and $U$ are linear combinations of the input and output variables, and $K_v$ is the matrix with elements $K_{ij} = K(x_i, x_j)$, where $x_i$ and $x_j$ are input variables of the validation set and the calibration set, respectively. The nonlinear regression is determined once the kernel function is selected.
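Kernel PLS prediction can be sketched in a few lines of NumPy. The sketch below assumes a NIPALS-style extraction of the latent vectors with kernel centering and a single response, and it predicts with the standard kernel PLS form $K_v U (T^{T} K U)^{-1} T^{T} Y$. The function names, the synthetic sine data, the Gaussian kernel with `gamma = 0.5`, and the choice of four latent variables are illustrative assumptions, not the settings or the implementation used in this paper.

```python
import numpy as np


def gaussian_gram(A, B, gamma):
    """Gram matrix K_ij = exp(-gamma * ||a_i - b_j||^2), with gamma = 1 / (2 sigma^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kpls_fit_predict(K, y, K_v, n_components):
    """Kernel PLS for one response: fit on (K, y), return predictions for K_v."""
    n = K.shape[0]
    # Centre both kernel matrices in feature space and centre the response.
    J = np.eye(n) - np.ones((n, n)) / n
    ones_v = np.ones((K_v.shape[0], n)) / n
    Kc = J @ K @ J
    Kvc = (K_v - ones_v @ K) @ J
    y_mean = y.mean()
    yc = (y - y_mean).reshape(-1, 1)

    Kd, yd = Kc.copy(), yc.copy()
    T, U = [], []
    for _ in range(n_components):
        # With a single response, u is proportional to the deflated y,
        # so no inner NIPALS iteration is needed.
        u = yd / np.linalg.norm(yd)
        t = Kd @ u
        t /= np.linalg.norm(t)
        T.append(t)
        U.append(u)
        D = np.eye(n) - t @ t.T          # deflate K and y by the extracted score
        Kd = D @ Kd @ D
        yd = yd - t @ (t.T @ yd)
    T, U = np.hstack(T), np.hstack(U)

    # Prediction: K_v U (T^T K U)^{-1} T^T Y, using the centred, undeflated K.
    coef = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ yc)
    return (Kvc @ coef).ravel() + y_mean


# Synthetic nonlinear data (illustrative only): every fifth point is held out.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 80).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(80)
val = np.arange(80) % 5 == 4
K = gaussian_gram(X[~val], X[~val], gamma=0.5)
K_v = gaussian_gram(X[val], X[~val], gamma=0.5)
y_hat = kpls_fit_predict(K, y[~val], K_v, n_components=4)
msep = float(np.mean((y_hat - y[val]) ** 2))
print(msep)
```

Only the Gram matrices enter the computation, so another kernel can be tried by replacing `gaussian_gram` alone.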

For kernel-extended SVR, the concentration of the component is calculated by the regression function $Y = \omega \Phi(x) + B$, where $\Phi(x)$ is the high-dimensional mapping introduced to realize the nonlinear regression, $x$ is an input variable of wavelength signals, and $\omega$ and $B$ play the same roles as in linear SVR. Defining the kernel function as $K = \Phi \Phi^{T}$, the component concentration regression model can be expressed as

[mathematical expression not reproducible], (5)

where $Y$ is the output variable of the validation set, $\Phi_v$ is the matrix of the feature space mapping of the validation variables, and $K_v$ is the matrix with elements $K_{ij} = K(x_i, x_j)$, where $x_i$ and $x_j$ are input variables of the validation set and the calibration set, respectively. This completes the kernel extension of SVR.

The kernel determines the character of the high-dimensional space mapping and thus affects the regression performance. To build the optimal nonlinear regression model, different kernels should be evaluated in combination with PLS and SVR. The kernels [32] used in the experiments are the following:

(1) Linear kernel:

$$K(x_i, x_j) = x_i^{T} x_j. \tag{6}$$

The linear kernel has no parameter. With the linear kernel, KPLS reduces to PLS and kernel SVR reduces to LinearSVR.

(2) Gaussian kernel:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right). \tag{7}$$

The kernel parameter is the width, $\sigma$.

(3) Polynomial kernel:

$$K(x_i, x_j) = \left(x_i^{T} x_j + 1\right)^{d}. \tag{8}$$

The kernel parameter d is the degree.

(4) Inverse multiquadric kernel:

$$K(x_i, x_j) = \frac{1}{\sqrt{\|x_i - x_j\|^{2} + c^{2}}}. \tag{9}$$

The kernel parameter is $c^{2}$.

(5) Semi-local kernel:

[mathematical expression not reproducible] (10)

The kernel parameter is the width, $\sigma$.

(6) Exponential kernel:

[mathematical expression not reproducible]. (11)

The kernel parameter is the width, $\sigma$.

(7) Rational kernel:

[mathematical expression not reproducible]. (12)

The kernel parameter is $c^{2}$.

(8) Kmod kernel:

[mathematical expression not reproducible]. (13)

The kernel parameter is $c^{2}$.
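The four kernels whose formulas are stated explicitly in the list above (linear, Gaussian, polynomial, and inverse multiquadric) can be written directly as functions of two input vectors. This is a minimal plain-Python sketch; the function names and the `gram_matrix` helper are illustrative choices, and the remaining four kernels are omitted because their formulas are not given in this text.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    """Equation (6): x^T y."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, sigma):
    """Equation (7): exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, d):
    """Equation (8): (x^T y + 1)^d."""
    return (linear_kernel(x, y) + 1.0) ** d


def inverse_multiquadric_kernel(x, y, c2):
    """Equation (9): 1 / sqrt(||x - y||^2 + c^2), parameterised by c2 = c^2."""
    return 1.0 / math.sqrt(sq_dist(x, y) + c2)


def gram_matrix(rows, cols, kernel, **params):
    """Matrix of kernel values K_ij = kernel(rows[i], cols[j])."""
    return [[kernel(r, c, **params) for c in cols] for r in rows]


# Hypothetical two-dimensional inputs (illustrative only).
a, b = [1.0, 2.0], [3.0, 0.0]
print(linear_kernel(a, b))
print(gaussian_kernel(a, b, sigma=2.0))
print(polynomial_kernel(a, b, d=2))
print(inverse_multiquadric_kernel(a, b, c2=1.0))
```

Calling `gram_matrix(validation, calibration, kernel, ...)` gives a matrix with rows indexed by validation samples and columns by calibration samples, matching the layout described for $K_v$.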

The prediction performance of the high-dimensional mappings given by these kernels, together with the optimization of the related parameters, is discussed in the following sections.

3. Experimental

3.1. Dataset. To evaluate the effectiveness of nonlinear regression with high-dimensional space mapping for blood component spectral quantitative analysis, the UA dataset is used in the experiment.

A total of 200 samples were obtained from a spectral determination experiment on uric acid concentration. Each spectrum has 601 signals from 400 nm to 700 nm at a 0.5 nm interval. The UA concentrations range from 105 to 1100 µmol/L. Spectra of the UA data are shown in Figure 1.
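As a quick arithmetic check on the stated dimensions, a 0.5 nm grid from 400 nm to 700 nm, with both endpoints included, has 601 points. The short sketch below rebuilds that wavelength axis; the variable names are illustrative, and the row-per-sample layout of the spectral matrix is an assumption.

```python
# Wavelength axis implied by the text: 400 nm to 700 nm in 0.5 nm steps.
start_nm, stop_nm, step_nm = 400.0, 700.0, 0.5
n_signals = int(round((stop_nm - start_nm) / step_nm)) + 1
wavelengths_nm = [start_nm + i * step_nm for i in range(n_signals)]

n_samples = 200
spectra_shape = (n_samples, n_signals)  # assumed layout: one row per sample
print(n_signals, wavelengths_nm[0], wavelengths_nm[-1], spectra_shape)
```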

3.2. Experimental Procedure. To assess the prediction performance of nonlinear regression with high-dimensional space mapping for blood component spectral quantitative analysis, the linear, Gaussian, polynomial, inverse multiquadric, semi-local, exponential, rational, and Kmod kernels are combined with PLS (abbreviated as PLS, GKPLS, PKPLS, IMKPLS, SLKPLS, EKPLS, RKPLS, and KKPLS) and with SVR (abbreviated as LinearSVR, GSVR, PSVR, IMSVR, SLSVR, ESVR, RSVR, and KSVR). These combinations are used to build prediction models for the uric acid dataset, and the effectiveness of the models is evaluated.

For the experiments, the dataset is split into a calibration set and a validation set according to the shutter grouping strategy: one out of every five samples is assigned to the validation set, and the remaining samples are assigned to the calibration set. Of the 200 samples, 40 are used as the validation set and the remaining 160 as the calibration set. The calibration set is used to build the prediction model, and the validation set is used to evaluate its effectiveness. Both the spectral signals and the reference UA concentrations of the two sets are normalized according to the values of the calibration set.
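The split and the normalization can be sketched as follows. Two details are assumptions, because the text does not specify them: the held-out sample is taken to be the last one in each group of five, and "normalized according to the values of the calibration set" is interpreted as column-wise standardization with the calibration mean and standard deviation. All function names are illustrative.

```python
import math


def split_every_fifth(n_samples, period=5):
    """Hold out the last sample of every block of `period` samples (assumed position)."""
    validation = [i for i in range(n_samples) if i % period == period - 1]
    calibration = [i for i in range(n_samples) if i % period != period - 1]
    return calibration, validation


def column_stats(rows):
    """Column means and standard deviations, computed on the calibration rows only."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, m in zip(columns, means):
        var = sum((v - m) ** 2 for v in col) / n
        stds.append(math.sqrt(var) or 1.0)  # constant column: fall back to 1
    return means, stds


def standardize(rows, means, stds):
    """Apply the calibration statistics to any set of rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


cal_idx, val_idx = split_every_fifth(200)
print(len(cal_idx), len(val_idx))

# Tiny illustrative example: statistics come from the calibration rows only.
cal_rows = [[1.0, 10.0], [3.0, 10.0]]
means, stds = column_stats(cal_rows)
print(standardize([[5.0, 12.0]], means, stds))
```

Computing the statistics on the calibration rows and reusing them for the validation rows keeps information from the validation set out of model building.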

To compare the prediction performance of the different kernels, the kernel parameter and the related model parameters are optimized. For the Gaussian, semi-local, exponential, inverse multiquadric, rational, and Kmod kernels, the kernel parameters $1/(2\sigma^{2})$ and $c^{2}$ are searched over $[2^{-8}, 2^{8}]$ in multiplicative steps of $2^{0.5}$. For the polynomial kernel, the degree $d$ is searched over $[1, 5]$ in steps of 1. For kernel PLS, the number of latent variables is searched over $[1, 30]$ in steps of 1. For SVR, the penalty parameter $C$ is searched over $[2^{-4}, 2^{10}]$ in multiplicative steps of $2^{0.5}$, and the nonsensitive loss $\epsilon$ is searched over $[2^{-8}, 2^{-1}]$ in multiplicative steps of 2.
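The search grids can be enumerated to see how many candidate values each parameter has. The sketch below assumes that the steps for the power-of-two ranges are multiplicative (each value is the previous one times $2^{0.5}$ or 2), an interpretation that is consistent with optimized values such as 0.0055243 and 11.3137 reported in Table 1. The helper name `log2_grid` is illustrative.

```python
def log2_grid(lo_exp, hi_exp, step_exp):
    """Values 2**lo_exp, 2**(lo_exp + step_exp), ..., 2**hi_exp."""
    n = int(round((hi_exp - lo_exp) / step_exp)) + 1
    return [2.0 ** (lo_exp + i * step_exp) for i in range(n)]


kernel_param_grid = log2_grid(-8, 8, 0.5)      # 1/(2 sigma^2) or c^2
degree_grid = list(range(1, 6))                # polynomial degree d
latent_variable_grid = list(range(1, 31))      # number of latent variables (KPLS)
penalty_grid = log2_grid(-4, 10, 0.5)          # penalty parameter C (SVR)
epsilon_grid = log2_grid(-8, -1, 1)            # nonsensitive loss (SVR)

# Number of parameter combinations per kernel with a continuous kernel parameter.
kpls_combinations = len(kernel_param_grid) * len(latent_variable_grid)
svr_combinations = len(kernel_param_grid) * len(penalty_grid) * len(epsilon_grid)
print(len(kernel_param_grid), len(penalty_grid), len(epsilon_grid))
print(kpls_combinations, svr_combinations)
```

Under this reading, each such kernel has 33 candidate kernel-parameter values, giving 990 combinations for KPLS and 7656 for SVR.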

Grid search based on cross-validation is used for parameter optimization. For each kernel, different combinations of the parameters are tested on the calibration set using 10-fold cross-validation. In 10-fold cross-validation, the data are divided into 10 groups; 9 groups are used as training data and the remaining group is used as test data, and the test group is rotated until every group has been tested. The cross-validation is repeated 10 times, and the 10 results are averaged to give the final result. For each kernel, the combination of parameters with the minimum mean squared error of cross-validation (MSECV) is adopted to build the regression model.
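The procedure above (10 folds, 10 repetitions, pick the parameter with the lowest averaged error) can be sketched in plain Python. To keep the example short and self-contained, a Gaussian-kernel weighted average is used as a stand-in model; it is not KPLS or SVR, and the synthetic one-dimensional data, the four candidate values, and all function names are illustrative assumptions.

```python
import math
import random


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nw_predict(x_train, y_train, x_new, gamma):
    """Gaussian-kernel weighted average (stand-in model, not KPLS or SVR)."""
    w = [math.exp(-gamma * (x - x_new) ** 2) for x in x_train]
    total = sum(w)
    if total == 0.0:
        return sum(y_train) / len(y_train)
    return sum(wi * yi for wi, yi in zip(w, y_train)) / total


def msecv(x, y, gamma, k=10, repeats=10, seed=0):
    """Mean squared error of k-fold cross-validation, averaged over repeats."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        sq_err = 0.0
        for fold in kfold_indices(len(x), k, rng):
            held = set(fold)
            xt = [x[i] for i in range(len(x)) if i not in held]
            yt = [y[i] for i in range(len(x)) if i not in held]
            sq_err += sum((nw_predict(xt, yt, x[i], gamma) - y[i]) ** 2 for i in fold)
        scores.append(sq_err / len(x))
    return sum(scores) / repeats


def grid_search(x, y, gammas):
    """Return the candidate with minimum MSECV and the full score table."""
    results = {g: msecv(x, y, g) for g in gammas}
    best = min(results, key=results.get)
    return best, results


# Synthetic data (illustrative only).
x = [i / 10.0 for i in range(60)]
y = [math.sin(v) for v in x]
gammas = [2.0 ** e for e in (-8, -4, 0, 4)]
best, results = grid_search(x, y, gammas)
print(best, results[best])
```

In the setting of this paper, the stand-in model would be replaced by KPLS or SVR and the single parameter by the full combination of kernel and model parameters.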

The MSEP for the validation set, the squared correlation coefficient for the validation set ($R_p^{2}$), the MSECV for the calibration set, and the cross-validation correlation coefficient calculated by 10-fold cross-validation for the calibration set are used to assess and compare the predictive ability of the various models.
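The two validation metrics can be computed as shown below. The exact definition of the correlation coefficient is not given in the text, so the squared Pearson correlation between reference and predicted concentrations is assumed here; the function names and the four concentration values are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between reference and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation coefficient (assumed definition of R^2)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


# Hypothetical reference and predicted concentrations (illustrative only).
reference = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
msep_value = mean_squared_error(reference, predicted)
r2_value = squared_correlation(reference, predicted)
print(msep_value, r2_value)
```

The same `mean_squared_error` function gives the MSECV when it is applied to the held-out predictions of each cross-validation fold.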

In the next section, the influence of the parameters on the MSECV is discussed for each kernel introduced above, and the prediction capability of each kernel is evaluated on the validation data.

4. Results and Discussion

For each kernel, the curves of parameter optimization for KPLS and SVR are shown in Figures 2 and 3. The analytical results for the UA dataset are summarized and arranged in order of MSEP in Table 1.

The influence of the number of latent variables and the kernel parameter on the MSECV for KPLS is shown in Figures 2(b)-2(h). Figure 2(a) is the plot for the linear kernel (PLS); because the linear kernel has no kernel parameter, it shows only the relationship between the number of latent variables and the MSECV. Figure 2(a) shows that the MSECV of PLS decreases rapidly at first, reaches its minimum when the number of latent variables is 10, and then increases. Figures 2(b), 2(e), and 2(f) show the curves of the Gaussian, semi-local, and exponential kernels. The three kernels share the same parameter $\sigma$, and their curves have similar features. The number of latent variables has the dominant influence on the MSECV when this parameter is small, and the best MSECV is achieved in this region. When $1/(2\sigma^{2})$ is close to 1, the MSECV grows quickly as the parameter increases and then becomes stable once it exceeds a certain value. Figure 2(c) shows that the MSECV increases steadily as the degree $d$ of the polynomial kernel varies from 1 to 5. As with PLS, the minimum MSECV is achieved when the number of latent variables is 10. Figures 2(d), 2(g), and 2(h) show the curves of the inverse multiquadric, rational, and Kmod kernels, which share the kernel parameter $c^{2}$. The MSECV decreases rapidly with increasing $c^{2}$ when $c^{2}$ is smaller than 1. When $c^{2}$ is large, the number of latent variables has the major impact on the MSECV.

The influence of the penalty parameter $C$ and the kernel parameter on the MSECV, at the optimal $\epsilon$, for the different kernels combined with SVR is shown in Figures 3(b)-3(h). For the linear kernel (LinearSVR), Figure 3(a) shows the relationship among the MSECV, $C$, and $\epsilon$. The MSECV decreases as $C$ increases; when $C$ reaches 32, the MSECV reaches its minimum and then increases. The curves of the Gaussian, semi-local, and exponential kernels (Figures 3(b), 3(e), and 3(f)) are broadly similar. The MSECV decreases quickly at first as $C$ is reduced. When $1/(2\sigma^{2})$ is close to 1, the MSECV begins to level off, and $C$ has only a minor impact on it. In Figure 3(c), the MSECV grows rapidly as the polynomial kernel parameter $d$ rises, while the penalty constant $C$ makes little difference to the MSECV. Figures 3(d), 3(g), and 3(h) show that the MSECV of the inverse multiquadric, rational, and Kmod kernels generally decreases as the kernel parameter $c^{2}$ increases. The MSECV of the Kmod kernel rises slightly at the beginning and then drops. As $C$ increases, the MSECV generally decreases. Unlike for the other two kernels, $C$ has no significant effect on the MSECV of the rational kernel. The analytical results for the UA dataset are summarized in Table 1. The MSEP and $R_p^{2}$ based on the validation set and the MSECV and cross-validation correlation coefficient for the calibration set are presented, together with the optimized parameters of the models.

For KPLS, SLKPLS achieves the most accurate prediction, with the lowest MSEP (1880.18) and the highest $R_p^{2}$. GKPLS is second (MSEP of 2347.11), followed by IMKPLS, RKPLS, KKPLS, EKPLS, and PKPLS. The linear kernel (PLS) produces the worst prediction performance, with an MSEP of 8554.57.

For SVR, IMSVR has the best predictive capability, with an MSEP of 1523.42, followed by RSVR (1528.66), KSVR (1530.09), GSVR (2021.93), SLSVR (2359.86), ESVR (2971.75), PSVR (5518.49), and LinearSVR (5519.22). The traditional linear regression algorithms clearly do not perform well on blood component spectral quantitative analysis: PLS has the highest MSEP, followed by LinearSVR. IMSVR exhibits the best performance on the validation set. The MSEP of IMSVR is 0.34%, 0.44%, 18.97%, 24.65%, 35.09%, 35.44%, 35.59%, 39.54%, 41.70%, 46.75%, 48.74%, 72.39%, 72.40%, 82.19%, and 82.19% lower than the values obtained by RSVR, KSVR, SLKPLS, GSVR, GKPLS, SLSVR, IMKPLS, RKPLS, KKPLS, EKPLS, ESVR, PSVR, LinearSVR, PKPLS, and PLS, respectively. Taking advantage of the SRM principle, SVR has better prediction performance in general.
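The percentage differences quoted above follow from the MSEP column of Table 1 as (other - best) / other. The short sketch below recomputes them; the dictionary holds the Table 1 values, and the function name is illustrative.

```python
# MSEP values copied from Table 1.
msep = {
    "IMSVR": 1523.42, "RSVR": 1528.66, "KSVR": 1530.09, "SLKPLS": 1880.18,
    "GSVR": 2021.93, "GKPLS": 2347.11, "SLSVR": 2359.86, "IMKPLS": 2365.22,
    "RKPLS": 2519.68, "KKPLS": 2613.13, "EKPLS": 2860.96, "ESVR": 2971.75,
    "PSVR": 5518.49, "LinearSVR": 5519.22, "PKPLS": 8554.57, "PLS": 8554.57,
}


def relative_reduction(best, other):
    """Percentage by which `best` is lower than `other`."""
    return 100.0 * (other - best) / other


reduction = {
    name: relative_reduction(msep["IMSVR"], value)
    for name, value in msep.items()
    if name != "IMSVR"
}
for name, pct in reduction.items():
    print(f"{name:10s} {pct:6.2f}%")
```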

For both PLS and SVR, the optimized polynomial kernel parameter $d$ is 1, which makes the polynomial kernel act the same as the linear kernel. This explains why the polynomial kernel has almost the same MSEP, $R_p^{2}$, MSECV, and cross-validation correlation coefficient as the linear kernel in this experiment. The ranking of the kernels based on $R_p^{2}$ is broadly similar to that based on MSEP. As global kernels, the polynomial and linear kernels allow data points that are far away from the test point to influence the kernel values.
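With $d = 1$, the polynomial kernel $(x_i^{T} x_j + 1)^{d}$ is the linear kernel plus the constant 1, a shift that a bias term or kernel centering can typically absorb. The plain-Python check below illustrates this on a few made-up vectors; the function names and the data are illustrative.

```python
def linear_kernel(x, y):
    """x^T y."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, d):
    """(x^T y + 1)^d."""
    return (linear_kernel(x, y) + 1.0) ** d


# Hypothetical input vectors (illustrative only).
points = [[1.0, 2.0], [3.0, 0.0], [-2.0, 4.0]]
offsets = [
    polynomial_kernel(a, b, 1) - linear_kernel(a, b)
    for a in points
    for b in points
]
print(offsets)  # every entry is the same constant offset
```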

The other kernels used in this paper are local kernels, for which only data points close to the test point influence the kernel values. The good extrapolation abilities of the local kernels suggest that only some specific spectral data are essential to the prediction of the blood component concentration; the contribution of these critical data is enhanced during high-dimensional mapping by local kernels. Based on the above results, IMSVR is recommended for nonlinear regression in blood component spectral quantitative analysis. The optimal kernel parameter $c^{2}$ is 64, the penalty parameter is 256, and the nonsensitive loss is 0.003906.

5. Conclusions

In this paper, high-dimensional space mapping methods that combine kernels with PLS and SVR are proposed for blood component spectral quantitative analysis. For each model, the general trend of the MSECV with respect to the model parameters is discussed. The following conclusions can be drawn. First, the results show that the nonlinear regression models yield lower prediction errors than the linear models. Second, SVR performs better than PLS when combined with kernels. Third, local kernels are recommended for high-dimensional mapping given the features of the blood spectral data. Finally, the experimental results verify that IMSVR (a local kernel combined with SVR) has the highest predictive ability and can be used effectively for blood component spectral quantitative analysis.

Conflicts of Interest

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work is supported by the National Natural Science Foundation of China (61375055), Program for New Century Excellent Talents in University (NCET-12-0447), Provincial Natural Science Foundation of Shaanxi (2014JQ8365), State Key Laboratory of Electrical Insulation and Power Equipment (EIPE16313), and Fundamental Research Funds for the Central Universities.


References

[1] B. Alvarez-Lario and J. Macarron-Vicente, "Uric acid and evolution," Rheumatology, vol. 49, no. 11, pp. 2010-2015, 2010.

[2] R. Duan, J. Jiang, S. Liu et al., "Spectrofluorometric determination of ascorbic acid using thiamine and potassium ferricyanide," Instrumentation Science & Technology, vol. 45, no. 3, pp. 312-323, 2017.

[3] A. Fang, Q. Wu, Q. Lu et al., "Upconversion ratiometric fluorescence and colorimetric dual-readout assay for uric acid," Biosensors and Bioelectronics, vol. 86, pp. 664-670, 2016.

[4] Z. Wang, Y. Lin, Y. Liu et al., "Serum uric acid levels and outcomes after acute ischemic stroke," Molecular Neurobiology, vol. 53, no. 3, pp. 1753-1759, 2016.

[5] U. A. A. Sharaf el Din, M. M. Salem, and D. O. Abdulazim, "Uric acid in the pathogenesis of metabolic, renal, and cardiovascular diseases: a review," Journal of Advanced Research, vol. 8, no. 5, pp. 537-548, 2017.

[6] S. P. Haen, V. Eyb, N. Mirza et al., "Uric acid as a novel biomarker for bone-marrow function and incipient hematopoietic reconstitution after aplasia in patients with hematologic malignancies," Journal of Cancer Research and Clinical Oncology, vol. 143, no. 5, pp. 759-771, 2017.

[7] D. Zhao, D. Fan, J. Wang, and C. Xu, "Hierarchical nanoporous platinum-copper alloy for simultaneous electrochemical determination of ascorbic acid, dopamine, and uric acid," Microchimica Acta, vol. 182, no. 7-8, pp. 1345-1352, 2015.

[8] Y. Sheng, H. Yang, Y. Wang, L. Han, Y. Zhao, and A. Fan, "Silver nanoclusters-catalyzed luminol chemiluminescence for hydrogen peroxide and uric acid detection," Talanta, vol. 166, pp. 268-274, 2017.

[9] X. L. Li, G. Li, Y. Z. Jiang et al., "Human nails metabolite analysis: a rapid and simple method for quantification of uric acid in human fingernail by high-performance liquid chromatography with UV-detection," Journal of Chromatography B, vol. 1002, pp. 394-398, 2015.

[10] N. Sher, N. Fatima, S. Perveen, and F. A. Siddiqui, "Determination of benzimidazoles in pharmaceuticals and human serum by high-performance liquid chromatography," Instrumentation Science & Technology, vol. 44, no. 6, pp. 672-682, 2016.

[11] A. Kumar, A. Hens, R. K. Arun et al., "A paper based microfluidic device for easy detection of uric acid using positively charged gold nanoparticles," Analyst, vol. 140, no. 6, pp. 1817-1821, 2015.

[12] P. Xu, R. Li, Y. Tu, and J. Yan, "A gold nanocluster-based sensor for sensitive uric acid detection," Talanta, vol. 144, pp. 704-709, 2015.

[13] G. Tsiminis, E. P. Schartner, J. L. Brooks, and M. R. Hutchinson, "Measuring and tracking vitamin B12: a review of current methods with a focus on optical spectroscopy," Applied Spectroscopy Reviews, vol. 52, no. 5, pp. 439-455, 2017.

[14] Y. Shang, Z. Qian, R. C. Mesquita, and M. Dehaes, "Recent advances in optical spectroscopic and imaging methods for medicine and biology," Journal of Spectroscopy, vol. 2016, Article ID 4095790, 2 pages, 2016.

[15] A. Oleszko, J. Hartwich, A. Wojtowicz, M. Gasior-Glogowska, H. Huras, and M. Komorowska, "Comparison of FTIR-ATR and Raman spectroscopy in determination of VLDL triglycerides in blood serum with PLS regression," Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol. 183, no. 5, pp. 239-246, 2017.

[16] A. Gredilla, S. Fdez-Ortiz de Vallejuelo, N. Elejoste, A. de Diego, and J. M. Madariaga, "Non-destructive spectroscopy combined with chemometrics as a tool for green chemical analysis of environmental samples: a review," TrAC Trends in Analytical Chemistry, vol. 76, pp. 30-39, 2016.

[17] P. P. Meleiro and C. Garcia-Ruiz, "Spectroscopic techniques for the forensic analysis of textile fibers," Applied Spectroscopy Reviews, vol. 51, no. 4, pp. 278-301, 2016.

[18] T. R. Viegas, A. L. M. L. Mata, M. M. L. Duarte, and K. M. G. Lima, "Determination of quality attributes in wax jambu fruit using NIRS and PLS," Food Chemistry, vol. 190, pp. 1-4, 2016.

[19] A. J. Fernandez-Espinosa, "Combining PLS regression with portable NIR spectroscopy to on-line monitor quality parameters in intact olives for determining optimal harvesting time," Talanta, vol. 148, pp. 216-228, 2016.

[20] X. Ma, Y. Zhang, and Y. Wang, "Performance evaluation of kernel functions based on grid search for support vector regression," in 2015 IEEE 7th International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM), pp. 283-288, Siem Reap, Cambodia, 2015, IEEE.

[21] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.

[22] J. M. Melenk and I. Babuska, "The partition of unity finite element method: basic theory and applications," Computer Methods in Applied Mechanics and Engineering, vol. 139, no. 1-4, pp. 289-314, 1996.

[23] N. Labbe, S.-H. Lee, H. W. Cho, M. K. Jeong, and N. Andre, "Enhanced discrimination and calibration of biomass NIR spectral data using non-linear kernel methods," Bioresource Technology, vol. 99, no. 17, pp. 8445-8452, 2008.

[24] Z. Wu, E. Xu, J. Long et al., "Measurement of fermentation parameters of Chinese rice wine using Raman spectroscopy combined with linear and non-linear regression methods," Food Control, vol. 56, pp. 95-102, 2015.

[25] A. N. Dhinaa and P. K. Palanisamy, "Optical nonlinearity in measurement of urea and uric acid in blood," Natural Science, vol. 2, no. 2, pp. 106-111, 2010.

[26] W. Ni, L. Nørgaard, and M. Mørup, "Non-linear calibration models for near infrared spectroscopy," Analytica Chimica Acta, vol. 813, pp. 1-14, 2014.

[27] R. M. Balabin and E. I. Lomakina, "Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data," Analyst, vol. 136, no. 8, pp. 1703-1712, 2011.

[28] C. Malegori, E. J. Nascimento Marques, S. T. de Freitas, M. F. Pimentel, C. Pasquini, and E. Casiraghi, "Comparing the analytical performances of micro-NIR and FT-NIR spectrometers in the evaluation of acerola fruit quality, using PLS and SVM regression algorithms," Talanta, vol. 165, no. 1, pp. 112-116, 2017.

[29] J. C. L. Alves and R. J. Poppi, "Biodiesel content determination in diesel fuel blends using near infrared (NIR) spectroscopy and support vector machines (SVM)," Talanta, vol. 104, pp. 155-161, 2013.

[30] T. Berry and T. Sauer, "Local kernels and the geometric structure of data," Applied and Computational Harmonic Analysis, vol. 40, no. 3, pp. 439-469, 2016.

[31] S. Yin and J. Yin, "Tuning kernel parameters for SVM based on expected square distance ratio," Information Sciences, vol. 370-371, pp. 92-102, 2016.

[32] M. G. Genton, "Classes of kernels for machine learning: a statistics perspective," Journal of Machine Learning Research, vol. 2, pp. 299-312, 2001.

Xiaoyan Ma, (1) Yanbin Zhang, (1) Hui Cao, (1) Shiliang Zhang, (1) and Yan Zhou (2)

(1) Shaanxi Key Laboratory of Smart Grid & the State Key Laboratory of Electrical Insulation and Power Equipment, School of Electrical Engineering, Xi'an Jiaotong University, Xi'an 710049, China

(2) School of Energy and Power Engineering, Xi'an Jiaotong University, Xi'an 710049, China

Correspondence should be addressed to Yan Zhou;

Received 11 July 2017; Accepted 25 October 2017; Published 10 January 2018

Academic Editor: Vincenza Crupi

Caption: Figure 1: Spectra of the UA dataset.

Caption: Figure 2: The influence of the number of latent variables and the kernel parameter on the MSECV of different kernels for KPLS on the UA dataset. (a) The influence of the number of latent variables on the MSECV of PLS. (b-h) The number of latent variables and kernel parameter curves of GKPLS, PKPLS, IMKPLS, SLKPLS, EKPLS, RKPLS, and KKPLS.

Caption: Figure 3: The influence of the penalty constant and the kernel parameter on the MSECV of different kernels for SVR on the UA dataset. (a) The influence of the penalty constant and the nonsensitive loss on the MSECV of LinearSVR. (b-h) The penalty constant and kernel parameter curves of GSVR, PSVR, IMSVR, SLSVR, ESVR, RSVR, and KSVR.
Table 1: Analytical results for the UA dataset.

Model       MSEP      R_p^2    MSECV    CV corr.   Kernel parameter   Parameter 1 (a)   Parameter 2 (b)
IMSVR       1523.42   0.9831   0.0535   0.8433     64                 256               0.003906
RSVR        1528.66   0.9830   0.0526   0.8456     64                 16                0.003906
KSVR        1530.09   0.9829   0.0526   0.8455     64                 1024              0.003906
SLKPLS      1880.18   0.9786   0.0410   0.8804     0.0055243          28                /
GSVR        2021.93   0.9765   0.0448   0.8691     0.005524           512               0.003906
GKPLS       2347.11   0.9740   0.0430   0.8740     0.0039063          25                /
SLSVR       2359.86   0.9731   0.0427   0.8749     0.003906           11.3137           0.003906
IMKPLS      2365.22   0.9721   0.0495   0.8560     32                 25                /
RKPLS       2519.68   0.9692   0.0481   0.8691     32                 23                /
KKPLS       2613.13   0.9693   0.0481   0.8589     32                 23                /
EKPLS       2860.96   0.9692   0.0672   0.8023     0.0625             2                 /
ESVR        2971.75   0.9660   0.0597   0.8254     0.003906           90.5097           0.003906
PSVR        5518.49   0.9410   0.0365   0.8935     1                  32                0.031250
LinearSVR   5519.22   0.9410   0.0365   0.8935     /                  32                0.031250
PKPLS       8554.57   0.9062   0.0393   0.8852     1                  10                /
PLS         8554.57   0.9062   0.0393   0.8852     /                  10                /

MSEP: mean squared error of prediction; MSECV: mean squared error of cross-validation; R_p^2: prediction correlation coefficient; CV corr.: cross-validation correlation coefficient; (a) penalty parameter (C) for SVR, number of latent variables for KPLS; (b) nonsensitive loss (epsilon) for SVR.
COPYRIGHT 2018 Hindawi Limited
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2018 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Research Article
Author:Ma, Xiaoyan; Zhang, Yanbin; Cao, Hui; Zhang, Shiliang; Zhou, Yan
Publication:Journal of Spectroscopy
Date:Jan 1, 2018