# Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

1. Introduction

The core components and important mechanical structures of mechanical equipment inevitably suffer varying degrees of failure under complex operating conditions and harsh working environments. Equipment failure may cause huge economic losses and casualties. Rolling bearings are widely used in rotating machinery, and their running state directly affects the accuracy, reliability, and life of the machine. Timely and accurate diagnosis of rolling bearing faults helps improve equipment reliability and reduce the probability of accidents. Because the relationship between faults and their characteristic signatures is highly nonlinear, fault diagnosis methods based on machine learning have been applied more and more widely in the field of automation in recent years [1-3]. Vapnik proposed the support vector regression (SVR) method, which exhibits superior performance [4]. Improving the SVR algorithm can greatly improve the accuracy of fault diagnosis.

The keys to SVR are the kernel function and parameter selection. Several methods have been used to optimize the kernel and select the regression parameters, such as cross validation learning [5, 6], gradient descent learning [7, 8], evolutionary learning [9, 10], and positive semidefinite programming learning [11, 12]. Studies on model selection for support vector regression and on kernel parameter selection are relatively few, and they primarily use grid search with cross validation and evolutionary methods. However, the efficiency of these methods is very low because of the exhaustive search for optimal parameters [13]. When the number of parameters exceeds two, they become almost impossible to operate; examples include the genetic algorithm [14] and the particle swarm optimization algorithm [15]. More seriously, an evolutionary algorithm may easily fall into local optimization; that is, it obtains only a suboptimal solution rather than the optimal solution. Reference [16] provides another way to estimate kernel parameters by treating the adjustment of the multiple parameters of LS-SVM as a parameter identification problem for a nonlinear dynamic system. By exploiting the smoothness of the system model, the kernel parameters and the regression parameters are automatically adjusted by the extended Kalman filter (EKF). Reference [17] puts forward a support vector regression model selection method based on the unscented Kalman filter (UKF-SVR) to solve the problem that the loss function is nondifferentiable with respect to the hyperparameters. However, the accuracy of the UKF makes it difficult to meet the needs of practical applications, and when the system dimension is high, the performance of the UKF degrades significantly, leading to a curse of dimensionality.
In 2009, Arasaratnam and Haykin proposed the cubature Kalman filter (CKF), which uses spherical-radial cubature rules to choose sigma points and weights, enhancing the accuracy of high dimensional nonlinear state estimation and improving stability. The CKF is based on the 3rd-degree spherical-radial rule, so its filtering accuracy is still limited [18]. Recently, [19] proposed a class of high-degree spherical-radial cubature Kalman filters and proved that the 5th-degree cubature Kalman filter (5th-degree CKF) can obtain high accuracy and high stability with low computational cost.

Although the above efforts to resolve specific problems have improved the algorithm, they are all based on support vector regression with a single kernel function. Kernel functions can be divided into local kernel functions and global kernel functions: local kernel functions have stronger learning ability, while global kernel functions have stronger extrapolation ability. The choice of kernel function significantly affects the generalization ability of support vector regression, and using only one kernel function often has limitations. Linearly fusing a local kernel function and a global kernel function, based on the characteristics of the original kernels, yields a new kernel function, the mixed kernel function, which inherits the advantages of both and can accurately reflect the actual characteristics of a sample. Introducing a mixed kernel function requires determining the fusion coefficients of the local and global kernel functions; an appropriate fusion coefficient better exploits the advantages of the mixed kernel function. At present, the combination weights are often determined by experience [20, 21]. Reference [20], addressing pulmonary nodules, indicates that the initial selection of the mixed kernel function coefficients can be based on gray features, morphological features, and texture features. A support vector regression based on a mixed kernel function constructed in this way cannot ensure the best performance.

For these reasons, this paper uses the mixed kernel function as the kernel function of the SVR and the 5th-degree CKF as the basic framework, adaptively adjusting the fusion coefficients, kernel parameters, and regression parameters of the mixed kernel function. The remainder of this article is organized as follows: the first part is the description of the problem; the second part is a review of typical kernel functions; the third part is the parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function; the fourth part is the analysis of the algorithm; the fifth part is a simulation example; and the sixth part is the summary.

2. Problem Description

2.1. Support Vector Regression. The ultimate goal of support vector regression is to find a regression function $f: \mathbb{R}^{D} \to \mathbb{R}$:

$$ y = f(x) = w^{T}\phi(x) + b, \tag{1} $$

where $\phi(x)$ maps the data $x$ from the low dimensional input space to a high dimensional feature space, $w$ is a weight vector, and $b$ is a bias term. Standard support vector regression adopts the $\epsilon$-insensitive loss function. It is assumed that all the training data can be fitted by a linear function within accuracy $\epsilon$. The problem is then translated into the following minimization of an objective function [22]:

$$ \min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\left(\xi_{i} + \xi_{i}^{*}\right) $$
$$ \text{s.t.} \quad y_{i} - w^{T}\phi(x_{i}) - b \le \epsilon + \xi_{i}, \quad w^{T}\phi(x_{i}) + b - y_{i} \le \epsilon + \xi_{i}^{*}, \quad \xi_{i}, \xi_{i}^{*} \ge 0, \; i = 1, \ldots, l, \tag{2} $$

where $\xi_{i}, \xi_{i}^{*}$ are relaxation (slack) factors. When there is a fitting error, $\xi_{i}, \xi_{i}^{*}$ are greater than 0; otherwise, they are equal to 0. The first term of the optimization function smooths the fitting function to improve generalization. The second term reduces the error; the constant $C > 0$ indicates the extent of the penalty for a sample falling outside the error tube $\epsilon$.
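To make the objective in (2) concrete, the following minimal sketch (assuming NumPy; the function names, `w`, `b`, and all parameter values are ours, purely illustrative) evaluates the $\epsilon$-insensitive loss and the regularized objective for a candidate linear fit:

```python
import numpy as np

# Hedged sketch of the epsilon-insensitive loss behind eq. (2):
# residuals inside the epsilon tube cost nothing; outside it the cost
# grows linearly, playing the role of the slack variables xi_i, xi_i^*.
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

def svr_objective(w, b, X, y, C=1.0, eps=0.1):
    # 0.5*||w||^2 smooths the fit; C weighs the tube violations.
    slack = eps_insensitive_loss(y, X @ w + b, eps).sum()
    return 0.5 * float(w @ w) + C * slack
```

A small C tolerates more tube violations in exchange for a flatter function; a large C drives the slack terms toward zero at the cost of complexity, matching the tradeoff discussed below.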

The performance of support vector regression is affected by the error penalty parameter C, which sets the degree of punishment applied to mistakenly fitted samples. C is a tradeoff between algorithm complexity and the degree of misfit. When C is small, the punishment for the empirical error of the original data is small: the machine learning complexity is low, but the empirical risk is high. When C is large, the empirical error penalty is large and the empirical risk is small, but this can lead to high computational complexity and poor generalization ability. Therefore, it is very important to choose an appropriate penalty coefficient C for practical problems.

The structure of the support vector regression is shown in Figure 1. Another key factor affecting the performance of support vector regression is the kernel function and its parameters; the introduction of the kernel function is the core of the support vector regression algorithm. The kernel function involves two aspects: the construction of the kernel function and the selection of the kernel function model. The appropriate choice of the model is key to improving the performance of support vector regression: model selection determines, before training, the kernel function that best suits the data characteristics of the original samples. It involves two steps: first, determine the type of kernel function, and then select its relevant parameters. Current research focuses on the choice of the kernel function model. Because different samples may have different characteristics, the construction of the kernel function is even more important than its selection, and constructing a good kernel function is still a challenging problem.

High performance of support vector regression is difficult to obtain with a single kernel function. The characteristics of the actual sample are complicated and changeable and cannot be completely characterized by the local kernel function or the global kernel function. The mixed kernel function combines the global kernel function and the local kernel function according to a ratio that can accurately reflect the characteristics of the actual sample based on the local and global kernel functions. Therefore, the mixed kernel function has good learning ability and good generalization ability.

2.2. 5th-Degree CKF Principle. Different nonlinear filters have different performance characteristics, and the estimation performance depends on the specific filter type. The 5th-degree cubature Kalman filtering algorithm can obtain high accuracy and high stability with low computational cost; thus, it is selected to adaptively adjust the fusion coefficients, kernel parameters, and regression parameters. The general nonlinear system is given as follows:

$$ x_{k} = f(x_{k-1}) + w_{k} $$

$$ z_{k} = h(x_{k}) + v_{k}, \tag{3} $$

where $x_{k}$ is the n-dimensional state vector and $z_{k}$ is the m-dimensional observation vector; $f$ and $h$ are known nonlinear functions; and $\{w_{k}\}$ and $\{v_{k}\}$ are independent zero mean Gaussian white noise sequences.

Similar to the CKF, the structure of the 5th-degree CKF is divided into two steps: state prediction (time update) and measurement update. The core difference is that the high-degree cubature Kalman filter uses a higher-degree spherical-radial cubature rule and the corresponding weight coefficients to address the accuracy limitation discussed in the introduction. The high-degree cubature rule satisfies

$$ \int_{U_{n}} f(s)\,d\sigma(s) \approx \bar{w}_{s1}\sum_{j=1}^{n(n-1)/2}\left[f\left(s_{j}^{+}\right)+f\left(-s_{j}^{+}\right)+f\left(s_{j}^{-}\right)+f\left(-s_{j}^{-}\right)\right] + \bar{w}_{s2}\sum_{j=1}^{n}\left[f\left(e_{j}\right)+f\left(-e_{j}\right)\right], \tag{4} $$

where $e_{j}$ is the jth column of the identity matrix of n-dimensional space, and $s_{j}^{+}$ and $s_{j}^{-}$ are the point sets

$$ s_{j}^{+} = \left\{\sqrt{\tfrac{1}{2}}\left(e_{k}+e_{l}\right) : k < l,\; k, l = 1, \ldots, n\right\}, \quad s_{j}^{-} = \left\{\sqrt{\tfrac{1}{2}}\left(e_{k}-e_{l}\right) : k < l,\; k, l = 1, \ldots, n\right\}. \tag{5} $$

The weights $\bar{w}_{s1}$ and $\bar{w}_{s2}$ are

$$ \bar{w}_{s1} = \frac{A_{n}}{n(n+2)}, \qquad \bar{w}_{s2} = \frac{(4-n)A_{n}}{2n(n+2)}, \tag{6} $$

where $\Gamma(z) = \int_{0}^{\infty} e^{-\lambda}\lambda^{z-1}\,d\lambda$ and $A_{n} = 2\sqrt{\pi^{n}}/\Gamma(n/2)$ is the surface area of the unit sphere. According to the moment matching method with two radial points, the radial weights are

$$ w_{1} = \frac{\Gamma(n/2)}{n+2}, \qquad w_{2} = \frac{n\,\Gamma(n/2)}{2(n+2)}. \tag{7} $$
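The point sets of (5) and the spherical weights of (6) can be generated programmatically. The sketch below (NumPy assumed; the function names are ours) builds the pairwise points $\sqrt{1/2}(e_{k} \pm e_{l})$, $k < l$, and evaluates the weights:

```python
import itertools
import math
import numpy as np

# Hedged sketch: the point sets s_j^+ / s_j^- of eq. (5), built from
# pairs of unit vectors e_k, e_l (k < l) scaled by sqrt(1/2). Each set
# contains n(n-1)/2 points, and every point lies on the unit sphere.
def spherical_point_sets(n):
    e = np.eye(n)
    s_plus, s_minus = [], []
    for k, l in itertools.combinations(range(n), 2):
        s_plus.append(math.sqrt(0.5) * (e[k] + e[l]))
        s_minus.append(math.sqrt(0.5) * (e[k] - e[l]))
    return np.array(s_plus), np.array(s_minus)

# Weights of eq. (6): A_n is the surface area of the unit sphere in R^n.
def spherical_weights(n):
    A_n = 2.0 * math.sqrt(math.pi ** n) / math.gamma(n / 2.0)
    w_s1 = A_n / (n * (n + 2))
    w_s2 = (4 - n) * A_n / (2 * n * (n + 2))
    return w_s1, w_s2
```

For n = 2 this gives $A_{2} = 2\pi$ and hence $\bar{w}_{s1} = \pi/4$, which can serve as a quick sanity check of an implementation.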

This paper first chooses the mixed kernel function as the kernel function of the support vector regression. Constructing this kernel requires selecting the fusion coefficients of the local and global kernel functions, the kernel parameters, and the penalty parameter C. The fusion coefficients are then embedded, together with the kernel hyperparameters, into the state vector, so that the construction of the kernel function and the selection of its parameters are transformed into a nonlinear filtering problem that can be solved by the 5th-degree CKF. Finally, the adaptive adjustment of the fusion coefficients and the estimation of the kernel parameters and the penalty parameter are carried out.

3. Review of Classic Kernel Functions

The key to support vector regression is the introduction of the kernel function. When a data set lies in a low dimensional space, it is usually difficult to separate; when it is mapped to a high dimensional space, the new data set is more easily separated, but the computational effort of an explicit mapping is huge. The kernel function performs the computation implicitly, without carrying out the transformation to the high dimensional feature space, which avoids the "curse of dimensionality" problem. The kernel function is denoted $K(x_{i}, x_{j})$, where $x_{i}, x_{j}$ are sample data. Four types of kernel functions are widely used in the research and application of support vector regression [21], as shown in Table 1.

Different kernel functions form different support vector regressions. The linear kernel function, the polynomial kernel function, and the Gauss kernel function have been widely used. The most widely used is the RBF kernel function, which has good learning ability and is applicable under all conditions: low dimensional, high dimensional, small samples, and large samples. The RBF kernel function has a wide convergence region and is an ideal classification basis function. The sigmoid kernel function, which originates from neural networks, is limited in practical application: only under specific conditions (when its parameters v and c satisfy certain constraints) does it meet the symmetry and positive definiteness conditions of a kernel function. The sigmoid kernel function is proven to have good global classification performance in neural network applications, but its classification performance in SVMs needs further research [23].

The kernel function skillfully solves the curse of dimensionality that arises when low dimensional vectors are mapped into a high dimensional space and improves the nonlinear processing ability of machine learning. However, each kind of kernel function has its own characteristics, and support vector regressions based on different kernel functions have different generalization abilities. At present, kernel functions are divided into two categories: global kernel functions and local kernel functions. A local kernel function is effective in extracting the local character of the sample: its value is affected only by data points at a very close distance, so its interpolation ability, and hence its learning ability, is strong. The Gauss kernel function is a local kernel function. A global kernel function is effective in extracting the global characteristics of the samples: its value is affected even by data points that are far apart, so its generalization ability is strong [24], while its learning ability is weak compared with the local kernel function. The linear kernel function, the polynomial kernel function, and the sigmoid kernel function are all global kernel functions. In short, the learning ability of a local kernel function is strong and its generalization ability is weak; the generalization ability of a global kernel function is strong but its learning ability is weak.

4. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

4.1. Mixed Kernel Function. Based on the above analysis, we fuse the local and global kernel functions so that the mixed kernel function has both strong learning ability and strong generalization ability. In this section, we propose a mixed kernel function based on adaptive fusion. "Adaptive" means that the weight of every kernel function in the mixture is estimated by a filter rather than determined from past experience.

Theorem 1. Denote the local and global kernel functions by $K_{\mathrm{local}}(x_{i}, x_{j})$ and $K_{\mathrm{global}}(x_{i}, x_{j})$, respectively. Then, the mixed kernel function can be expressed by

$$ K_{\mathrm{mix}}(x_{i}, x_{j}) = p_{1} \cdot K_{\mathrm{local}}(x_{i}, x_{j}) + p_{2} \cdot K_{\mathrm{global}}(x_{i}, x_{j}), \tag{8} $$

where $p_{1}$ and $p_{2}$ are the weights of the two kernel functions in the mixed kernel function, $0 \le p_{1}, p_{2} \le 1$, and $p_{1} + p_{2} = 1$. The mixed kernel function is still a Mercer kernel.

Proof. Since $K_{\mathrm{local}}(x_{i}, x_{j})$ and $K_{\mathrm{global}}(x_{i}, x_{j})$ are local and global kernel functions, they both satisfy the Mercer condition [23]; that is, for any $\phi(x) \ne 0$ with $\int \phi^{2}(x)\,dx < \infty$, (9) is satisfied:

$$ \int K_{\mathrm{local}}(x, x')\,\phi(x)\,\phi(x')\,dx\,dx' > 0, \qquad \int K_{\mathrm{global}}(x, x')\,\phi(x)\,\phi(x')\,dx\,dx' > 0. \tag{9} $$

Since $0 \le p_{1}, p_{2} \le 1$ and $p_{1} + p_{2} = 1$, it can be derived that

$$ p_{1}\int K_{\mathrm{local}}(x, x')\,\phi(x)\,\phi(x')\,dx\,dx' + p_{2}\int K_{\mathrm{global}}(x, x')\,\phi(x)\,\phi(x')\,dx\,dx' > 0; \tag{10} $$

that is, $\int K_{\mathrm{mix}}(x, x')\,\phi(x)\,\phi(x')\,dx\,dx' > 0$.

It has already been proven that any function can be chosen as a kernel function as long as it satisfies the Mercer condition. Therefore, mixed kernel function (8) can be chosen as a kernel function since it satisfies the Mercer condition. The mixed kernel function is the convex combination of the local and global kernel functions. The introduction of the mixed kernel function eliminates the deficiencies in using a single global or local kernel function.

When $p_{1} = 0$ or $p_{2} = 0$, the mixed kernel function degenerates into a single kernel function. Model selection for single kernel support vector regression concerns only the selection of the internal parameters of the kernel function. However, model selection for mixed kernel support vector regression concerns not only the internal parameters of both the local and global kernel functions but also the fusion coefficients of the two kernel functions, so as to ensure the best performance of the resulting support vector regression. Before training the support vector regression, we need to determine the weighted fusion coefficients. The coefficients $p_{1}$ and $p_{2}$ in (8) are usually determined by past experience. If the mixed kernel function does not describe the properties of the training samples well, regression forecast performance will be degraded. Currently, there is no analytical method for the selection of the fusion coefficients; they are usually selected according to experience, and it is difficult to estimate them online.
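A minimal sketch of the convex combination in (8), using an RBF local kernel and a sigmoid global kernel as in the later simulation (NumPy assumed; the function names and default parameter values here are illustrative, not the estimated ones):

```python
import numpy as np

# Hedged sketch of the mixed kernel (8): p1*K_local + p2*K_global with
# p1 + p2 = 1. sigma, lam, and phi are illustrative parameter values.
def k_local_rbf(xi, xj, sigma=1.0):
    d = xi - xj
    return np.exp(-float(d @ d) / (2.0 * sigma ** 2))

def k_global_sigmoid(xi, xj, lam=0.5, phi=-1.0):
    return np.tanh(lam * float(xi @ xj) + phi)

def k_mix(xi, xj, p1=0.9, sigma=1.0, lam=0.5, phi=-1.0):
    p2 = 1.0 - p1  # enforce the constraint p1 + p2 = 1
    return p1 * k_local_rbf(xi, xj, sigma) + p2 * k_global_sigmoid(xi, xj, lam, phi)
```

Setting `p1=1.0` or `p1=0.0` recovers the single kernel case discussed above, which is a useful check when tuning the fusion coefficient.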

4.2. The Establishment of Parameter Filter Model

4.2.1. Mixed Kernel Function Based Support Vector Regression. Differing from support vector regression based on a single kernel function, support vector regression based on a fused kernel function utilizes a kernel containing both local and global kernel functions; that is, $\phi(x)$ in (2) maps into the high dimensional feature space induced by the mixed kernel function. To solve the convex quadratic optimization problem defined by (2), we introduce the Lagrange multipliers $\alpha_{i}, \alpha_{i}^{*}$. Then, the optimization problem can be transformed into the following dual problem [25]:

$$ \max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)\left(\alpha_{j}^{*}-\alpha_{j}\right)\left(x_{i} \cdot x_{j}\right) - \epsilon\sum_{i=1}^{l}\left(\alpha_{i}^{*}+\alpha_{i}\right) + \sum_{i=1}^{l} y_{i}\left(\alpha_{i}^{*}-\alpha_{i}\right) $$
$$ \text{s.t.} \quad \sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right) = 0, \quad 0 \le \alpha_{i}, \alpha_{i}^{*} \le C, \; i = 1, \ldots, l. \tag{11} $$

By solving the dual problem above, we can derive the solution $\bar{\alpha} = (\bar{\alpha}_{1}, \bar{\alpha}_{1}^{*}, \bar{\alpha}_{2}, \bar{\alpha}_{2}^{*}, \ldots, \bar{\alpha}_{l}, \bar{\alpha}_{l}^{*})^{T}$ of the original optimization problem defined by (2). Replacing the inner product $(x_{i} \cdot x_{j})$ in objective function (11) by the mixed kernel function $K_{\mathrm{mix}}(x_{i}, x_{j})$, we can construct a decision function as follows:

$$ f(x) = \sum_{i=1}^{l}\left(\bar{\alpha}_{i}^{*}-\bar{\alpha}_{i}\right)K_{\mathrm{mix}}\left(x_{i}, x\right) + \bar{b}, \tag{12} $$

where $\bar{b}$ is calculated in the following way. Select a multiplier $\bar{\alpha}_{j}$ or $\bar{\alpha}_{k}^{*}$ lying in the open interval $(0, C)$. If $\bar{\alpha}_{j}$ is selected, then

$$ \bar{b} = y_{j} - \sum_{i=1}^{l}\left(\bar{\alpha}_{i}^{*}-\bar{\alpha}_{i}\right)K_{\mathrm{mix}}\left(x_{i}, x_{j}\right) + \epsilon. \tag{13} $$

If $\bar{\alpha}_{k}^{*}$ is selected, then

$$ \bar{b} = y_{k} - \sum_{i=1}^{l}\left(\bar{\alpha}_{i}^{*}-\bar{\alpha}_{i}\right)K_{\mathrm{mix}}\left(x_{i}, x_{k}\right) - \epsilon. \tag{14} $$

4.2.2. Predictive Output Function. Suppose the sample data set of the support vector regression is $D = \{(x_{i}, y_{i}) \mid i \in I\}$, where $I = \{1, 2, \ldots, N\}$ is the index set and $y_{i}$ is the target value of the data. Divide the sample data into k groups by the k-fold cross validation method; that is,

$$ D_{j} = \left\{\left(x_{i}, y_{i}\right) \mid i \in I_{j}\right\}, \tag{15} $$

where $j \in \{1, 2, \ldots, k\}$, $I_{1} \cup I_{2} \cup \cdots \cup I_{k} = I$, and $D_{1} \cup D_{2} \cup \cdots \cup D_{k} = D$. In each iteration, randomly choose one group $D_{p}$ as the prediction set and the remaining $k-1$ groups as the training set. Given the initial parameter vector $\gamma_{0}$, we use LIBSVM [26] to train the support vector regression. Suppose the training result is the multiplier vector $\bar{\alpha}$ with bias $\bar{b}$. Then, the decision function is

$$ f_{p}(x) = \sum_{i \in I \setminus I_{p}}\left(\bar{\alpha}_{i}^{*}-\bar{\alpha}_{i}\right)K_{\mathrm{mix}}\left(x_{i}, x\right) + \bar{b}, \tag{16} $$

where the kernel $K_{\mathrm{mix}}$ and the penalty parameter $C$ used in training are determined by the parameter vector $\gamma$.

Substituting $D_{p}$ into (16), we can derive the prediction output of $D_{p}$ as follows:

$$ \hat{y}_{p} = \left(f_{p}\left(x_{i}\right)\right)_{i \in I_{p}}. \tag{17} $$

Choose each $D_{i}$, $i \in \{1, 2, \ldots, k\}$, in turn as the prediction group and the remaining groups $D_{1}, \ldots, D_{i-1}, D_{i+1}, \ldots, D_{k}$ as the training groups. After k-fold cross validation regression prediction, every datum in the sample data set D has one and only one prediction output. Therefore, for the parameter vector $\gamma$, we can define a prediction output function as follows:

$$ \hat{y} = h(\gamma). \tag{18} $$
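The k-fold prediction output $h(\gamma)$ can be sketched as below. For brevity, the LIBSVM training step is replaced by a simple kernel ridge solve as a stand-in regressor (an assumption of ours; the paper trains an SVR on the k−1 remaining folds), but the fold bookkeeping is the same: every sample receives exactly one out-of-fold prediction.

```python
import numpy as np

# Hedged sketch of the k-fold prediction output h(gamma) of Sec. 4.2.2.
# K is the full NxN Gram matrix built with the current parameter vector
# gamma; a kernel ridge solve stands in for LIBSVM training here.
def kfold_prediction_output(K, y, k=5, reg=1e-2, rng=None):
    N = len(y)
    idx = np.arange(N) if rng is None else rng.permutation(N)
    folds = np.array_split(idx, k)
    y_hat = np.empty(N)
    for p in range(k):
        test = folds[p]
        train = np.concatenate([folds[q] for q in range(k) if q != p])
        # Fit on the k-1 training folds (ridge-regularized solve).
        Ktt = K[np.ix_(train, train)]
        alpha = np.linalg.solve(Ktt + reg * np.eye(len(train)), y[train])
        # Predict the held-out fold, cf. eq. (17).
        y_hat[test] = K[np.ix_(test, train)] @ alpha
    return y_hat  # every sample has one and only one prediction
```

The returned vector plays the role of $\hat{y} = h(\gamma)$ in (18): recomputing it after each parameter update gives the filter its predicted observation.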

4.2.3. The Establishment of the Parameter Filter Model. In this paper, the kernel function weighted fusion coefficients $p_{1}, p_{2}$, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C are combined together as the support vector regression parameter vector, denoted by $\gamma$. For convenience, let $k_{1}$ and $k_{2}$ be the kernel parameters of the local and global kernel functions, respectively. The selection of the whole parameter vector can then be treated as a filter estimation problem for a nonlinear dynamic system. The parameter system model is established as follows:

$$ \gamma(k) = \gamma(k-1) + w(k), \tag{19} $$

$$ y(k) = h(\gamma(k)) + v(k), \tag{20} $$

where $\gamma(k)$ is the n-dimensional parameter state vector and $y(k)$ is the output observation. The process noise $w(k)$ and observation noise $v(k)$ are both zero mean Gaussian white noise sequences with known variances Q and R.

Because the optimal kernel parameters can be considered fixed constants for a specific practical object, we can establish the linear state equation (19) for the parameters. For any state vector, all of the original data has a predictive output after being trained and predicted by LIBSVM, so a nonlinear observation equation can be established as in (20). For the 5th-degree cubature Kalman filtering algorithm to operate, artificial process noise and observation noise need to be added to the system model.
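The random walk model (19)-(20), with the artificial noise terms added as noted, might be sketched as follows (NumPy assumed; the function names and the scalar R are our simplifications):

```python
import numpy as np

# Hedged sketch of the parameter system model (19)-(20): gamma follows
# a random walk, and the observation is the cross-validated prediction
# output h(gamma) corrupted by Gaussian noise. h is passed in, since it
# wraps the whole train-and-predict pipeline of Sec. 4.2.2.
def state_transition(gamma, Q, rng):
    # gamma(k) = gamma(k-1) + w(k),  w(k) ~ N(0, Q)
    return gamma + rng.multivariate_normal(np.zeros(gamma.size), Q)

def observation(gamma, h, R, rng):
    # y(k) = h(gamma(k)) + v(k),  v(k) ~ N(0, R)  (scalar R for brevity)
    y_pred = np.asarray(h(gamma))
    return y_pred + rng.normal(0.0, np.sqrt(R), size=y_pred.shape)
```

In the filter, these two maps supply the prediction and measurement models at the cubature points; the filter itself then follows the standard 5th-degree CKF update equations of reference [19].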

4.3. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of Mixed Kernel Function. In this section, the method for selecting model parameters of support vector regression and the specific steps of the proposed algorithm are described. The design of the parameter adjustment system is shown in Figure 2.

ALGORITHM 1: The parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function.

Initialization:
(1) For the original data set D, select the mixed kernel function and set the initial parameter state value $\gamma_{0}$.
(2) Divide D into k groups $D_{1}, D_{2}, \ldots, D_{k}$ using the k-fold cross validation method.
While (the parameter state value does not meet the stopping conditions) do
  Time update process:
  (3) Calculate the weights $\bar{w}_{s1}, \bar{w}_{s2}, w_{1}, w_{2}$ using formulas (6)-(7).
  Measurement update process:
  (4) Decompose the one-step prediction error covariance matrix $P_{k|k-1}$ and evaluate the cubature points $\xi_{i}$ according to formula (14) in reference [19].
  (5) Train the data set with the LIBSVM algorithm to obtain the prediction output.
  (6) Combining the prediction $\hat{y}$, compute the one-step prediction using formula (12).
  (7) Use formulas (8)-(14) of reference [19] to implement the subsequent measurement update.
End while

First, the k-fold cross validation method is used to divide the original data set into k groups. The local kernel function and the global kernel function are selected to determine the mixed kernel function. The data set is trained with k LIBSVM sub-models based on the mixed kernel function. Then, the predictive output is fed into the 5th-degree cubature Kalman filter. All parameters of the model are used as the state vector of the system; thus, the selection of model parameters becomes a nonlinear state estimation problem.

In the parameter system model (19)-(20), the real value of the observation vector in each iteration is unchanged: the observation vector is the target value vector of the original sample data, $y(k) = (y_{1}, y_{2}, \ldots, y_{N})^{T}$. We can make the optimal state estimation of the parameter state vector $\gamma$ from the real observation values and the predicted output values so as to minimize the variance between the real values $y(k)$ and the predicted outputs $\hat{y}$. The 5th-degree cubature Kalman filter algorithm is used to estimate $\gamma$. The parameter selection algorithm comprises two processes, the time update and the measurement update, as shown in Algorithm 1.

The proposed algorithm combines the fusion coefficients of the mixed kernel function, the kernel parameters, and the penalty parameter C into the state parameter vector, then obtains the predicted output of the data set using the k-fold cross validation method based on LIBSVM, and finally calculates the optimal parameter state vector iteratively by the 5th-degree cubature Kalman filtering algorithm. The goal is to search for the optimal state vector $\gamma$ recursively and obtain the minimum error variance between the real target values of the samples $y(k)$ and the predictive output of the support vector regression $\hat{y}$.

5. Algorithm Analysis

In support vector regression based on a local kernel function, the value of the kernel function is influenced only by data points that are close to each other, while in support vector regression based on a global kernel function, data points that are far from each other also affect the value of the kernel function. Using only the global kernel function or only the local kernel function has limitations in solving practical problems: it often cannot accurately describe the characteristics of the actual sample, which leads to poor performance of the support vector regression. The mixed kernel function contains two different types of kernel function, local and global, and can characterize the actual sample properties much more accurately. Support vector regression based on the mixed kernel function therefore has both good learning ability and good generalization ability. However, choosing the fusion coefficients of the mixed kernel function remains a difficult problem. Expert prior knowledge and simple cross validation are commonly used, but these methods cannot achieve high performance support vector regression.

In this paper, we use a combination of parameters to construct the kernel function: the fusion coefficients, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C together form the parameters of the support vector regression. By adjusting the weighted fusion coefficients of the local and global kernel functions, the mixed kernel function can accurately describe the specific characteristics of the actual samples. Because the 5th-degree cubature Kalman filter applies a high-degree spherical-radial integration rule, it has higher parameter estimation accuracy than the UKF and the 3rd-degree cubature Kalman filter. Therefore, the estimate of the parameter state vector is more accurate, the parameters supplied to the support vector regression are better, and the prediction ability of the support vector regression is better. The proposed parameter adjustment algorithm can also be understood in another way: all the state parameters, including the fusion coefficients, the parameters of the local kernel function, the parameters of the global kernel function, and the penalty parameter C, can be regarded as kernel parameters of the mixed kernel function. The state estimation of this parameter vector is performed by the high precision 5th-degree cubature Kalman filter, and the optimal kernel parameter values of the support vector regression are obtained.

6. Simulation Example

6.1. Subjects. We selected the rolling bearing experimental data from the Electrical Engineering Laboratory of Case Western Reserve University for analysis and verification [27]. The measured rolling bearing type is SKF6205. Single point faults were seeded by spark erosion at the drive end on the bearing outer ring, the inner ring, and the rolling body. The fault depth is 0.2794 mm and the diameter is 0.1778 mm. The number of balls is 9. The rolling bearing works in four states: normal, inner ring fault, outer ring fault, and rolling body fault. Acceleration sensors are used to measure the vibration signals with conventional signal acquisition at a sampling frequency of 12 kHz. The data obtained are shown in Figure 3.

6.2. Feature Extraction. Twelve characteristic parameters are extracted to represent the fault states, as shown in Table 2. We take 50 groups of data directly from the drive end vibration data, and each group contains 4096 sample points. Then, the characteristic parameters of each group are calculated.

6.3. Algorithms Comparison. In this simulation, the local kernel function of the mixed kernel is the RBF kernel function, and the global kernel function is the sigmoid kernel function. The parameter vector of the mixed kernel function is $\gamma = [p_{1}, p_{2}, \sigma, \lambda, \varphi, C]^{T}$, while the parameter vector based on a single RBF kernel function is $\gamma = [\sigma, C]^{T}$. To illustrate the effectiveness of the proposed algorithm, we compare it with the support vector regression algorithm with a single RBF kernel function based on the genetic algorithm (RBF-GA-SVR), with a single RBF kernel function based on the UKF (RBF-UKF-SVR), with a single RBF kernel function based on the CKF (RBF-CKF-SVR), with the mixed kernel function based on the UKF (MKF-UKF-SVR), and with the mixed kernel function based on the CKF (MKF-CKF-SVR). The prediction results for the four states are shown in Figures 4-7.

For the convenience of description, we have the following simplified definitions about the filtering algorithms:

Algo1: MKF-5th-degree-CKF-SVR algorithm

Algo2: MKF-UKF-SVR algorithm

Algo3: RBF-5th-degree-CKF-SVR algorithm

Algo4: RBF-UKF-SVR algorithm

Algo5: RBF-GA-SVR algorithm.

To show the robustness of the proposed method under typical noise levels, we ran the experiments with three noise variances, R1 = 0.1, R2 = 0.3, and R3 = 0.5. The prediction results of these algorithms are shown in Figures 8-10. The kernel parameter estimation results and the prediction error results are given in Tables 3 and 4, respectively.
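
The paper does not state how the noise enters the experiment; a minimal sketch, assuming additive zero-mean Gaussian measurement noise with variance R applied to the target signal:

```python
import numpy as np

def add_measurement_noise(y, R, rng):
    """Corrupt a signal with zero-mean Gaussian noise of variance R."""
    return y + rng.normal(0.0, np.sqrt(R), size=y.shape)

# Illustrative clean target and the three noise levels used in the paper.
rng = np.random.default_rng(42)
y_clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))
noisy = {R: add_measurement_noise(y_clean, R, rng) for R in (0.1, 0.3, 0.5)}
```

Each noisy copy would then be fed to the filter-tuned SVR, and the resulting kernel parameter and prediction errors compared across the three levels.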

From the simulation results in Figures 4-7 and Tables 3 and 4, it can be seen that the kernel parameters of Algo5 are the largest, so it has the poorest generalization ability and the largest prediction error. Compared with Algo5, Algo4 has higher prediction accuracy, mainly because the filtering framework is used to estimate the kernel parameters. Owing to the high estimation accuracy of the 5th-degree cubature Kalman filter, the kernel parameter values of Algo3 are smaller and its predictive ability is better than that of Algo5 and Algo4, but its accuracy is lower than that of Algo2 because of the effect of the mixed kernel. The proposed Algo1 characterizes the sample information with both local and global kernel functions; it has the strongest generalization ability and the smallest prediction error. From the perspective of the fusion coefficients, the coefficient of the local RBF kernel is 0.902, mainly because the actual samples favor the local kernel. Yet compared with Algo3, the prediction accuracy of Algo1 is greatly improved; this is the key contribution of the global kernel within the mixed kernel function, which describes the actual sample information more completely and accurately.

From Figures 8, 9, and 10, the kernel parameter estimation error grows as the noise level increases. It is normal for large noise to degrade the accuracy of a nonlinear filter, but the estimation errors remain within the allowable range.
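
To illustrate how the parameter vector is propagated inside such a filter, the sketch below generates the cubature points of the simpler 3rd-degree spherical-radial rule for a 6-dimensional state γ = [p1, p2, σ, λ, φ, C]^T; the paper actually uses the higher-accuracy 5th-degree rule of Jia et al. [19], and the state and covariance values here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def cubature_points(mu, P):
    """3rd-degree spherical-radial cubature points (2n points, equal weights).

    Shown only for intuition: the 5th-degree rule used in the paper
    needs more points but follows the same propagate-and-reweight idea.
    """
    n = mu.size
    L = np.linalg.cholesky(P)                           # square root of covariance
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # +/- scaled unit directions
    pts = mu[:, None] + L @ xi                          # shape (n, 2n)
    w = np.full(2 * n, 1.0 / (2 * n))                   # equal cubature weights
    return pts, w

# Illustrative state: fusion coefficients, kernel parameters, and C.
mu = np.array([0.9, 0.1, 0.7, 1.0, 2.1, 73.0])
P = np.diag([0.01, 0.01, 0.05, 0.05, 0.1, 5.0])
pts, w = cubature_points(mu, P)
```

Each point would be pushed through the SVR prediction step (the measurement function), and the weighted statistics of the results drive the Kalman-style update of the parameter estimate.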

7. Conclusions

The proposed parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function combines the fusion coefficients, the kernel function parameters, and the regression parameter into a single state vector and obtains the predicted output of the original data set with LIBSVM. The fusion coefficients are adjusted adaptively by the 5th-degree cubature Kalman filter, and the kernel function and regression parameters are selected from the estimated state. The bearing fault diagnosis experiments show that the kernel function and parameters obtained by the proposed method give the support vector regression stronger generalization ability and higher prediction accuracy.

https://doi.org/10.1155/2017/3614790

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61403229) and Natural Science Foundation of Zhejiang Province (LY13F030011).

References

[1] A. Widodo and B. Yang, "Support vector machine in machine condition monitoring and fault diagnosis," Mechanical Systems and Signal Processing, vol. 21, no. 6, pp. 2560-2574, 2007.

[2] M. Kang, J. Kim, A. C. Tan, E. Y. Kim, and B. Choi, "Reliable fault diagnosis for low-speed bearings using individually trained support vector machines with kernel discriminative feature analysis," IEEE Transactions on Power Electronics, vol. 30, no. 5, pp. 2786-2797, 2015.

[3] R. Yan, R. X. Gao, and X. Chen, "Wavelets for fault diagnosis of rotary machines: a review with applications," Signal Processing, vol. 96, pp. 1-15, 2013.

[4] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.

[5] P. Refaeilzadeh, L. Tang, and H. Liu, Cross-validation, Springer-US, 2009.

[6] Z. Shao, M. J. Er, and N. Wang, "An effective semi-cross-validation model selection method for extreme learning machine with ridge regression," Neurocomputing, vol. 151, no. 2, pp. 933-942, 2015.

[7] K. Sopyla and P. Drozda, "Stochastic gradient descent with Barzilai-Borwein update step for SVM," Information Sciences, vol. 316, pp. 218-233, 2015.

[8] T. Villmann, S. Haase, and M. Kaden, "Kernelized vector quantization in gradient-descent learning," Neurocomputing, vol. 147, no. 1, pp. 83-95, 2015.

[9] W. Froelich and J. L. Salmeron, "Evolutionary learning of fuzzy grey cognitive maps for the forecasting of multivariate, interval-valued time series," International Journal of Approximate Reasoning, vol. 55, no. 6, pp. 1319-1335, 2014.

[10] O. J. H. Bosch, N. C. Nguyen, T. Maeno, and T. Yasui, "Managing complex issues through evolutionary learning laboratories," Systems Research and Behavioral Science, vol. 30, no. 2, pp. 116-135, 2013.

[11] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 775-782, June 2007.

[12] J. R. Lee, P. Raghavendra, and D. Steurer, "Lower bounds on the size of semidefinite programming relaxations," in Proceedings of the 47th Annual ACM Symposium on Theory of Computing, STOC 2015, pp. 567-576, June 2015.

[13] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A practical guide to support vector classification," Bioinformatics, vol. 1, no. 1, 2003.

[14] D. Whitley, "An executable model of a simple genetic algorithm," Foundations of Genetic Algorithms, vol. 1519, pp. 45-62, 2014.

[15] F. Kuang, S. Zhang, Z. Jin, and W. Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol. 19, no. 5, pp. 1187-1199, 2015.

[16] T. Mu and A. K. Nandi, "Automatic tuning of L2-SVM parameters employing the extended Kalman filter," Expert Systems with Applications, vol. 26, no. 2, pp. 160-175, 2009.

[17] D. Y. Huang and X. Y. Chen, "A novel approach of model selection for SVR," Journal of Fuzhou University, vol. 39, no. 4, pp. 527-532, 538, 2011.

[18] H. Wang, M. Lv, and L. Zhang, "Parameter optimization of SVR based on DRVB-ASCKF," in Proceedings of the International Conference on Estimation, Detection and Information Fusion, ICEDIF 2015, pp. 141-145, January 2015.

[19] B. Jia, M. Xin, and Y. Cheng, "High-degree cubature Kalman filter," Automatica, vol. 49, no. 2, pp. 510-518, 2013.

[20] Y. Li, Research on Multi-core Learning SVM Algorithm and Identification of Pulmonary Nodules, College of Communication Engineering, Jilin University, Jilin, China, 2014.

[21] X. Lin, J. Yang, and C. Z. Ye, "Face recognition technology based on support vector machine," Journal of Infrared and Laser Engineering, vol. 30, no. 5, pp. 318-322, 2001.

[22] S. G. Pour and F. Girosi, "Joint Prediction of Chronic Conditions Onset: Comparing Multivariate Probits with Multiclass Support Vector Machines," in Proceedings of the Symposium on Conformal and Probabilistic Prediction with Applications, Springer International Publishing, 2016.

[23] C.-J. Hsieh, S. Si, and I. S. Dhillon, "A divide-and-conquer solver for kernel support vector machines," in Proceedings of the 31st International Conference on Machine Learning, ICML 2014, pp. 855-870, June 2014.

[24] X. Huang, The Study on Kernel in Support Vector Machine [Ph.D. dissertation], Soochow University, 2008.

[25] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.

[26] C. Chang and C. Lin, "LIBSVM: a Library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.

[27] "The experimental data of rolling bearings for Electrical Engineering Laboratory of Case Western Reserve University," 2008, https://case.edu/bulletin/.

[28] T. H. Loutas, D. Roulias, and G. Georgoulas, "Remaining useful life estimation in rolling bearings utilizing data-driven probabilistic E-support vectors regression," IEEE Transactions on Reliability, vol. 62, no. 4, pp. 821-832, 2013.

[29] J. P. Jacobs, "Bayesian support vector regression with automatic relevance determination kernel for modeling of antenna input characteristics," IEEE Transactions on Antennas and Propagation, vol. 60, no. 4, pp. 2114-2118, 2012.

[30] M. Ghaedi, M. R. Rahimi, A. M. Ghaedi, I. Tyagi, S. Agarwal, and V. K. Gupta, "Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood," Journal of Colloid and Interface Science, vol. 461, pp. 425-434, 2016.

[31] T. Benkedjouh, K. Medjaher, N. Zerhouni, and S. Rechak, "Health assessment and life prediction of cutting tools based on support vector regression," Journal of Intelligent Manufacturing, vol. 26, no. 2, pp. 213-223, 2015.

Hailun Wang (1, 2) and Daxing Xu (1, 3)

(1) College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China

(2) Logistics Engineering College, Shanghai Maritime University, Shanghai 200000, China

(3) Department of Automation, Zhejiang University of Technology, Hangzhou 310023, China

Correspondence should be addressed to Daxing Xu; daxingxu@163.com

Received 30 June 2017; Revised 14 September 2017; Accepted 8 October 2017; Published 2 November 2017

Academic Editor: Yuan Yao

Caption: FIGURE 1: Schematic diagram of support vector regression.

Caption: FIGURE 2: Parameter adjustment structure of support vector regression based on adaptive fusion of the mixed kernel function.

Caption: FIGURE 3: Vibration signals for four different types of faults.

Caption: FIGURE 4: Prediction result for state 1.

Caption: FIGURE 5: Prediction result for state 2.

Caption: FIGURE 6: Prediction result for state 3.

Caption: FIGURE 7: Prediction result for state 4.

Caption: FIGURE 8: Prediction result of Algo1 for R1 = 0.1.

Caption: FIGURE 9: Prediction result of Algo1 for R2 = 0.3.

Caption: FIGURE 10: Prediction result of Algo1 for R3 = 0.5.

TABLE 1: Four types of kernel functions and their characteristics.

| Kernel function | Characteristics |
| --- | --- |
| Linear kernel: K(x_i, x_j) = x_i · x_j | A special case of the kernel function; it has few parameters and is fast [28]. |
| Polynomial kernel: K(x_i, x_j) = ((x_i · x_j) + c)^q, where c ≥ 0 and q ∈ N | A global kernel function that reduces to the linear kernel when q = 1. The larger q is, the higher the dimension of the mapping and the greater the computational cost; an overly large q also increases the complexity of the learning machine, reduces the generalization ability of the support vector regression, and easily causes overfitting [29]. |
| Gauss (RBF) kernel: K(x_i, x_j) = exp(−‖x_i − x_j‖² / σ²), where σ > 0 | A strongly local kernel function whose extrapolation ability decreases as the parameter increases. Compared with general kernel functions, only one parameter needs to be determined, so the kernel model is relatively simple to construct; it is currently the most widely used kernel [30]. |
| Sigmoid kernel: K(x_i, x_j) = tanh(λ(x_i · x_j) + φ), where λ > 0 and φ < 0 | The theoretical basis of support vector regression guarantees a global optimum rather than a local minimum, and its good generalization to unknown samples avoids overlearning [31]. |

TABLE 2: Sensitive feature parameters.

| Quantity symbol | Characteristic index |
| --- | --- |
| T1 | Mean |
| T2 | Absolute average |
| T3 | Peak |
| T4 | Square root amplitude |
| T5 | Root mean square value |
| T6 | Variance |
| T7 | Skewness |
| T8 | Kurtosis factor |
| T9 | Waveform factor |
| T10 | Margin factor |
| T11 | Peak factor |
| T12 | Pulse factor |

TABLE 3: Results of parameter estimation.

| Algorithm | Parameter vector γ |
| --- | --- |
| Algo5 | [99.992, 99.999] |
| Algo4 | [1.683, 27.723] |
| Algo3 | [0.681, 18.230] |
| Algo2 | [0.912, 0.088, 1.501, 24.342] |
| Algo1 | [0.902, 0.098, 0.708, 1.034, 2.127, 73.364] |

TABLE 4: Results of sample prediction error.

| Data | Statistical indicator | Algo1 | Algo2 | Algo3 | Algo4 | Algo5 |
| --- | --- | --- | --- | --- | --- | --- |
| Train data | MAE | 0.0053 | 0.0073 | 0.0094 | 0.0156 | 0.0200 |
| | RMSE | 0.0098 | 0.0146 | 0.0198 | 0.0298 | 0.0399 |
| | SD | 0.0060 | 0.0090 | 0.0110 | 0.0179 | 0.0228 |
| Test data | MAE | 0.0095 | 0.0128 | 0.0171 | 0.0208 | 0.0274 |
| | RMSE | 0.0111 | 0.0152 | 0.0195 | 0.0241 | 0.0319 |
| | SD | 0.0110 | 0.0150 | 0.0203 | 0.0252 | 0.0331 |

Here, MAE denotes the mean absolute error, RMSE the root mean square error, and SD the standard deviation.

Publication: Journal of Control Science and Engineering, January 1, 2017.