# A New Feature Extraction Algorithm Based on Orthogonal Regularized Kernel CCA and Its Application

1. Introduction

Canonical correlation analysis (CCA) is a multivariate statistical technique that analyzes the mutual relationships between two sets of variables [1-3]. It extracts representative variables that are linear combinations of the variables in each set; the relationships between these new variables reflect the overall relationship between the two sets of variables.

The orthogonal regularized canonical correlation analysis (ORCCA) algorithm replaces the conjugate orthogonality constraints of the original CCA formulation with orthogonality constraints [6, 7]. When the samples are few and the distribution patterns of the different classes differ, ORCCA has better classification ability. By introducing two regularization parameters, a suboptimal solution can be obtained through eigenvalue decomposition, which keeps the time and space complexity of the quadratic optimization problem manageable. Like CCA, ORCCA seeks linear combinations of the variables in each set; but when the relationships between the variables are nonlinear, ORCCA cannot extract the comprehensive variables effectively.

In this paper, the kernel method [9-11] is introduced into the ORCCA algorithm, and the ORKCCA algorithm is proposed. The kernel method maps linearly inseparable data from a low-dimensional space into a higher-dimensional space [12, 13], where the characteristics of the data can be extracted and analyzed with linear methods. By introducing a kernel function, the computation of orthogonal regularized canonical correlation analysis is extended to a nonlinear feature space. Experimental results show that the classification accuracies of our method on nonlinear data are significantly improved, which demonstrates that ORKCCA is feasible.

2. Orthogonal Regularized CCA Algorithm

Given $n$ pairs of samples $X = (x_1, x_2, \ldots, x_n)^T$ and $Y = (y_1, y_2, \ldots, y_n)^T$, where $x_i \in R^p$ and $y_i \in R^q$ ($i = 1, 2, \ldots, n$), we assume that the samples have been centered. The ORCCA algorithm aims at finding a pair of projection directions $a$ and $b$ that satisfy the following optimization problem:

$$\min_{a, b} \frac{1}{n} \left\| X a - Y b \right\|^2, \quad \text{s.t. } a^T a = 1, \ b^T b = 1, \tag{1}$$

The objective function in Equation (1) can be expanded as follows:

$$\frac{1}{n} \left\| X a - Y b \right\|^2 = a^T S_{xx} a - 2 a^T S_{xy} b + b^T S_{yy} b, \tag{2}$$

where $S_{xx} = (1/n) X^T X$, $S_{yy} = (1/n) Y^T Y$, and $S_{xy} = (1/n) X^T Y$.

The optimal model in Equation (1) can be rewritten as

$$\min_{a, b} \; a^T S_{xx} a - 2 a^T S_{xy} b + b^T S_{yy} b, \quad \text{s.t. } a^T a = 1, \ b^T b = 1. \tag{3}$$

According to the method of Lagrange multipliers, the Lagrange function is as follows:

$$L(a, b) = a^T S_{xx} a - 2 a^T S_{xy} b + b^T S_{yy} b + \lambda_1 (a^T a - 1) + \lambda_2 (b^T b - 1), \tag{4}$$

where both $\lambda_1$ and $\lambda_2$ are Lagrange multipliers.

The solutions to Equation (4) are given as follows:

$$(S_{xx} + \lambda_1 I_p)^{-1} S_{xy} (S_{yy} + \lambda_2 I_q)^{-1} S_{xy}^T \, a = \eta \, a, \tag{5}$$

$$(S_{yy} + \lambda_2 I_q)^{-1} S_{xy}^T (S_{xx} + \lambda_1 I_p)^{-1} S_{xy} \, b = \eta \, b, \tag{6}$$

where $I_p$ and $I_q$ denote identity matrices of size $p \times p$ and $q \times q$, respectively, and $\eta$ denotes an eigenvalue.

Both $\lambda_1$ and $\lambda_2$ in Equations (5) and (6) are called regularization parameters. By solving Equation (5), the eigenvalues $\eta^{(1)} \geq \eta^{(2)} \geq \cdots \geq \eta^{(p)}$ and their corresponding eigenvectors $a_1, a_2, \ldots, a_p$ can be obtained. The eigenvectors $b_1, b_2, \ldots, b_q$ can be obtained from Equation (6) in the same way.
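Assuming Equations (5) and (6) take the regularized eigenproblem form implied by the derivation, the ORCCA directions can be computed with a few lines of numpy. This is a minimal sketch; the function name `orcca` and the toy data are illustrative, not from the paper:

```python
import numpy as np

def orcca(X, Y, lam1=1e-3, lam2=1e-3):
    """Solve the ORCCA eigenproblems for centered samples X (n x p) and Y (n x q)."""
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    p, q = Sxx.shape[0], Syy.shape[0]
    Rx = np.linalg.inv(Sxx + lam1 * np.eye(p))   # (S_xx + lambda_1 I_p)^{-1}
    Ry = np.linalg.inv(Syy + lam2 * np.eye(q))   # (S_yy + lambda_2 I_q)^{-1}
    ea, A = np.linalg.eig(Rx @ Sxy @ Ry @ Sxy.T)  # eigenproblem for a, Eq. (5)
    eb, B = np.linalg.eig(Ry @ Sxy.T @ Rx @ Sxy)  # eigenproblem for b, Eq. (6)
    A = A[:, np.argsort(-ea.real)].real   # columns sorted by descending eigenvalue
    B = B[:, np.argsort(-eb.real)].real
    return A, B

# toy data: two linearly related, centered views
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4)); X -= X.mean(axis=0)
Y = X @ rng.standard_normal((4, 3)) + 0.1 * rng.standard_normal((100, 3)); Y -= Y.mean(axis=0)
A, B = orcca(X, Y)
# first pair of canonical variables is strongly correlated for linearly related views
r = abs(np.corrcoef(X @ A[:, 0], Y @ B[:, 0])[0, 1])
```

The matrices in (5) and (6) are similar to symmetric positive semidefinite matrices, so their eigenvalues are real and nonnegative; the `.real` calls only strip numerical round-off.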

3. Orthogonal Regularized Kernel CCA Algorithm (ORKCCA)

The ORCCA algorithm can capture the linear relationships between two sets of random variables, but when no linear relationships exist it does not work well. The kernel method is an effective way to handle nonlinear pattern problems. Therefore, the kernel method is introduced into the ORCCA algorithm, and the ORKCCA algorithm is proposed.

Let $\Phi_x$ and $\Phi_y$ be nonlinear mappings which map the original random variables $x_i$ and $y_i$ into $\Phi_x(x_i)$ and $\Phi_y(y_i)$ in a $P$-dimensional space $F_x$ ($P > p$) and a $Q$-dimensional space $F_y$ ($Q > q$), $i = 1, 2, \ldots, n$. Let $a = \Phi_x(X)^T \alpha$ and $b = \Phi_y(Y)^T \beta$, where $\Phi_x(X) \in R^{n \times P}$, $\Phi_y(Y) \in R^{n \times Q}$, and $\alpha, \beta \in R^n$.

ORCCA is then implemented in the higher-dimensional spaces $F_x$ and $F_y$. Equation (7) is obtained by substituting $a$, $b$, $\Phi_x(x_i)$, and $\Phi_y(y_i)$ into Equation (1):

$$\min_{\alpha, \beta} \frac{1}{n} \left\| \Phi_x(X) \Phi_x(X)^T \alpha - \Phi_y(Y) \Phi_y(Y)^T \beta \right\|^2, \quad \text{s.t. } \alpha^T \alpha = 1, \ \beta^T \beta = 1. \tag{7}$$

Expanding the objective function in Equation (7), we get

$$\frac{1}{n} \Big( \alpha^T \Phi_x(X) \Phi_x(X)^T \Phi_x(X) \Phi_x(X)^T \alpha - 2\, \alpha^T \Phi_x(X) \Phi_x(X)^T \Phi_y(Y) \Phi_y(Y)^T \beta + \beta^T \Phi_y(Y) \Phi_y(Y)^T \Phi_y(Y) \Phi_y(Y)^T \beta \Big). \tag{8}$$

Applying the kernel trick to Equation (8), the kernel matrices $K_x, K_y \in R^{n \times n}$ can be computed, namely, $K_x = \Phi_x(X) \Phi_x(X)^T = (\Phi_x(x_i)^T \Phi_x(x_j))_{n \times n} = (k(x_i, x_j))_{n \times n}$ and $K_y = \Phi_y(Y) \Phi_y(Y)^T = (\Phi_y(y_i)^T \Phi_y(y_j))_{n \times n} = (k(y_i, y_j))_{n \times n}$, where $k(\cdot, \cdot)$ is the kernel function. Both $K_x$ and $K_y$ are centered. The optimal model with the kernel method introduced is then given by Equation (9):

$$\min_{\alpha, \beta} \; \alpha^T M_{xx} \alpha - 2\, \alpha^T M_{xy} \beta + \beta^T M_{yy} \beta, \quad \text{s.t. } \alpha^T \alpha = 1, \ \beta^T \beta = 1, \tag{9}$$

where $M_{xy} = (1/n) K_x^T K_y$, $M_{xx} = (1/n) K_x^T K_x$, and $M_{yy} = (1/n) K_y^T K_y$.
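Computing and centering the kernel matrices, and forming $M_{xx}$, $M_{yy}$, and $M_{xy}$, can be sketched in numpy as follows. A Gaussian (RBF) kernel is assumed here, and `rbf_gram` and `centered_gram` are illustrative names:

```python
import numpy as np

def rbf_gram(Z, sigma=1.0):
    """Gram matrix K[i, j] = k(z_i, z_j) = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T   # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def centered_gram(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

# two toy views with n = 50 paired samples
rng = np.random.default_rng(1)
Kx = centered_gram(rbf_gram(rng.standard_normal((50, 3))))
Ky = centered_gram(rbf_gram(rng.standard_normal((50, 2))))
n = Kx.shape[0]
Mxx = Kx.T @ Kx / n   # all three M matrices are n x n
Myy = Ky.T @ Ky / n
Mxy = Kx.T @ Ky / n
```

Double-centering makes every row and column of the Gram matrix sum to zero, which is the kernel-space analogue of centering the samples.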

According to the method of Lagrange multipliers, the Lagrange function is as follows:

$$L'(\alpha, \beta) = \alpha^T M_{xx} \alpha - 2\, \alpha^T M_{xy} \beta + \beta^T M_{yy} \beta + \zeta_1 (\alpha^T \alpha - 1) + \zeta_2 (\beta^T \beta - 1), \tag{10}$$

where $\zeta_1$ and $\zeta_2$ are Lagrange multipliers. Taking the partial derivatives of $L'(\alpha, \beta)$ with respect to $\alpha$ and $\beta$ and setting them to zero, we get

$$(M_{xx} + \zeta_1 I_n)\, \alpha = M_{xy} \beta, \qquad (M_{yy} + \zeta_2 I_n)\, \beta = M_{xy}^T \alpha, \tag{11}$$

where $M_{xx}$ and $M_{yy}$ are positive semidefinite matrices and $\zeta_1$ and $\zeta_2$ are positive numbers, so $M_{xx} + \zeta_1 I_n$ and $M_{yy} + \zeta_2 I_n$ are invertible.

So, $\alpha$ and $\beta$ can be obtained from Equation (11):

$$\alpha = (M_{xx} + \zeta_1 I_n)^{-1} M_{xy}\, \beta, \tag{12}$$

$$\beta = (M_{yy} + \zeta_2 I_n)^{-1} M_{xy}^T\, \alpha, \tag{13}$$

where $I_n$ denotes the identity matrix of size $n \times n$, matching the dimensions of $M_{xx}$ and $M_{yy}$.

Equations (14) and (15) are obtained by substituting Equation (13) for $\beta$ in Equation (12) and Equation (12) for $\alpha$ in Equation (13), respectively:

$$(M_{xx} + \zeta_1 I_n)^{-1} M_{xy} (M_{yy} + \zeta_2 I_n)^{-1} M_{xy}^T\, \alpha = \eta\, \alpha, \tag{14}$$

$$(M_{yy} + \zeta_2 I_n)^{-1} M_{xy}^T (M_{xx} + \zeta_1 I_n)^{-1} M_{xy}\, \beta = \eta\, \beta. \tag{15}$$

As before, $\zeta_1$ and $\zeta_2$ in Equations (14) and (15) are called regularization parameters. By solving Equation (14), the eigenvalues $\eta^{(1)} \geq \eta^{(2)} \geq \cdots \geq \eta^{(n)}$ and their corresponding eigenvectors $\alpha_1, \alpha_2, \ldots, \alpha_n$ can be obtained. The eigenvectors $\beta_1, \beta_2, \ldots, \beta_n$ can be obtained from Equation (15) in the same way.
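Putting the pieces together, the following is a minimal numpy sketch of ORKCCA, assuming Equations (14) and (15) are solved as ordinary eigenproblems on centered Gram matrices; function names, the RBF kernel choice, and the toy data are illustrative:

```python
import numpy as np

def orkcca(Kx, Ky, zeta1=0.1, zeta2=0.1):
    """Solve the ORKCCA eigenproblems for centered n x n Gram matrices Kx, Ky."""
    n = Kx.shape[0]
    Mxx, Myy, Mxy = Kx.T @ Kx / n, Ky.T @ Ky / n, Kx.T @ Ky / n
    Rx = np.linalg.inv(Mxx + zeta1 * np.eye(n))
    Ry = np.linalg.inv(Myy + zeta2 * np.eye(n))
    ea, Alpha = np.linalg.eig(Rx @ Mxy @ Ry @ Mxy.T)   # Eq. (14)
    eb, Beta = np.linalg.eig(Ry @ Mxy.T @ Rx @ Mxy)    # Eq. (15)
    Alpha = Alpha[:, np.argsort(-ea.real)].real        # sort by descending eigenvalue
    Beta = Beta[:, np.argsort(-eb.real)].real
    return Alpha, Beta

def rbf(Z, sigma=1.0):
    sq = (Z ** 2).sum(axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T) / (2.0 * sigma ** 2))

def center(K):
    # H K H with H = I - (1/n) 11^T, written via scalar broadcasting
    H = np.eye(K.shape[0]) - 1.0 / K.shape[0]
    return H @ K @ H

# toy nonlinear relation: both views are deterministic functions of a shared angle t
rng = np.random.default_rng(2)
t = rng.uniform(-np.pi, np.pi, 80)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = np.column_stack([t, t ** 2])
Kx, Ky = center(rbf(X)), center(rbf(Y))
Alpha, Beta = orkcca(Kx, Ky)
u, v = Kx @ Alpha[:, 0], Ky @ Beta[:, 0]   # first pair of kernel canonical variables
r1 = abs(np.corrcoef(u, v)[0, 1])
```

Because the two views share the latent angle $t$ only through nonlinear functions, a linear ORCCA would find little correlation here, while the kernelized version recovers a strongly correlated first pair.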

4. Simulation Experiments

In this section, we evaluate our method against ORCCA on artificial and handwritten numeral databases.

4.1. Experiment on Artificial Databases. The pairwise samples $X$ and $Y$ are generated from the expressions in Equations (16) and (17), respectively.

[mathematical expression not reproducible], (16)

[mathematical expression not reproducible], (17)

where $\theta$ obeys a uniform distribution on $[-\pi, \pi]$, and $\epsilon_1$ and $\epsilon_2$ are Gaussian noise with standard deviation 0.05. The radial basis function $k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$ with $\sigma = 1.0$ is chosen as the kernel function.

4.1.1. Determining Regularization Parameters. So far there is no reliable method for determining the optimal values of the regularization parameters. In this paper, in order to simplify the calculation, let $\lambda = \lambda_1 = \lambda_2$ and $\zeta = \zeta_1 = \zeta_2$. The regularization parameters are chosen from $10^{-5}$, $10^{-4}$, $10^{-3}$, $10^{-2}$, $10^{-1}$, and 1, as is done in the literature.

According to Equations (16) and (17), 100 pairs of data are randomly generated as training samples. Canonical variables are calculated with the ORCCA and ORKCCA algorithms for the different values of the regularization parameters, and the correlation coefficients of the canonical variables are sorted in descending order. Many pairs of canonical variables can be obtained from the two algorithms; for simplicity, only the most representative first two pairs of canonical variables are examined.

The average of the correlation coefficients of the first two pairs of canonical variables is taken as the criterion for judging whether the regularization parameters are good: the larger the average value, the better the regularization parameters.
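The selection procedure above can be sketched as follows. This is a minimal numpy illustration on synthetic linearly related data (the paper's generator in Equations (16) and (17) is not recoverable here), and `first_two_corrs` is a hypothetical helper implementing an ORCCA-style solve:

```python
import numpy as np

def first_two_corrs(X, Y, lam):
    """Mean |correlation| of the first two pairs of canonical variables (ORCCA-style sketch)."""
    n, p = X.shape
    q = Y.shape[1]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Rx = np.linalg.inv(Sxx + lam * np.eye(p))
    Ry = np.linalg.inv(Syy + lam * np.eye(q))
    ea, A = np.linalg.eig(Rx @ Sxy @ Ry @ Sxy.T)
    eb, B = np.linalg.eig(Ry @ Sxy.T @ Rx @ Sxy)
    A = A[:, np.argsort(-ea.real)].real
    B = B[:, np.argsort(-eb.real)].real
    rs = [abs(np.corrcoef(X @ A[:, i], Y @ B[:, i])[0, 1]) for i in (0, 1)]
    return float(np.mean(rs))

# synthetic training data with a strong linear relation
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5)); X -= X.mean(axis=0)
Y = X[:, :3] @ rng.standard_normal((3, 4)) + 0.2 * rng.standard_normal((100, 4)); Y -= Y.mean(axis=0)

# grid search: keep the parameter with the largest mean correlation
grid = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
scores = {lam: first_two_corrs(X, Y, lam) for lam in grid}
best = max(scores, key=scores.get)
```

The same loop applies to the kernel variant by swapping in the ORKCCA solve and scoring $K_x \alpha_i$ against $K_y \beta_i$.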

Table 1 lists the average correlation coefficients of the first two pairs of canonical variables for the different values of the regularization parameters.

Table 1 shows that the optimal regularization parameters for the ORCCA and ORKCCA algorithms are $10^{-3}$ and $10^{-1}$, respectively. These optimal values are used in the simulations of the next section.

4.1.2. Simulation Experiment 1. According to Equations (16) and (17), 200 pairs of data are randomly generated as test samples. With the regularization parameters $\lambda = 10^{-3}$ and $\zeta = 10^{-1}$, the canonical variables of the test samples are obtained with the ORCCA and ORKCCA algorithms, respectively. The correlation coefficients of the canonical variables are sorted in descending order.

Tables 2 and 3 list the correlation coefficients of the first two pairs of canonical variables for the ORCCA and ORKCCA algorithms. $u_1$ and $v_1$ denote the first pair of canonical variables; $u_2$ and $v_2$ denote the second pair.

The experimental results in Tables 2 and 3 show that the correlations within the same pair of canonical variables are stronger than those between different pairs, especially for nonlinear data.

4.1.3. Simulation Experiment 2. According to Equations (16) and (17), 5 pairs of data are randomly generated, each pair serving as the center of one class. For each class, 100 pairs of data are generated by adding Gaussian noise with standard deviation 0.05 to the class center. This yields five classes with 100 samples each.

From the 500 pairs of data, 100, 175, and 250 pairs are chosen as training samples, and the remaining 400, 325, and 250 pairs serve as the corresponding test samples. Classification experiments based on the k-nearest-neighbors algorithm are carried out on the test data preprocessed in the above way, and the classification accuracies are recorded. For each split, the experiment is repeated 15 times, and the reported accuracy is the average over the 15 runs. Table 4 gives the classification accuracies of ORCCA and ORKCCA for the different numbers of test samples.
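The evaluation protocol above can be sketched as follows, with a plain k-nearest-neighbors classifier on toy two-dimensional data; the projection through ORCCA/ORKCCA canonical variables is omitted for brevity, and the data, seeds, and names are illustrative rather than the paper's setup:

```python
import numpy as np

def knn_accuracy(train_Z, train_y, test_Z, test_y, k=5):
    """Classification accuracy of a plain k-nearest-neighbors vote (Euclidean distance)."""
    d2 = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]          # indices of the k closest training points
    votes = train_y[nearest]
    pred = np.array([np.bincount(row).argmax() for row in votes])
    return float((pred == test_y).mean())

# five class centers, 100 noisy samples per class (standard deviation 0.05)
rng = np.random.default_rng(4)
centers = 3.0 * rng.standard_normal((5, 2))
labels = np.repeat(np.arange(5), 100)
data = centers[labels] + 0.05 * rng.standard_normal((500, 2))

# one random 100/400 train/test split; the paper averages 15 such runs
perm = rng.permutation(500)
train_idx, test_idx = perm[:100], perm[100:]
acc = knn_accuracy(data[train_idx], labels[train_idx], data[test_idx], labels[test_idx])
```

In the paper's pipeline, `data` would be replaced by the canonical-variable representation produced by ORCCA or ORKCCA before the nearest-neighbor step.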

In Table 4, the first column gives the number of training samples, and the second and third columns give the classification accuracies of ORCCA and ORKCCA for the different numbers of training samples. The experimental results show that the classification accuracies of ORKCCA are higher than those of ORCCA, so ORKCCA outperforms ORCCA on nonlinear problems. The comparison curves of the classification accuracies of ORCCA and ORKCCA are given in Figure 1.

4.2. Experiments on Handwritten Numerals Databases. The Concordia University CENPARMI database of handwritten Arabic numerals has 10 classes, that is, the 10 digits from 0 to 9, with 600 samples each. In each class, the first 400 samples are used as the training set and the remaining samples as the test set, giving 4000 training samples and 2000 test samples. The handwritten digit images are preprocessed by the method given in the literature. Four kinds of features are extracted: $X^G$ (256-dimensional Gabor transformation feature), $X^L$ (121-dimensional Legendre moment feature), $X^P$ (36-dimensional pseudo-Zernike moment feature), and $X^Z$ (30-dimensional Zernike moment feature).

For the choice of the regularization parameters, let $\lambda = \lambda_1 = \lambda_2$ and $\zeta = \zeta_1 = \zeta_2$; the regularization parameters are chosen from $10^{-5}$, $10^{-3}$, and 1. The results of our method are compared with those of ORCCA to verify the effectiveness of ORKCCA. Table 5 lists the classification accuracies of ORCCA and ORKCCA for different feature combinations and regularization parameters. The experimental results show that (1) both methods classify best when the regularization parameter is 1; (2) for every feature combination, the classification accuracies of ORKCCA are higher than those of ORCCA; and (3) the classification accuracies of ORKCCA with regularization parameters $10^{-5}$ and $10^{-3}$ are higher than those of ORCCA with regularization parameter 1.

5. Conclusions

An orthogonal regularized kernel CCA algorithm (ORKCCA) for nonlinear problems is presented. By introducing a kernel function, the proposed algorithm becomes suitable for solving nonlinear problems. Comparative experiments between ORCCA and ORKCCA are performed on artificial and handwritten numeral databases. The experimental results show that the proposed method outperforms ORCCA in both the correlation coefficients of the canonical variables and the classification accuracies on the test data, which demonstrates that ORKCCA is feasible.

https://doi.org/10.1155/2018/8745251

Data Availability

The experiments in this paper were performed by the author Xi two years ago. His computer has since failed, and the data cannot be recovered from it, so regrettably the data cannot be provided.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful for the support of the Hainan Provincial Natural Science Foundation (117150) and the Scientific Research Foundation of Hainan Tropical Ocean University (RHDXB201624).

References

 Q. H. Ran, Z. N. Shi, and Y. P. Xu, "Canonical correlation analysis of hydrological response and soil erosion under moving rainfall," Journal of Zhejiang University-SCIENCE A, vol. 14, no. 5, pp. 353-361, 2013.

 B. K. Sarkar and C. Chakraborty, "DNA pattern recognition using canonical correlation algorithm," Journal of Biosciences, vol. 40, no. 4, pp. 709-719, 2015.

 R. R. Sarvestani and R. Boostani, "FF-SKPCCA: kernel probabilistic canonical correlation analysis," Applied Intelligence, vol. 46, no. 2, pp. 438-454, 2016.

 E. Sakar, H. Unver, S. Keskin, and Z. M. Sakar, "The investigation of relationships between some fruit and kernel traits with canonical correlation analysis in Ankara region walnuts," Erwerbs-Obstbau, vol. 58, no. 1, pp. 19-23, 2015.

 S. D. Hou and Q. S. Sun, "An orthogonal regularized CCA learning algorithm for feature fusion," Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 785-792, 2014.

 X. Shen and Q. Sun, "Orthogonal multiset canonical correlation analysis based on fractional-order and its application in multiple feature extraction and recognition," Neural Process Letters, vol. 42, no. 2, pp. 301-316, 2015.

 Y. H. Yuan, Y. Li, X. B. Shen, Q. S. Sun, and J. L. Yang, "Laplacian multiset canonical correlations for multiview feature extraction and image recognition," Multimedia Tools and Applications, vol. 76, no. 1, pp. 731-755, 2015.

 X. Xing, K. Wang, T. Yan, and Z. Lv, "Complete canonical correlation analysis with application to multi-view gait recognition," Pattern Recognition, vol. 50, pp. 107-117, 2016.

 H. Joutsijoki and M. Juhola, "Kernel selection in multi-class support vector machines and its consequence to the number of ties in majority voting method," Artificial Intelligence Review, vol. 40, no. 3, pp. 213-230, 2013.

 S. Wang, Z. Deng, F. L. Chung, and W. Hu, "From Gaussian kernel density estimation to kernel methods," International Journal of Machine Learning and Cybernetics, vol. 4, no. 2, pp. 119-137, 2013.

 X. Chen, R. Tharmarasa, T. Kirubarajan, and M. Mcdonald, "Online clutter estimation using a Gaussian kernel density estimator for multitarget tracking," IET Radar, Sonar and Navigation, vol. 9, no. 1, pp. 1-9, 2014.

 O. Taouali, I. Jaffel, H. Lahdhiri, M. F. Harkat, and H. Messaoud, "New fault detection method based on reduced kernel principal component analysis (RKPCA)," International Journal of Advanced Manufacturing Technology, vol. 85, no. 5, pp. 1547-1552, 2016.

 K. Yoshida, J. Yoshimoto, and K. Doya, "Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data," BMC Bioinformatics, vol. 18, no. 1, pp. 108-118, 2017.

 Z. Hu, Z. Lou, J. Yang, K. Liu, and C. Suen, "Handwritten digital recognition based on multi-classifier combination," Chinese Journal Computers, vol. 22, no. 4, pp. 369-374, 1999.

Xinchen Guo, (1) Xiuling Fan, (2) Xiantian Xi, (3) and Fugeng Zeng (1)

(1) College of Ocean Information Engineering, Hainan Tropical Ocean University, Sanya, China

(2) Henan Xuehang Education and Information Service Co., Zhengzhou, China

(3) Shanghai Renyi Technology Co., Ltd., Shanghai, China

Correspondence should be addressed to Fugeng Zeng; zengfugeng@foxmail.com

Received 3 April 2018; Revised 21 July 2018; Accepted 23 August 2018; Published 29 October 2018

Caption: Figure 1: Comparison curves of the accuracies of classification for ORCCA and ORKCCA.
Table 1: The mean values of the correlation coefficients of the first two pairs of canonical variables for the different values of the regularization parameters from ORCCA and ORKCCA.

| Regularization parameter | ORCCA | ORKCCA |
| --- | --- | --- |
| $10^{-5}$ | 0.74 | 0.80 |
| $10^{-4}$ | 0.77 | 0.84 |
| $10^{-3}$ | 0.82 | 0.89 |
| $10^{-2}$ | 0.81 | 0.92 |
| $10^{-1}$ | 0.79 | 0.93 |
| 1 | 0.80 | 0.92 |

Table 2: The correlation coefficients of the first two pairs of canonical variables for ORCCA.

|  | $v_1$ | $v_2$ |
| --- | --- | --- |
| $u_1$ | 0.58 | 0.14 |
| $u_2$ | 0.21 | 0.36 |

Table 3: The correlation coefficients of the first two pairs of canonical variables for ORKCCA.

|  | $v_1$ | $v_2$ |
| --- | --- | --- |
| $u_1$ | 0.91 | 0.08 |
| $u_2$ | 0.06 | 0.83 |

Table 4: Comparisons of the classification accuracies for ORCCA and ORKCCA.

| Number of training samples | ORCCA (%) | ORKCCA (%) |
| --- | --- | --- |
| 100 | 65.0 | 73.1 |
| 175 | 72.4 | 77.8 |
| 250 | 76.5 | 83.6 |

Table 5: Comparisons of the classification accuracies for ORCCA and ORKCCA in different feature combinations and regularization parameters.

| Feature combination | ORCCA $\lambda = 10^{-5}$ | ORCCA $\lambda = 10^{-3}$ | ORCCA $\lambda = 1$ | ORKCCA $\zeta = 10^{-5}$ | ORKCCA $\zeta = 10^{-3}$ | ORKCCA $\zeta = 1$ |
| --- | --- | --- | --- | --- | --- | --- |
| $X^G$-$X^L$ | 0.9314 | 0.9300 | 0.9375 | 0.9625 | 0.9681 | 0.9687 |
| $X^G$-$X^P$ | 0.9230 | 0.9228 | 0.9248 | 0.9511 | 0.9525 | 0.9536 |
| $X^G$-$X^Z$ | 0.9180 | 0.9196 | 0.9196 | 0.9482 | 0.9518 | 0.9520 |
| $X^L$-$X^P$ | 0.9187 | 0.9187 | 0.9190 | 0.9500 | 0.9533 | 0.9545 |
| $X^L$-$X^Z$ | 0.9200 | 0.9205 | 0.9235 | 0.9574 | 0.9600 | 0.9615 |
| $X^P$-$X^Z$ | 0.7413 | 0.7413 | 0.7525 | 0.8436 | 0.8450 | 0.8450 |