# Discriminant parallel feature fusion based on maximum margin criterion for pattern classification

ABSTRACT: In this paper, we propose a novel parallel feature fusion
method based on the maximum margin criterion, namely discriminant
parallel feature fusion, for pattern classification. The advantages of
the algorithm are: 1) a constrained optimization problem based on the
maximum margin criterion is created to solve for the optimal fusion
coefficients, so that the fused data has the largest class
discriminability in the fused feature space; 2) the unique solution of
the optimization problem is obtained by transforming it into an
eigenvalue problem, so the proposed fusion strategy delivers consistent
performance. Besides the detailed theoretical derivation, extensive
experimental evaluations are also presented in this paper.

Categories and Subject Descriptions I.5.2 [Pattern Recognition]; Classifier design and evaluation: H.5 [Information interfaces and presentation]

General Terms Pattern classification, Feature extraction

Keywords: Parallel feature fusion, maximum margin criterion, discriminant parallel feature fusion


1. Introduction

Pattern classification consists of two procedures: feature extraction and classification. Many feature extraction methods have been developed in past research. Among them, dimensionality reduction techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) [1], [2] are attractive, since a low-dimensional feature representation with high discriminatory power is very important for facial feature extraction. Although successful in many cases, these linear methods cannot provide reliable and robust solutions to face recognition problems with complex face variations, since the distribution of face images under perceivable variations in viewpoint, illumination or facial expression is highly nonlinear and complex. Recently, researchers have successfully applied kernel machine techniques to this nonlinear problem [3], [4], [5], and accordingly some kernel-based methods have been developed for face recognition [6], [7], [8]. After feature extraction, a classification criterion is applied to classify the sample. When many features are extracted, how to perform classification based on the multiple features becomes a crucial problem. As a very efficient approach, data fusion is applied to solve it, and it has been widely applied in many areas [9], [10], [11]. Existing fusion methods can be divided into three schemes: the first integrates all assimilated multiple features directly into a final decision; the second combines the individual decisions made by every feature into a global final decision; and the third fuses the multiple features into one new feature for classification. In this paper, we devote our attention to the third scheme, i.e., feature fusion.
Recently, many feature fusion methods for pattern classification have been proposed in the literature [12], [13], [14].

In this paper, we focus on linear combination fusion, pay particular attention to how to find the fusion coefficients, and propose a so-called discriminant feature fusion for supervised learning. The proposed discriminant fusion strategy has two advantages: 1) the fused data has the largest class discriminability, because the fusion coefficients are obtained by solving a constrained optimization problem created from the average margin criterion; 2) the fusion coefficients are unique, because they are the elements of the eigenvector, corresponding to the largest eigenvalue, of the eigenvalue problem into which the optimization problem is transformed.

This paper is organized as follows. In Section 2, the detailed theoretical derivation is introduced; results and conclusions are given in Sections 3 and 4, respectively.

2. Discriminant parallel feature fusion

In this section, we first introduce the basic idea of discriminant parallel feature fusion briefly, and then present the theoretical derivation of the fusion coefficients in detail.

Given a sample set $\{x_{ij}\}$ ($i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, n_i$) and multiple feature sets $\{y^m_{ij}\}$ ($i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, n_i$, $m = 1, 2, \ldots, M$), where $M$ denotes the number of feature sets, the fused feature obtained by linear combination can be described as follows:

$$z_{ij} = \sum_{m=1}^{M} a_m y^m_{ij} \qquad (1)$$

where $a_m$ ($m = 1, 2, \ldots, M$) and $z_{ij}$ ($i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, n_i$) denote the combination fusion coefficients and the fused feature, respectively.
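Eq. (1) is just a weighted sum of the $M$ candidate features. A minimal sketch, with hypothetical names and data and assuming all $M$ features share a common dimension $d$:

```python
import numpy as np

def fuse_features(features, coeffs):
    """Linear combination fusion (Eq. 1): z = sum_m a_m * y_m.

    features: list of M arrays, each of shape (d,) -- the candidate features
    coeffs:   sequence of M fusion coefficients a_m
    """
    Y = np.column_stack(features)     # stack features as columns, shape (d, M)
    return Y @ np.asarray(coeffs)     # fused feature z, shape (d,)

# Hypothetical example with M = 2 features of dimension 3
y1 = np.array([1.0, 0.0, 2.0])
y2 = np.array([0.0, 1.0, 1.0])
z = fuse_features([y1, y2], [0.6, 0.8])   # -> 0.6*y1 + 0.8*y2
```

How the coefficients $a_m$ themselves are chosen is the subject of the rest of this section.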

Now we focus on how to obtain $a_m$ ($m = 1, 2, \ldots, M$); our goal is to find fusion coefficients that are unique and yield the largest class discriminability in the fused feature space. For supervised learning, we can calculate the average margin distance between two classes $C^l_p$ and $C^l_q$ in the fused feature space, which consists of the fused features $z_{ij} = y_{ij}\,\alpha$ ($i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, n_i$), where $\alpha = [a_1, a_2, \ldots, a_M]^T$ and $y_{ij} = [y^1_{ij}, y^2_{ij}, \ldots, y^M_{ij}]$. The average margin distance can be defined by

$$Dis = \frac{1}{2}\sum_{p=1}^{L}\sum_{q=1}^{L}\frac{n_p n_q}{n^2}\, d(C^l_p, C^l_q) \qquad (2)$$

where $d(C^l_p, C^l_q)$ denotes the margin distance between the $p$th and $q$th classes. Given the feature vectors $z_{ij}$ in the dimension-reduced space $F^l$, let $m^l_p$ ($p = 1, 2, \ldots, L$) and $m^l$ denote the mean of each class and the mean of all samples, respectively.

First we calculate $d(C^l_p, C^l_q)$ ($p, q = 1, 2, \ldots, L$) as follows:

$$d(C^l_p, C^l_q) = d(m^l_p, m^l_q) - S(C^l_p) - S(C^l_q) \qquad (3)$$

where $S(C^l_p)$ ($p = 1, 2, \ldots, L$) is a measure of the scatter of class $C_p$, and $d(m^l_p, m^l_q)$ is the distance between the means of the two classes. Let $S^l_p$ ($p = 1, 2, \ldots, L$) denote the within-class scatter matrix of class $p$; then $\mathrm{tr}(S^l_p)$, which measures the scatter of class $p$, can be defined as

$$\mathrm{tr}(S^l_p) = \frac{1}{n_p}\sum_{j=1}^{n_p} (z_{pj} - m^l_p)^T (z_{pj} - m^l_p) \qquad (4)$$

We define $\mathrm{tr}(S^l_B)$ and $\mathrm{tr}(S^l_W)$, the traces of the between-class and within-class scatter matrices of the dimension-reduced space $F^l$, respectively, as follows:

$$\mathrm{tr}(S^l_B) = \sum_{p=1}^{L} \frac{n_p}{n}\,(m^l_p - m^l)^T (m^l_p - m^l) \qquad (5)$$

$$\mathrm{tr}(S^l_W) = \sum_{p=1}^{L} \frac{n_p}{n}\, \mathrm{tr}(S^l_p) \qquad (6)$$

Hence $S(C^l_p) = \mathrm{tr}(S^l_p)$, so

$$Dis = \frac{1}{2}\sum_{p=1}^{L}\sum_{q=1}^{L}\frac{n_p n_q}{n^2}\left[\, d(m^l_p, m^l_q) - \mathrm{tr}(S^l_p) - \mathrm{tr}(S^l_q) \,\right] \qquad (7)$$

We use the Euclidean distance to calculate $d(m^l_p, m^l_q)$:

$$d(m^l_p, m^l_q) = (m^l_p - m^l_q)^T (m^l_p - m^l_q) \qquad (8)$$

Substituting (8) into (7) and using (5) and (6), it is easy to obtain

$$\frac{1}{2}\sum_{p=1}^{L}\sum_{q=1}^{L}\frac{n_p n_q}{n^2}\, d(m^l_p, m^l_q) = \mathrm{tr}(S^l_B) \qquad (9)$$

$$\frac{1}{2}\sum_{p=1}^{L}\sum_{q=1}^{L}\frac{n_p n_q}{n^2}\left[\, \mathrm{tr}(S^l_p) + \mathrm{tr}(S^l_q) \,\right] = \mathrm{tr}(S^l_W) \qquad (10)$$

Hence we obtain

$$Dis = \mathrm{tr}(S^l_B) - \mathrm{tr}(S^l_W)$$
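The identity $Dis = \mathrm{tr}(S^l_B) - \mathrm{tr}(S^l_W)$ can be checked numerically. A sketch under the definitions of Eqs. (2)-(8), with synthetic data and hypothetical function names:

```python
import numpy as np

def scatter_traces(Z, labels):
    """tr(S_B) and tr(S_W) as in Eqs. (5)-(6), with class weights n_p/n."""
    n = len(Z)
    m = Z.mean(axis=0)                              # total mean
    tr_sb = tr_sw = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)                        # class mean
        tr_sb += (len(Zc) / n) * np.dot(mc - m, mc - m)
        # tr(S_p), Eq. (4): mean squared distance to the class mean
        tr_sw += (len(Zc) / n) * np.mean(np.sum((Zc - mc) ** 2, axis=1))
    return tr_sb, tr_sw

def average_margin(Z, labels):
    """Dis from Eqs. (2)-(3): prior-weighted pairwise class margins."""
    n = len(Z)
    dis = 0.0
    for p in np.unique(labels):
        for q in np.unique(labels):
            Zp, Zq = Z[labels == p], Z[labels == q]
            mp, mq = Zp.mean(axis=0), Zq.mean(axis=0)
            sp = np.mean(np.sum((Zp - mp) ** 2, axis=1))   # tr(S_p)
            sq = np.mean(np.sum((Zq - mq) ** 2, axis=1))   # tr(S_q)
            d_pq = np.dot(mp - mq, mp - mq) - sp - sq      # Eqs. (3), (8)
            dis += 0.5 * (len(Zp) * len(Zq) / n ** 2) * d_pq
    return dis
```

For any data set, `average_margin(Z, labels)` agrees with `tr_sb - tr_sw` from `scatter_traces`, which is the content of Eqs. (9)-(10).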

In previous work [15], Li applied the maximum margin criterion to feature extraction by maximizing the average margin distance. In this paper, we create an optimization problem based on the maximum margin criterion to seek an optimal fusion vector $\alpha$.

Proposition 1. Let

$$G = \sum_{p=1}^{L} \frac{n_p}{n}\,(\bar{y}_p - \bar{y})^T (\bar{y}_p - \bar{y}) \;-\; \frac{1}{n}\sum_{p=1}^{L}\sum_{j=1}^{n_p} (y_{pj} - \bar{y}_p)^T (y_{pj} - \bar{y}_p)$$

where $\bar{y}_p = \frac{1}{n_p}\sum_{j=1}^{n_p} y_{pj}$ and $\bar{y} = \frac{1}{n}\sum_{p=1}^{L}\sum_{j=1}^{n_p} y_{pj}$. Then $Dis = \alpha^T G \alpha$.

Proof. From equations (5) and (6), substituting $z_{ij} = y_{ij}\,\alpha$ from (1), we can obtain

$$\mathrm{tr}(S^l_B) = \alpha^T \left[\, \sum_{p=1}^{L} \frac{n_p}{n}\,(\bar{y}_p - \bar{y})^T (\bar{y}_p - \bar{y}) \,\right] \alpha \qquad (11)$$

$$\mathrm{tr}(S^l_W) = \alpha^T \left[\, \frac{1}{n}\sum_{p=1}^{L}\sum_{j=1}^{n_p} (y_{pj} - \bar{y}_p)^T (y_{pj} - \bar{y}_p) \,\right] \alpha \qquad (12)$$

With $G$ defined as above, it is easy to obtain

$$\mathrm{tr}(S^l_B) - \mathrm{tr}(S^l_W) = \alpha^T G \alpha \qquad (13)$$

Since $Dis = \mathrm{tr}(S^l_B) - \mathrm{tr}(S^l_W)$, we obtain $Dis = \alpha^T G \alpha$.

According to the maximum margin criterion and Proposition 1, we can create an optimization problem constrained by the unit vector $\alpha$, i.e., $\alpha^T \alpha = 1$, as follows:

$$\max_{\alpha}\; \alpha^T G \alpha \qquad (14)$$

$$\text{subject to}\quad \alpha^T \alpha - 1 = 0 \qquad (15)$$

In order to solve this constrained optimization problem, we apply the Lagrangian

$$L(\alpha, \lambda) = \alpha^T G \alpha - \lambda(\alpha^T \alpha - 1) \qquad (16)$$

with multiplier $\lambda$. The derivatives of $L(\alpha, \lambda)$ with respect to the primal variables must vanish, that is,

$$\frac{\partial L(\alpha, \lambda)}{\partial \alpha} = 2(G - \lambda I)\alpha = 0 \qquad (17)$$

$$\frac{\partial L(\alpha, \lambda)}{\partial \lambda} = 1 - \alpha^T \alpha = 0 \qquad (18)$$

Hence

$$G\alpha = \lambda\alpha \qquad (19)$$
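A quick numerical illustration of why (19) solves (14)-(15): for a symmetric matrix $G$ (here an arbitrary symmetric matrix standing in for the fusion matrix, not the paper's data), no unit vector achieves a larger value of $\alpha^T G \alpha$ than the eigenvector of the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
G = A + A.T                        # arbitrary symmetric stand-in for G

w, V = np.linalg.eigh(G)           # eigenvalues in ascending order
alpha_star = V[:, -1]              # unit-norm eigenvector of the largest eigenvalue
best = alpha_star @ G @ alpha_star # equals the largest eigenvalue w[-1]

# Random unit vectors never beat the top eigenvector's Rayleigh quotient
for _ in range(1000):
    a = rng.normal(size=4)
    a /= np.linalg.norm(a)
    assert a @ G @ a <= best + 1e-12
```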

Proposition 2. Assume $\alpha^*$ is the eigenvector of $G$ corresponding to the largest eigenvalue $\lambda^*$; then $Dis$ attains its maximum value at $\alpha^*$.

Proof. Multiplying both sides of Equation (19) on the left by $\alpha^T$ gives

$$\alpha^T G \alpha = \alpha^T \lambda \alpha = \lambda\, \alpha^T \alpha = \lambda \qquad (20)$$

Hence

$$Dis = \lambda \qquad (21)$$

From this formulation, it is easy to see that $Dis$ equals $\lambda$ and therefore reaches its largest value when $\lambda$ does. So $Dis$ attains its maximum at $\alpha^*$, the eigenvector of $G$ corresponding to the largest eigenvalue $\lambda^*$.

According to Proposition 2, solving the constrained optimization problem is transformed into solving the eigenvalue equation (19). The fusion coefficients are the elements of the eigenvector of $G$ corresponding to the largest eigenvalue, where $G$ is a matrix computed from the multiple features.

As discussed above, discriminant feature fusion finds a discriminating fused feature space in which the data has the largest class discriminability. The fusion coefficients are the elements of the eigenvector corresponding to the largest eigenvalue of the eigenvalue problem; since that solution is unique (up to sign, when the largest eigenvalue is simple), the fusion coefficients are unique.
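The whole procedure can be sketched as follows. This is a minimal illustration with hypothetical names, assuming each sample's $M$ candidate feature vectors share a common dimension $d$ and are stacked as the columns of a matrix $Y_{ij}$; it builds $G$ as in Proposition 1 and takes the top eigenvector as the fusion coefficients:

```python
import numpy as np

def fusion_coefficients(Ys, labels):
    """Fusion coefficients = top eigenvector of G (Eq. 19).

    Ys:     array of shape (n, d, M) -- for each sample, the M candidate
            feature vectors (dimension d each) stacked as columns.
    labels: integer class labels, length n.
    """
    n, _, M = Ys.shape
    Y_bar = Ys.mean(axis=0)                  # total mean matrix
    G = np.zeros((M, M))
    for c in np.unique(labels):
        Yc = Ys[labels == c]
        Yc_bar = Yc.mean(axis=0)             # class mean matrix
        D = Yc_bar - Y_bar
        G += (len(Yc) / n) * D.T @ D         # between-class term
        for Yj in Yc:
            E = Yj - Yc_bar
            G -= (1.0 / n) * E.T @ E         # within-class term
    # G is symmetric; take the eigenvector of the largest eigenvalue
    w, V = np.linalg.eigh(G)
    return V[:, -1]                          # alpha, unit norm

def fuse(Ys, alpha):
    """Fused features z_ij = Y_ij @ alpha (Eq. 1 in matrix form)."""
    return Ys @ alpha
```

A usage sketch: extract the $M$ candidate feature sets, call `fusion_coefficients` on the training data, then apply `fuse` to both training and test features before classification.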

[FIGURE 1 OMITTED]

3. Experimental results

In our experiments, we evaluate our algorithm on two face databases, the ORL face database [17] and the Yale face database [16]. The ORL face database, developed at the Olivetti Research Laboratory, Cambridge, U.K., is composed of 400 grayscale images, with 10 images for each of 40 individuals. The images vary across pose, time and facial expression. The Yale face database was constructed at the Yale Center for Computational Vision and Control. It contains 165 grayscale images of 15 individuals. These images were taken under different lighting conditions (left-light, center-light, and right-light) and with different facial expressions (normal, happy, sad, sleepy, surprised, and wink), with and without glasses.

In our experiments, to reduce computational complexity, we resize the original ORL face images of 112 x 92 pixels with 256 gray levels to 48 x 48 pixels; some examples are shown in Fig. 1a. We randomly select 5 images from each subject, 200 images in total, for training, and the remaining 200 images are used for testing. Similarly, the images from the Yale database are cropped to 100 x 100 pixels; some examples are shown in Fig. 1b. Five randomly selected images per person are used as training samples, while the remaining 5 images per person are used to test the performance of the algorithms.

From the theoretical derivation of discriminant fusion in Section 2, it is easy to predict that the proposed algorithm performs better than classical fusion [4], and here one set of experiments is implemented for evaluation. First we extract linear and nonlinear features with PCA and KPCA, and then classify the fused feature with a Fisher classifier. Suppose $y^1_{ij}$ and $y^2_{ij}$ ($i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, n_i$) denote the linear and nonlinear features derived from PCA and KPCA, respectively; the fused feature is $z_{ij} = [y^1_{ij}, y^2_{ij}]^T$ under the classical fusion [4], while $z_{ij} = \sum_{m=1}^{2} a_m y^m_{ij}$ under the discriminant fusion strategy. Polynomial and Gaussian kernels with different coefficients are selected for KPCA, and accuracy rate is used to evaluate recognition performance.
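The two fused representations compared here differ only in how the PCA and KPCA features are combined. A minimal sketch (the PCA/KPCA extraction itself is omitted; the arrays and names are hypothetical):

```python
import numpy as np

def classical_fusion(y1, y2):
    """Classical (serial) fusion: concatenate the two feature vectors."""
    return np.concatenate([y1, y2])

def discriminant_fusion(y1, y2, a1, a2):
    """Discriminant (parallel) fusion: weighted sum with learned a1, a2."""
    return a1 * y1 + a2 * y2

# Hypothetical PCA / KPCA features of one sample
y1 = np.array([1.0, 2.0])   # e.g. PCA feature
y2 = np.array([3.0, 4.0])   # e.g. KPCA feature
```

Note the dimensional difference: concatenation doubles the feature dimension, while the weighted sum keeps the original dimension and lets the learned coefficients trade off the two feature sets.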

As shown in Figures 2, 3, 4 and 5, the proposed method achieves higher performance than the classical fusion method [12] under the same kernel parameters for KPCA.

Since the fusion coefficients of the discriminant fusion strategy are obtained by solving the optimization problem based on the maximum margin criterion, and the data has the largest class discriminability in the fused feature space, it is not surprising that discriminant fusion consistently outperforms classical fusion [4]. Besides the above advantages, the following points are worth considering: 1) since the maximum margin criterion is used to create the constrained optimization problem, the fusion strategy is only suited to supervised learning; 2) the fusion coefficients are obtained by solving an eigenvalue problem, which increases time consumption compared with the classical strategy, so the balance between time consumption and classification accuracy should be evaluated; 3) the discriminant fusion strategy is only a linear combination of multiple features with different coefficients, so other fusion strategies based on discriminant analysis could be developed.

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

[FIGURE 5 OMITTED]

4. Conclusions

A novel parallel feature fusion method based on the maximum margin criterion, namely discriminant parallel feature fusion, is proposed in this paper for pattern classification. We create a constrained optimization problem based on the maximum margin criterion to find the fusion coefficients of the parallel feature fusion and transform the optimization problem into an eigenvalue problem, which brings two advantages: the fused data has the largest class discriminability, and the fusion coefficients are unique. This paper proposes only a linear combination fusion strategy based on discriminant analysis, and we expect it to offer a promising direction for other fusion strategies.

Received 10 May 2007; Revised and accepted 1 Nov. 2007

References

[1] Belhumeur, P. N., Hespanha, J. P., Kriegman, D. J. (1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis and Machine Intelligence. 19 (7) 711-720.

[2] Martinez, A. M., Kak, A. C. (2001). PCA versus LDA. IEEE Trans. Pattern Analysis and Machine Intelligence. 23 (2) 228-233.

[3] Schölkopf, B., Burges, C. J. C., Smola, A. J. (1999). Advances in Kernel Methods--Support Vector Learning. Cambridge, MA: MIT Press.

[4] Batur, A. U., Hayes, M. H. (2001). Linear Subspace for Illumination Robust Face Recognition, Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, Dec.

[5] Ruiz, A., Lopez de Teruel, P. E. (2001). Nonlinear kernel-based statistical pattern analysis. IEEE Trans. Neural Networks. 12 (1) 16-32.

[6] Schölkopf, B., Smola, A., Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation. 10 (5) 1299-1319.

[7] Liu, Qingshan, Lu, Hanqing, Ma, Songde (2004). Improving Kernel Fisher Discriminant Analysis for Face Recognition, IEEE Trans. Circuits and Systems for Video Technology. 14 (1) 42-49.

[8] Lu, J., Plataniotis, K. N., Venetsanopoulos, A. N. (2003). Face recognition using kernel direct discriminant analysis algorithms, IEEE Trans. Neural Networks. 14 (1) 117-126.

[9] Taropa, E., Srini, V. P., Lee, Won-Jong, Han, Tack-Don (2006). Data Fusion Applied on Autonomous Ground Vehicles. The 8th International Conference on Advanced Communication Technology. 744-749.

[10] Gunatilaka, A. H., Baertlein, B. A. (2001). Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection, IEEE Trans. Pattern Analysis and Machine Intelligence. 23 (6) 577-589.

[11] Lo, D., Goubran, R. A., Dansereau, R. M. (2005). Robust joint audio-video talker localization in video conferencing using reliability information-II: Bayesian network fusion, IEEE Trans. Instrumentation and Measurement. 54 (4) 1541-1547.

[12] Liu, Chengjun, Wechsler, Harry (2001). A shape- and texture-based enhanced Fisher classifier for face recognition, IEEE Trans. Image Processing. 10 (4) 598-608.

[13] Zhang, Xinhua (1998). An information model and method of feature fusion, International Conference on Signal Processing. 2. 1389-1392.

[14] Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Networks. 5 (4) 537-550.

[15] Li, Haifeng, Jiang, Tao, Zhang, Keshu (2006). Efficient and Robust Feature Extraction by Maximum Margin Criterion, IEEE Trans. Neural Networks. 17 (1) 157-165.

[16] Belhumeur, P. N., Hespanha, J. P., Kriegman, D. J. (1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis and Machine Intelligence. 19 (7) 711-720.

[17] Samaria, Ferdinando, Harter, Andy (1994). Parameterisation of a Stochastic Model for Human Face Identification.

Jun-Bao Li (1), Shu-Chuan Chu (2), Jeng-Shyang Pan (3)

(1) Department of Automatic Test and Control, Harbin Institute of Technology, China. junbaolihit@hotmail.com

(2) Department of Information Management, Cheng Shiu University, Taiwan. scchu@bit.kuas.edu.tw

(3) Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Taiwan. jspan@cc.kuas.edu.tw

Jun-Bao Li received the B.Sc. and M.Sc. degree from Harbin Institute of Technology (HIT), Harbin, P. R. China in 2002 and 2004, respectively. He is currently working toward the Ph.D. degree in the Measurement Technology and Instrument, in HIT, Harbin, P. R. China. His research interests are mainly in pattern recognition and image processing.

Shu-Chuan Chu received the Ph.D. degree from the School of Informatics and Engineering, Flinders University of South Australia. She is an Assistant Professor in the Dept. of Information Management, Cheng Shiu University, Taiwan. Her research interests are mainly in Data Mining, Computational Intelligence, Information Hiding, and Signal Processing.

Jeng-Shyang Pan received the B.S. degree in Electronic Engineering from the National Taiwan University of Science and Technology, Taiwan in 1986, the M.S. degree in Communication Engineering from the National Chiao Tung University, Taiwan in 1988, and the Ph.D. degree in Electrical Engineering from the University of Edinburgh, U.K. in 1996. Currently, he is a Professor in the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Taiwan. Professor Pan has published more than 50 journal papers and 120 conference papers. He serves on the editorial boards of LNCS Transactions on Data Hiding and Multimedia Security (Springer), the International Journal of Knowledge-Based Intelligent Engineering Systems (IOS Press), and the International Journal of Hybrid Intelligent Systems (Advanced Knowledge International). He is Co-Editor-in-Chief of the International Journal of Innovative Computing, Information and Control. His current research interests include data mining, information security and image processing.

Categories and Subject Descriptions I.5.2 [Pattern Recognition]; Classifier design and evaluation: H.5 [Information interfaces and presentation]

General Terms Pattern classification, Feature extraction

Keywords: Parallel feature fusion, maximum margin criterion, discriminant parallel feature fusion

Categories and Subject Descriptors I.5.2 [Pattern Recognition]; Classifier design and evaluation: H.5 [In

1. Introduction

The procedure of pattern classification is consisted of two procedures, feature extraction and classification. In the past research works, many feature extraction methods are developed. Among various feature extraction methods, the dimensionality reduction technique is exciting since the low-dimensional feature representation with high discriminatory power is very important for facial feature extraction, such as principal component analysis (PCA) and linear discriminant analysis (LDA) [1],[2]. Although successful in many cases, these linear methods cannot provide reliable and robust solutions to those face recognition problems with complex face variations since the distribution of face images under a perceivable variation in viewpoint, illumination or facial expression is highly nonlinear and complex. Recently, researchers applied kernel machine techniques to solve the nonlinear problem successfully [3], [4], [5], and accordingly some kernel based methods are developed for face recognition [6], [7], [8]. Followed by feature extraction, the classification criterion is applied to classify the sample. But if many features are extracted for classification, then how to perform a wonderful classification based on the multiple features becomes a crucial problem for pattern classification problem when multiple features are considered. As a very efficient method, data fusion is applied to solve it, which has been widely applied in many areas [9], [10], [11]. Existing fusion methods can be divided into the following three schemes: the first scheme is to integrate all assimilated multiple features into a final decision directly; the second is to combine the individual decisions made by every feature into a global final decision; and the third is to fuse the multiple features to one new feature for classification. In this paper, we devote our attention to the third fusion scheme, i.e., feature fusion. 
Recently many feature fusion methods for pattern classification were proposed in the lectures [12], [13], [14].

In this paper, we focus on the linear combination fusion, but pay more attention to how to find the fusion coefficients, and propose so called discriminant feature fusion for supervising learning. The proposed discriminant fusion strategy has two advantages: 1) fused data has the largest class discriminant owing to obtaining the fusion coefficients by solving a constrained optimization problem created in the average margin criterion; 2) fusion coefficients are unique owing to they are equal to the elements of the eigenvector of one eigenvalue problem transformed by the above optimization problem.

This paper is organized as follows. In Section 2, the detailed theory derivation is introduced, and results and conclusion are given in Section 3 and 4, respectively.

2. Discriminant parallel feature fusion

In this section, firstly we introduce the basic idea of discriminant parallel feature fusion briefly, and then emphasize the theory derivation of seeking the fusion coefficients in detailed.

Given a sample set {[x.sub.ij]} ( i = 1, 2, ... , C j = 1,2, ... ,[n.sub.i]), and multiple feature sets {[y.sup.m.sub.ij]} (i = 1,2, ... , C j = 1,2 ... , [n.sub.i], m = 1,2, ... ,M), where M denotes the number of multiple features sets, the fused feature with the linear combination can be described as follows.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (1)

where [a.sub.m] (m = 1,2, ... ,M) (m = 1,2, ..., M)) and zij (i = 1,2, ..., C, j = 1,2, ..., [n.sub.i])note the combination fusion coefficients and the fused feature respectively.

Now we focus how to obtain [a.sub.m](m = 1,2, ... M), and our goal is to find such fusion coefficients that they are unique and cause the largest class discriminant in the fused feature space. For supervised learning, we can calculate the average margin distance between two classes [C.sup.l.sub.p] and [C.sup.l.sub.q] in fused feature space R consisted of the fused feature [Z.sup.j.sub.i] = [Z.sup.j.sub.i] [alpha](i = 1,2, ... , C j = 1, 2, ... , [n.sub.i], where [alpha] - [[[a.sub.1], [a.sub.2], ..., [a.sub.M]]].sup.T] and [y.sup.j.sub.i] = [[y.sup.1.sub.ij], [y.sup.2.sub.ij], ... ,[y.sup.M.sub.ij]]. The average margin distance can be define by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (2)

where d([C.sup.l.sub.p],[C.sup.l.sub.q]) denotes the margin distance between pth and qth classes. Given the feature vector [z.sup.j.sub.i] in the dimension-reduced space [F.sup.l], and ([m.sup.l.sub.i] (i = 1,2, ... ,L) and [m.sup.l] denote the mean of every class and the mean of total samples respectively.

Firstly we can calculate d([C.sup.l.sub.p],[C.sup.l.sub.q] (p = 1,2, ... ,L,q - 1,2, ... ,L) of follows

d[C.sup.l.sub.p],[C.sup.l.sub.q] = d[m.sup.l.sub.p],[m.sup.l.sub.q]-S([C.sup.l.sub.p])-S([C.sup.l.sub.q]) (3)

where S [C.sup.l.sub.p](p = 1,2, ... ,L) is the measure of the scatter of the class [C.sub.p] and d ([m.sup.l.sub.p],[m.sup.l.sub.q]) is the distance between the means of two classes. Let [S.sup.l.sub.p] (p = 1,2, ... ,L) denote the within-class scatter matrix of class p, then tr ([S.sup.l.sub.p] (p = 1,2, ... ,L) measures the scatter of the class p can be defined as follows.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (4)

And we can define tr([S.sup.l.sub.B]) and tr([S.sup.l.sub.W] denote the trace of between classes scatter matrix and within classes scatter matrix of dimension-reduced space [F.sub.l] respectively as follows.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (5)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (6)

Hence S([C.sup.l.sub.p]) = tr ([S.sup.l.sub.p]). So

Dis [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (7)

Firstly we use Euclidean distance to calculate d ([m.sup.l.sub.p],[m.sup.l.sub.q] as follows.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (8)

According to equation (5), (6), (8) and (9), it is easy to obtain

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (9)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (10)

Hence 1/2n [L.summation over (p = 1)][L.summation over (q = 1)][n.sub.p][n.sub.q][tr(S.sup.l.sub.p] + tr(S.sup.l.sub.q] = tr(S.sup.l.sub.W]. We can obtain

Dis tr([S.sup.l.sub.B] - tr([S.sup.l.sub.W])

In the previous work in [15], Li applied the maximum margin criterion to feature extraction by maximizing the average margin distance. In this paper, we expect to create an optimization problem based on maximum margin criterion to seek an optimal projection vector a.

Proposition 1. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Then Dis = [[alpha].sup.T]G[alpha].

Proof. From equation (5) and (6), we can obtain

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (11)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (12)

From equation (2) [z.sup.j.sub.i] = [y.sup.j.sub.i][alpha], we can obtain

Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

easy to obtain

tr([S.sup.[PHI].sub.B])-tr([S.sup.[PHI].sub.W]) = [[alpha].sup.T]G[alpha] (13)

According to Propositions 1, we can obtain Dis = [[alpha].sup.T]G[alpha]

According to the maximum margin criterion and Proposition 1, we can create an optimization problem constrained by the unit vector a, i.e., [a.sup.T] a = 1, as follows. max

max [[alpha].sup.T] G[alpha] (14)

[alpha]

subject to [[alpha].sup.T] [alpha] - 1 = 0 (15)

In order to solve the above constrained optimization equation, we apply a Lagrangian

L ([alpha],[lambda]) = [[alpha].sup.T] G[alpha] - [lambda]([[alpha].sup.T][alpha]-1) (16)

with the multiplier l. The derivative of L (a,l) with respect to the primal variables must vanish, that is

[partial derivative]L([alpha],[lambda]/[partial derivative][alpha] = (G - [lambda]I)[alpha] = 0 (17)

[partial derivative]L([alpha],[lambda]/[partial derivative][alpha] = (1 - [[alpha).sup.T][alpha] = 0 (18)

Hence

G[alpha] = [lambda][alpha] (19)

Proposition 2. Assume a* is one eigenvector of corresponding to the largest eigenvalue l*, then has a maximum value at a*

Proof . For Equation (8), multiply aT at left side of equation, then

[[alpha].sup.T]G[alpha] = [[alpha].sup.T][lambda][alpha] = [lambda][[alpha].sup.T][alpha] = [lambda] (20)

Hence

Dis = [lambda] (21)

From this formulation, it is easy to see that, Dis reaches the largest value [lambda] when reaches the largest value. So Dis has a maximum value at [alpha]*, which is an eigenvector of G corresponding to the largest eigenvalue [lambda]*.

According to Proposition 2, the problem of solving the constrained optimization function is transformed to the problem of solving eigenvalue equation shown in (19). The fusion coefficients are equal to the elements of eigenvector of G corresponding to the largest eigenvalue, while G is a matrix which can be calculated by multiple features.

As above discussion, discriminant feature fusion finds a discriminating fused feature space, in which data has largest class discriminant. And then the fusion coefficients are equal to the elements of eigenvector of an eigenvalue problem corresponding to the largest eigenvalue, and the solution of the eigenvalue problem is unique, so the fusion coefficients are unique.

[FIGURE 1 OMITTED]

3. Experimental results

In our experiments, we implement our algorithm in the two face databases, ORL face database [17] and Yale face database [16]. The ORL face database, developed at the Olivetti Research Laboratory, Cambridge, U.K., is composed of 400 grayscale images with 10 images for each of 40 individuals. The variations of the images are across pose, time and facial expression. The Yale face database was constructed at the Yale Center for Computational Vision and Control. It contains 165 grayscale images of 15 individuals. These images are taken under different lighting condition (left-light, center-light, and right-light), and different facial expression (normal, happy, sad, sleepy, surprised, and wink), and with/without glasses.

In our experiments, to reduce computation complexity, we resize the original ORL face images sized 112 x 92 pixels with a 256 gray scale to 48 x 48 pixels, and some examples are shown in Fig. 1a. We randomly select 5 images from each subject, 200 images in total for training, and the rest 200 images are used to test the performance. Similarly, the images from Yale databases are cropped to the size of 100 x 100 pixels, and some examples are shown in Fig. 1b. Randomly selected 5 images per person are selected as the training samples, while the rest 5 images per person are used to test the performance of the algorithms.

From the theory derivation of discriminant fusion in Section 2, it is easy to predict that the proposed algorithm gives the better performance compared with the classical fusion [4], and here only a set of experiments are implemented for evaluation. Firstly we extract the linear and nonlinear features with PCA and KPCA, and then classify the fused feature of the two features with Fisher classifier. Supposed [y.sup.1.sub.ij] and [y.sup.2.sub.ij] (i = 1,2, ... ,C j = 1,2, ... ,[n.sub.j] denote the linear and nonlinear feature derived from PCA and KPCA respectively, the fused feature, [z.sup.j.sub.i] = [[[y.sup.1.sub.ij][y.sup.2.sub.ij]].sup.T] based on the classical fusion [4], while [z.sup.j.sub.i] = [2.summation over (m = 1)][a.sub.m][y.sup.m.sub.ij]based on discriminant fusion strategy. Here Polynomial kernel and Gaussian kernel with different coefficients are selected for KPCA, and accuracy rate is applied to evaluate the recognition performance.

As Figure 2, 3, 4 and 5 shown, the proposed method gives a higher performance than the classical fusion method [12] under the same kernel parameters for KPCA.

Since the fusion coefficients of the discriminant fusion strategy are obtained by solving the optimization problem based on the maximum margin criterion, so that the data has the largest class discriminability in the fused feature space, it is not surprising that discriminant fusion consistently outperforms classical fusion [4]. Besides these advantages, however, the following points deserve consideration: 1) Since the maximum margin criterion is used to create the constrained optimization problem, the fusion strategy applies only to supervised learning. 2) The fusion coefficients are obtained by solving an eigenvalue problem, which makes the method more time-consuming than the classical strategy, so the trade-off between computation time and classification accuracy should be evaluated. 3) The discriminant fusion strategy is only a linear combination of multiple features with different combination coefficients, so other fusion strategies based on discriminant analysis could be devised.
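The eigenvalue solution mentioned in point 2) can be sketched as follows. This is our reading of the Section 2 construction, under the assumption that the criterion $\mathrm{tr}(S_b - S_w)$ of the fused feature $z = \sum_m a_m y^m$ is written as a quadratic form $a^T J a$ with $J_{mn} = \mathrm{tr}(S_b^{mn} - S_w^{mn})$, so that under the constraint $\|a\| = 1$ the optimal coefficients are the leading eigenvector of $J$; it is a sketch, not a verbatim reproduction of the paper's derivation.

```python
import numpy as np

def mmc_fusion_coefficients(features, labels):
    """Fusion weights a maximizing tr(S_b - S_w) of z = sum_m a_m * y^m,
    subject to ||a|| = 1.

    features: list of M arrays, each (N, d) -- one feature set per method,
    all of the same dimension d. Illustrative name; assumed construction.
    """
    M = len(features)
    N = features[0].shape[0]
    classes = np.unique(labels)
    J = np.zeros((M, M))
    for m in range(M):
        for n in range(M):
            mu_m, mu_n = features[m].mean(0), features[n].mean(0)
            sb = sw = 0.0
            for c in classes:
                Xm = features[m][labels == c]
                Xn = features[n][labels == c]
                cm, cn = Xm.mean(0), Xn.mean(0)
                # trace of the cross between-class scatter term
                sb += len(Xm) * (cm - mu_m) @ (cn - mu_n)
                # trace of the cross within-class scatter term
                sw += ((Xm - cm) * (Xn - cn)).sum()
            J[m, n] = (sb - sw) / N
    # leading eigenvector of the (symmetrized) criterion matrix
    vals, vecs = np.linalg.eigh((J + J.T) / 2)
    return vecs[:, -1]
```

The extra cost over classical fusion is the construction of the M x M criterion matrix and one symmetric eigendecomposition, which for M = 2 feature sets is negligible next to the class-label requirement noted in point 1).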

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

[FIGURE 5 OMITTED]

4. Conclusions

A novel parallel feature fusion method based on the maximum margin criterion, namely discriminant parallel feature fusion, is proposed in this paper for pattern classification. We create a constrained optimization problem based on the maximum margin criterion to find the fusion coefficients of the parallel feature fusion and transform the optimization problem into an eigenvalue problem, which brings two advantages: the fused data has the largest class discriminability, and the fusion coefficients are unique. This paper proposes only a linear-combination fusion strategy based on discriminant analysis, and we expect it to suggest a promising direction for other fusion strategies.

Received 10 May 2007; Revised and accepted 1 Nov. 2007

References

[1] Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.(1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis and Machine Intelligence. 19 (7) 711-720.

[2] Martinez, A.M., Kak, A.C (2001). PCA versus LDA. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(2) 228-233.

[3] Scholkopf, B., Burges, C. J. C., Smola, A. J. (1999). Advances in Kernel Methods--Support Vector Learning. Cambridge, MA: MIT Press.

[4] Batur, A.U., Hayes, M.H. (2001). Linear Subspace for Illumination Robust Face Recognition, Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, Dec.

[5] Ruiz, A., Lopez de Teruel, P. E. (2001). Nonlinear kernel-based statistical pattern analysis. IEEE Trans. Neural Networks. 12, 16-32.

[6] Scholkopf, B., Smola, A., Muller, K. R (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation. 10(5) 1299-1319.

[7] Liu, Qingshan., Lu, Hanqing., Ma, Songde (2004). Improving Kernel Fisher Discriminant Analysis for Face Recognition. IEEE Trans. Circuits and Systems for Video Technology. 14 (1) 42-49.

[8] Lu, J., Plataniotis, K. N., Venetsanopoulos, A. N. (2003). Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans. Neural Networks. 14 (1) 117-126.

[9] Taropa, E., Srini, V. P., Lee, Won-Jong., Han, Tack-Don (2006). Data Fusion Applied on Autonomous Ground Vehicles. The 8th International Conference on Advanced Communication Technology. 744-749.

[10] Gunatilaka, A.H., Baertlein, B.A (2001). Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection, IEEE Trans. Pattern Analysis and Machine Intelligence. 23 (6) 577-589.

[11] Lo, D., Goubran, R.A., Dansereau, R.M. (2005). Robust joint audio-video talker localization in video conferencing using reliability information-II: Bayesian network fusion, IEEE Trans. Instrumentation and Measurement. 54 (4) 1541-1547.

[12] Liu, Chengjun., Wechsler, Harry (2001). A shape- and texture-based enhanced Fisher classifier for face recognition. IEEE Trans. Image Processing. 10 (4) 598-608.

[13] Xinhua, Zhang. (1998). An information model and method of feature fusion, International Conference on Signal Processing. 2. 1389-1392.

[14] Battiti, R.(1994). Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Network.5 (4) 537-550.

[15] Li, Haifeng., Jiang, Tao., Zhang, Keshu (2006). Efficient and Robust Feature Extraction by Maximum Margin Criterion. IEEE Trans. Neural Networks. 17 (1) 157-165.

[16] Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J(1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis and Machine Intelligence. 19 (7) 711-720.

[17] Samaria, Ferdinando., Harter, Andy (1994). Parameterisation of a Stochastic Model for Human Face Identification. In: Proc. Second IEEE Workshop on Applications of Computer Vision.

Jun-Bao Li (1), Shu-Chuan Chu (2), Jeng-Shyang Pan (3)

(1) Department of Automatic Test and Control Harbin Institute of Technology China junbaolihit@hotmail.com

(2) Department of Information Management Cheng Shiu University Taiwan scchu@bit.kuas.edu.tw

(3) Department of Electronic Engineering National Kaohsiung University of Applied Sciences Taiwan jspan@cc.kuas.edu.tw

Jun-Bao Li received the B.Sc. and M.Sc. degree from Harbin Institute of Technology (HIT), Harbin, P. R. China in 2002 and 2004, respectively. He is currently working toward the Ph.D. degree in the Measurement Technology and Instrument, in HIT, Harbin, P. R. China. His research interests are mainly in pattern recognition and image processing.

Shu-Chuan Chu received the Ph.D. degree from the School of Informatics and Engineering, Flinders University of South Australia. She is an Assistant Professor in the Department of Information Management, Cheng Shiu University, Taiwan. Her research interests are mainly in Data Mining, Computational Intelligence, Information Hiding, and Signal Processing.

Jeng-Shyang Pan received the B.S. degree in Electronic Engineering from the National Taiwan University of Science and Technology, Taiwan in 1986, the M.S. degree in Communication Engineering from the National Chiao Tung University, Taiwan in 1988, and the Ph.D. degree in Electrical Engineering from the University of Edinburgh, U.K. in 1996. Currently, he is a Professor in the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Taiwan. Professor Pan has published more than 50 journal papers and 120 conference papers. He serves on the editorial boards of LNCS Transactions on Data Hiding and Multimedia Security (Springer), the International Journal of Knowledge-Based Intelligent Engineering Systems (IOS Press), and the International Journal of Hybrid Intelligent Systems (Advanced Knowledge International). He is Co-Editor-in-Chief of the International Journal of Innovative Computing, Information and Control. His current research interests include data mining, information security, and image processing.

Publication: Journal of Digital Information Management, April 1, 2008.