# Vinegar identification by ultraviolet spectrum technology and pattern recognition method.

§1. Introduction

Many varieties of vinegar are on the market today, but their quality varies, and there is no quick and valid method to identify them. For a long time, vinegars have been identified by experience, using sensory indices such as color, smell, taste and style together with a few simple quantitative indices; such methods are evidently subjective and one-sided.

Zhang Shunping et al. measured vinegars with electronic nose technology and analyzed their comparability and degree of similarity in flavour, class and acidity by clustering and principal component analysis. They also recognised the vinegars with a probabilistic neural network, with a recognition accuracy of 94.4% [1].

Ultraviolet spectrum technology is a new measurement technique. Because different matter systems contain different components, and the degree of unsaturation of those components differs, the UV absorbance curves of the systems differ, so matter systems can be identified by comparing their UV absorbance curves. In our previous work [2], we took the degree of similarity of the curves as the index and tested the repeatability, stability and discriminating power of the ultraviolet spectrum method, concluding that the method can identify vinegars. In this paper we study vinegar identification further: we process and analyze the UV absorbance data with pattern recognition methods, namely Euclid (Mahalanobis) distance, linear discriminant analysis, principal component analysis, hybrid discriminant analysis and a BP neural network. For 5 kinds of vinegar samples, the recognition accuracy is 100% with Euclid (Mahalanobis) distance, principal component analysis, hybrid discriminant analysis ($\lambda = 0$, $\eta = 1$) and the BP neural network.

§2. Materials and data

2.1. Experimental materials

Vinegars: (a) Black Rice spicy vinegar, (b) Jiajia mature vinegar, (c) Shuita mature vinegar, (d) Xiaoerhei grain spicy vinegar and (e) Zhenjiang spicy vinegar, all bought from Yangling Guomao Supermarket.

Self-made vinegar: samples from the Practice Factory of the College of Food Science and Engineering, Northwest A&F University.

Reagents: glacial acetic acid and sodium hydroxide, both analytically pure and made in China.

Water for experiment: distilled water.

2.2. Instruments

BUCHI Rotavapor R-200 rotary evaporator (BUCHI Company); UV-2550 double-beam ultraviolet-visible spectrophotometer (Japan).

2.3. Experimental conditions and data

With a scanning wavelength range of 245~330 nm, a sampling interval of 0.5 nm, a slit width of 0.5 nm, a dilution ratio of the evaporated liquor of 1:6, an evaporation temperature of 45°C and a mass concentration of the reference fluid (glacial acetic acid) of 45 g/L, we scanned the vinegars by ultraviolet spectrum at different storage times to obtain the data; see [2].

The UV absorbance curves of vinegars of the same brand are very similar, while those of different brands differ greatly. We therefore process and analyze the data with pattern recognition methods in order to identify the vinegars.

With a sampling interval of 0.5 nm, each vinegar sample is a vector of dimension 171, while there are only 7 samples of each kind of vinegar. Because the dimension of the sample vectors far exceeds the number of samples, statistical methods are likely to show severe bias. We therefore take the interval as 5 nm, so that the dimension of the sample vectors becomes 18.
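The reduction from 171 to 18 dimensions can be illustrated with a minimal Python sketch. It assumes the coarse grid is obtained by keeping every 10th point of the 0.5 nm scan; the source does not say whether points were subsampled or averaged, and the names `fine_grid`, `downsample` and the synthetic spectrum are hypothetical.

```python
import numpy as np

# Wavelength grid of the original scan: 245 nm to 330 nm in 0.5 nm steps.
fine_grid = np.linspace(245.0, 330.0, 171)

# Assumption: a 5 nm interval is obtained by keeping every 10th point.
coarse_grid = fine_grid[::10]


def downsample(spectrum, step=10):
    """Return every `step`-th absorbance value of one scanned spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    return spectrum[::step]


# Synthetic placeholder spectrum (not measured data), used only to show shapes.
demo_spectrum = np.exp(-((fine_grid - 270.0) / 20.0) ** 2)
demo_features = downsample(demo_spectrum)

print(len(fine_grid), len(coarse_grid))
```

The point counts follow from (330 - 245) / 0.5 + 1 = 171 and (330 - 245) / 5 + 1 = 18.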

§3. Data processing and analysis by pattern recognition methods

We randomly choose 5 samples of each kind of vinegar for training and use the remaining 2 samples for testing.

3.1. Euclid (Mahalanobis) distance method

This method classifies the original samples. Let the $i$th training sample of the $k$th kind of vinegar be $x_i^k = (x_{i1}^k, x_{i2}^k, \ldots, x_{in}^k)^T$, $k = 1, 2, \ldots, 5$, $i = 1, 2, \ldots, 5$, $n = 18$. We calculate the mean vector (centre) of each training class:

$$m_k = \frac{1}{5} \sum_{i=1}^{5} x_i^k \tag{1}$$

and the covariance matrix:

$$s_k = \frac{1}{5} \sum_{i=1}^{5} (x_i^k - m_k)(x_i^k - m_k)^T \tag{2}$$

For each testing sample $y$, we calculate the Euclid distance between $y$ and each centre:

$$d_k^E = \left[ (y - m_k)^T (y - m_k) \right]^{1/2} \tag{3}$$

and the Mahalanobis distance:

$$d_k^M = \left[ (y - m_k)^T s_k^{-1} (y - m_k) \right]^{1/2} \tag{4}$$

Finally, we assign the testing sample $y$ to the 'nearest' class $\mu$ according to:

$$d_\mu = \min_{1 \le k \le 5} d_k \tag{5}$$

where $d_k$ is the Euclid or the Mahalanobis distance to the $k$th centre.
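The nearest-centre rule can be sketched in Python as follows. This is an illustration on synthetic data, not the authors' MATLAB code. One point is an assumption: with 5 training samples per class in 18 dimensions, the class covariance matrix has rank at most 4 and cannot be inverted directly, and the source does not state how the inverse was obtained, so the sketch substitutes the Moore-Penrose pseudo-inverse. The function names are hypothetical.

```python
import numpy as np


def fit_centres(X_train, labels):
    """Return {class: (centre, covariance)} with the 1/N_k divisor of the text."""
    model = {}
    for k in np.unique(labels):
        X_k = X_train[labels == k]
        m_k = X_k.mean(axis=0)
        centred = X_k - m_k
        s_k = centred.T @ centred / len(X_k)
        model[int(k)] = (m_k, s_k)
    return model


def nearest_class(y, model, metric="euclid"):
    """Assign y to the class whose centre is nearest under the chosen metric."""
    distances = {}
    for k, (m_k, s_k) in model.items():
        diff = y - m_k
        if metric == "euclid":
            squared = diff @ diff
        else:
            # Assumption: pseudo-inverse in place of the inverse covariance,
            # because the covariance is singular for small sample sets.
            squared = diff @ np.linalg.pinv(s_k) @ diff
        distances[k] = np.sqrt(max(float(squared), 0.0))
    return min(distances, key=distances.get)


# Synthetic, well-separated demo data (not vinegar measurements).
rng = np.random.default_rng(0)
true_centres = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
train_labels = np.repeat(np.arange(3), 5)
X_train = true_centres[train_labels] + 0.1 * rng.standard_normal((15, 3))
model = fit_centres(X_train, train_labels)

test_labels = np.repeat(np.arange(3), 2)
X_test = true_centres[test_labels] + 0.1 * rng.standard_normal((6, 3))
euclid_pred = [nearest_class(y, model, "euclid") for y in X_test]
print(euclid_pred)
```

With real spectra, `X_train` would hold the 25 training vectors of dimension 18 and `X_test` the 10 testing vectors.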

We tested the 10 testing samples in MATLAB 7.4; the results are given in Table 1.

3.2. Hybrid Discriminant Analysis (HDA) [3]

3.2.1. Method

HDA projects the original sample data $x$ (both training and testing samples) into a one-dimensional subspace by the linear transformation:

$$y = w^T x \tag{6}$$

Classification is then done in the subspace; testing is faster, and the classification accuracy is still 100%. HDA is based on LDA and PCA: it integrates discriminant and descriptive information simultaneously, controls the balance between LDA and PCA, and provides a 2-D parameter space for searching. The objective function of HDA is:

$$w = \arg\max_{w} \frac{w^T \left[ (1 - \lambda) S_b + \lambda S_\Sigma \right] w}{w^T \left[ (1 - \eta) S_\omega + \eta I \right] w} \tag{7}$$

where $\lambda$ and $\eta$ are two parameters ranging from 0 to 1,

$$S_b = \sum_{k=1}^{5} 5 (m_k - m)(m_k - m)^T \tag{8}$$

is the between-class scatter matrix, $m = \frac{1}{25} \sum x = \frac{1}{25} \sum_{k=1}^{5} 5 m_k$ is the mean vector of all training samples, and

$$S_\omega = \sum_{k=1}^{5} S_k, \qquad S_k = \sum_{i=1}^{5} (x_i^k - m_k)(x_i^k - m_k)^T \tag{9}$$

is the within-class scatter matrix.

$S_\Sigma = \frac{1}{25} \sum (x - m)(x - m)^T$ is the covariance matrix of all training samples, and $I$ is the unit matrix. By the Lagrange function method, the solution $w$ is the eigenvector corresponding to the largest eigenvalue of $\left[ (1 - \eta) S_\omega + \eta I \right]^{-1} \left[ (1 - \lambda) S_b + \lambda S_\Sigma \right]$.
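The eigenvector statement follows from a standard Lagrange multiplier argument, sketched here. The shorthand $A$, $B$ and the multiplier $\mu$ are introduced only for this note and do not appear in the source; the sketch assumes $B$ is invertible, which holds whenever $\eta > 0$.

```latex
A = (1 - \lambda) S_b + \lambda S_\Sigma, \qquad
B = (1 - \eta) S_\omega + \eta I

% The ratio w^T A w / w^T B w is unchanged by rescaling w, so fix w^T B w = 1:
\max_{w} \; w^T A w \quad \text{subject to} \quad w^T B w = 1

L(w, \mu) = w^T A w - \mu \left( w^T B w - 1 \right)

\frac{\partial L}{\partial w} = 2 A w - 2 \mu B w = 0
\;\Longrightarrow\; B^{-1} A w = \mu w

% At a stationary point the objective equals the multiplier:
w^T A w = \mu \, w^T B w = \mu
```

The objective is therefore largest when $\mu$ is the largest eigenvalue of $B^{-1} A$ and $w$ is the corresponding eigenvector.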

We project all the training and testing samples by (6). Let the projection of training sample $x_i^k$ be $y_i^k$; the projection centre of each class in the subspace is then $m_k = \frac{1}{5} \sum_{i=1}^{5} y_i^k$ (now a scalar). Let $y$ be the projection of a testing sample. We calculate the distance $d_k = |y - m_k|$ between $y$ and the projection centre of each class, and assign the testing sample to the 'nearest' class.
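A minimal Python sketch of the projection and nearest-centre step is given below. It is not the authors' MATLAB code: the generalized eigenproblem is solved through a Cholesky factor of the denominator matrix, which is one possible choice and requires that matrix to be positive definite (guaranteed for $\eta > 0$). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def hda_direction(X, labels, lam, eta):
    """Unit vector w maximizing the HDA ratio for parameters lam and eta."""
    X = np.asarray(X, dtype=float)
    n_samples, dim = X.shape
    m = X.mean(axis=0)
    S_b = np.zeros((dim, dim))
    S_w = np.zeros((dim, dim))
    for k in np.unique(labels):
        X_k = X[labels == k]
        m_k = X_k.mean(axis=0)
        gap = (m_k - m)[:, None]
        S_b += len(X_k) * (gap @ gap.T)        # between-class scatter
        S_w += (X_k - m_k).T @ (X_k - m_k)     # within-class scatter
    S_sigma = (X - m).T @ (X - m) / n_samples  # covariance of all samples
    A = (1.0 - lam) * S_b + lam * S_sigma
    B = (1.0 - eta) * S_w + eta * np.eye(dim)
    # Assumption: B is positive definite, so B = L L^T exists.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T                    # symmetric standard eigenproblem
    _, eigvecs = np.linalg.eigh((M + M.T) / 2.0)
    w = L_inv.T @ eigvecs[:, -1]               # eigh sorts eigenvalues ascending
    return w / np.linalg.norm(w)


def classify_projected(w, X_train, labels, X_test):
    """Project onto w and assign each test sample to the nearest class centre."""
    classes = np.unique(labels)
    centres = np.array([(X_train[labels == k] @ w).mean() for k in classes])
    y = X_test @ w
    nearest = np.abs(y[:, None] - centres[None, :]).argmin(axis=1)
    return classes[nearest]


# Synthetic demo data (not vinegar measurements): 3 classes, 5 + 2 samples each.
rng = np.random.default_rng(0)
true_centres = np.array([[0.0, 0, 0, 0], [10.0, 0, 0, 0], [20.0, 0, 0, 0]])
train_labels = np.repeat(np.arange(3), 5)
X_train = true_centres[train_labels] + 0.1 * rng.standard_normal((15, 4))
test_labels = np.repeat(np.arange(3), 2)
X_test = true_centres[test_labels] + 0.1 * rng.standard_normal((6, 4))

w = hda_direction(X_train, train_labels, lam=0.0, eta=1.0)
predicted = classify_projected(w, X_train, train_labels, X_test)
print(predicted)
```

Calling `hda_direction` with `lam=1.0, eta=1.0` returns the first principal direction, and `lam=0.0, eta=0.0` gives the LDA direction provided the within-class scatter matrix is positive definite.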

3.2.2. Results and discussion

Taking different values of $\lambda$ and $\eta$ between 0 and 1, we tested the 10 testing samples in MATLAB 7.4; the results are given in Table 2.

The results are very satisfying when $\lambda = 1$, $\eta = 1$ or $\lambda = 0$, $\eta = 1$, but poor when $\lambda = 0$, $\eta = 0$. In fact, when $\lambda = 0$, $\eta = 0$, (7) becomes:

$$w = \arg\max_{w} \frac{w^T S_b w}{w^T S_\omega w} \tag{10}$$

This is the objective function of LDA; the solution $w$ is now the eigenvector corresponding to the largest eigenvalue of $S_\omega^{-1} S_b$. Since the scatter matrices $S_\omega$ and $S_b$ are estimated from the samples, LDA may not do well on small sample sets. The results in Table 2 support this conclusion.

When $\lambda = 0$, $\eta = 1$, (7) becomes:

$$w = \arg\max_{w} \frac{w^T S_b w}{w^T w} \tag{11}$$

The solution $w$ is now the eigenvector corresponding to the largest eigenvalue of the between-class scatter matrix $S_b$. The projection (6) therefore maximizes the scatter among the projected classes, so the distances among the projected classes are largest and the classification result is good.

When $\lambda = 1$, $\eta = 1$, (7) becomes:

$$w = \arg\max_{w} \frac{w^T S_\Sigma w}{w^T w} \tag{12}$$

This is the objective function of PCA; its solution $w$ is the eigenvector associated with the largest eigenvalue of the covariance matrix $S_\Sigma$. PCA is a statistical analysis method that removes the correlation between the elements of the vector, so that the components of the transformed vector are uncorrelated and arranged in order of decreasing variance. PCA is superior to LDA on small sample sets because it captures the descriptive information of the data in the projected space, as Table 2 shows.

For further analysis, we calculate the nonzero eigenvalues of the covariance matrix and their accumulated variance cover rate (Table 3).

Table 3 shows that the accumulated variance cover rate of the largest eigenvalue reaches 93.22%; that is, the variance contribution rate of the first principal component of the transformed vector is 93.22%. This is why satisfying results are obtained using only the first principal component for classification.
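The cover rates can be checked directly from the eigenvalues listed in Table 3. The short Python sketch below recomputes them; the variable names are illustrative only.

```python
import numpy as np

# Nonzero eigenvalues of the covariance matrix, copied from Table 3.
eigenvalues = np.array([20.6706, 1.4026, 0.046, 0.0245, 0.0141,
                        0.0125, 0.0043, 0.0001, 0.0001])

# Accumulated variance cover rate: running sum divided by the total.
cover_rate = np.cumsum(eigenvalues) / eigenvalues.sum()

for index, rate in enumerate(cover_rate, start=1):
    print(f"{index}: {100 * rate:.2f}%")
```

The first value is 20.6706 / 22.1748, which rounds to the 93.22% quoted in the text.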

The eigenvector associated with the largest eigenvalue is

w = (-0.2976, -0.3487, -0.3781, -0.3691, -0.3676, -0.3157, -0.3499, -0.3105, -0.2069, -0.1092, -0.0457, -0.0125, 0.0005, 0.0034, 0.0030, 0.0010, 0.0004, 0.0001)

The absolute values of the first 10 elements are larger than those of the last 8, which are almost zero. The last 8 elements correspond to the UV absorbance values at wavelengths of 295~330 nm. This shows that the first principal component is determined mainly by the absorbance values at 245~295 nm, while the contribution of the absorbance values at 295~330 nm can be ignored. Table 4 shows that the average UV curves of the 5 kinds of vinegar differ greatly within 245~295 nm and only minimally within 295~330 nm. In the experiment we can therefore reduce the scanning wavelength range to 245~295 nm.
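How strongly the first ten elements dominate can be quantified with a short Python sketch. The squared-loading share used here is an illustrative measure chosen for this note, not one taken from the source.

```python
import numpy as np

# Wavelengths (nm) of the 18 features and the eigenvector quoted in the text.
wavelengths = np.arange(245, 331, 5)
w = np.array([-0.2976, -0.3487, -0.3781, -0.3691, -0.3676, -0.3157,
              -0.3499, -0.3105, -0.2069, -0.1092, -0.0457, -0.0125,
              0.0005, 0.0034, 0.0030, 0.0010, 0.0004, 0.0001])

# Share of the squared loadings carried by the first ten elements (245 to 290 nm).
energy = w ** 2
share_first_ten = energy[:10].sum() / energy.sum()

print(wavelengths[10], round(float(share_first_ten), 4))
```

The first ten elements carry more than 99% of the squared loadings, which agrees with the statement that the remaining wavelengths contribute very little to the first principal component.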

3.3. BP Neural Network Method [4]

The BP neural network method uses the error back-propagation algorithm. It can effectively classify sample data whose relationships are ambiguous. We designed a BP neural network with a single hidden layer. Since the dimension of the sample vector is 18, the number of input layer nodes $n$ is 18; since we experiment with 5 kinds of vinegar, the number of output layer nodes $m$ is 5; and the number of hidden layer nodes is $\sqrt{m + n + a}$, where $a \in [1, 10]$ is a constant.

Let the training samples of Black Rice spicy vinegar, Jiajia mature vinegar, Shuita mature vinegar, Xiaoerhei grain spicy vinegar and Zhenjiang spicy vinegar be the input vectors, with corresponding output vectors (1,0,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0) and (0,0,0,0,1) respectively. In testing, we assign a test sample to class $k$ if the $k$th element of its output vector is the maximum.

We use the sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$ as the activation function. The weights and thresholds are trained by gradient descent with momentum, with momentum coefficient $\eta = 0.1$ and learning rate $\epsilon = 0.1$.

We input the 25 training samples into the network with a training goal error of 0.01. The MATLAB 7.4 training results show that with 6 hidden nodes the error reaches 0.00776696, meeting the goal, after only 16 training epochs. We then input the 10 test samples of known class into the network and count the samples classified correctly to obtain the classification accuracy; see Table 5.
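The network described above can be sketched in plain Python with NumPy. This is an illustration under stated assumptions, not a reproduction of the MATLAB run: the weight initialization, the epoch limit, the logistic sigmoid on both layers and the synthetic data are all assumptions, so the epoch count and final error will differ from the values reported in the text.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, T, n_hidden=6, lr=0.1, momentum=0.1, goal=0.01,
             max_epochs=5000, seed=0):
    """Batch back-propagation with momentum; returns parameters and error history."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Assumption: small uniform random weights and zero thresholds at the start.
    params = [rng.uniform(-0.5, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
              rng.uniform(-0.5, 0.5, (n_hidden, n_out)), np.zeros(n_out)]
    velocity = [np.zeros_like(p) for p in params]
    history = []
    for _ in range(max_epochs):
        W1, b1, W2, b2 = params
        H = sigmoid(X @ W1 + b1)            # hidden layer output
        Y = sigmoid(H @ W2 + b2)            # network output
        err = Y - T
        history.append(float(np.mean(err ** 2)))
        if history[-1] <= goal:
            break
        # Back-propagated error terms (squared-error gradient up to a constant).
        delta_out = err * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
        grads = [X.T @ delta_hid, delta_hid.sum(axis=0),
                 H.T @ delta_out, delta_out.sum(axis=0)]
        for i in range(4):
            velocity[i] = momentum * velocity[i] - lr * grads[i] / len(X)
            params[i] = params[i] + velocity[i]
    return params, history


def predict(params, X):
    """Class index of the largest output element for each row of X."""
    W1, b1, W2, b2 = params
    Y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return Y.argmax(axis=1)


# Synthetic demo data (not vinegar spectra): 5 classes, 5 samples each, 18 features.
rng = np.random.default_rng(1)
prototypes = rng.uniform(0.0, 1.0, (5, 18))
labels = np.repeat(np.arange(5), 5)
X_train = prototypes[labels] + 0.02 * rng.standard_normal((25, 18))
T_train = np.eye(5)[labels]                 # one-hot target vectors

params, history = train_bp(X_train, T_train)
predictions = predict(params, X_train)
print(len(history), history[0], history[-1])
```

The one-hot rows of `T_train` correspond to the output vectors (1,0,0,0,0) to (0,0,0,0,1), and `predict` applies the maximum-element rule used for testing.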

Conclusion

We studied small sample sets of 5 kinds of vinegar by rotary evaporation and ultraviolet spectrum scanning, under the following conditions: wavelength range 245~330 nm, dilution ratio of the evaporated liquor 1:6, evaporation temperature 45°C and mass concentration of the reference 45 g/L. The ultraviolet spectrum curves of the vinegars at different storage times were obtained, and the data were processed and analyzed by pattern recognition methods: Euclid (Mahalanobis) distance, linear discriminant analysis, principal component analysis, hybrid discriminant analysis and a BP neural network. The experimental results show that the recognition accuracy is 100% with Euclid (Mahalanobis) distance, principal component analysis, hybrid discriminant analysis ($\lambda = 0$, $\eta = 1$) and the BP neural network, and that the scanning wavelength range can be reduced to 245~295 nm. These methods can be effective ways to identify vinegar. The poor recognition accuracy of LDA is attributed to the small sample set.

In the future, we will try to identify vinegar with support vector machines [5], [6].

References

[1] Zhang Shunping, Zhang Qinyi, Li Dengfeng, et al., Research on Vinegars Identification by Electronic Nose, Chinese Journal of Sensors and Actuators, 19(2006), 104-107.

[2] Xie Huadong, Bu Lijun, Li Zhixi, New Method of Vinegar Detection Based on Ultraviolet Fingerprint Technology, Journal of Machine, 1(2009).

[3] Jie Yu, Qi Tian, Ting Rui, Huang, Integrating Discriminant and Descriptive Information for Dimension Reduction and Classification, IEEE Transactions on Circuits and Systems for Video Technology, 17(2007), No. 3, 372 - 377.

[4] Exploring Centre of Feisi Science and Technology Production, Theory of Neural Network and Matlab7 application, Beijing: Publishing House of Electronics Industry, 18(2005),103.

[5] John Shawe-Taylor, Nello Cristianini, Kernel Methods for Pattern Analysis, China Machine Press, 2005, 215-237.

[6] Deng Naiyang, Tian Yingjie, New Method in Data Digging-Support Vector Machine, Publishing House of Science, 2005.

Huali Zhao (‡), Zhixi Li (†, 1), Xuemei Yang (†, 2) and Baoan Chen (†)

† College of Mathematics and Information Science, Xianyang Normal University, Xianyang 712000, P.R. China

‡ College of Food Science and Engineering, Northwest A&F University, Yangling 712100, P.R. China. E-mail: zhl029@163.com

(1) This research is supported by a scientific research project (No. 2005K03-G03) of Shaanxi province.

(2) This research is supported by a scientific research project (No. 09JK809) of Shaanxi province.

Table 1: The Identification Accuracy of Euclid (Mahalanobis) Distance

| Name of vinegar | Zhenjiang spicy vinegar | Xiaoerhei grain spicy vinegar | Shuita mature vinegar | Jiajia mature vinegar | Black Rice spicy vinegar |
|---|---|---|---|---|---|
| Euclid distance | 100% | 100% | 100% | 100% | 100% |
| Mahalanobis distance | 100% | 100% | 100% | 100% | 100% |

Table 2: The Identification Accuracy of HDA with Different Parameters

| Name of vinegar | Zhenjiang spicy vinegar | Xiaoerhei grain spicy vinegar | Shuita mature vinegar | Jiajia mature vinegar | Black Rice spicy vinegar |
|---|---|---|---|---|---|
| $\lambda = 1$, $\eta = 1$ | 100% | 100% | 100% | 100% | 100% |
| $\lambda = 0$, $\eta = 1$ | 100% | 100% | 100% | 100% | 100% |
| $\lambda = 0$, $\eta = 0$ | 100% | 100% | 50% | 100% | 50% |

Table 3: Nonzero Eigenvalues and Accumulated Variance Cover Rate of the Covariance Matrix

| No. | Eigenvalue | Accumulated cover rate |
|---|---|---|
| 1 | 20.6706 | 93.22% |
| 2 | 1.4026 | 99.54% |
| 3 | 0.046 | 99.75% |
| 4 | 0.0245 | 99.86% |
| 5 | 0.0141 | 99.92% |
| 6 | 0.0125 | 99.98% |
| 7 | 0.0043 | 100% |
| 8 | 0.0001 | 100% |
| 9 | 0.0001 | 100% |

Table 4: Average UV Absorbance of the Five Kinds of Vinegar

| Wavelength (nm) | Shuita | Jiajia | Zhenjiang | Xiaoerhei | Black Rice |
|---|---|---|---|---|---|
| 245 | 0.493 | 3.8294 | 0.2518 | 0.648 | 0.0488 |
| 250 | 0.3582 | 4.3478 | 0.544 | 0.3778 | 0.1994 |
| 255 | 0.3758 | 4.7018 | 0.878 | 0.3382 | 0.2752 |
| 260 | 0.4662 | 4.6418 | 1.267 | 0.3934 | 0.3256 |
| 265 | 0.6036 | 4.7018 | 1.716 | 0.4968 | 0.3626 |
| 270 | 0.7336 | 4.1678 | 2.1162 | 0.5884 | 0.3858 |
| 275 | 0.8278 | 4.6024 | 2.3998 | 0.6388 | 0.4016 |
| 280 | 0.8432 | 4.119 | 2.4702 | 0.617 | 0.3926 |
| 285 | 0.7798 | 2.8532 | 2.2708 | 0.5178 | 0.3592 |
| 290 | 0.669 | 1.6392 | 1.8616 | 0.3738 | 0.3126 |
| 295 | 0.5866 | 0.8504 | 1.397 | 0.2894 | 0.2638 |
| 300 | 0.5104 | 0.4188 | 1.0034 | 0.2264 | 0.2172 |
| 305 | 0.4202 | 0.2172 | 0.7098 | 0.1744 | 0.1716 |
| 310 | 0.32 | 0.1288 | 0.4908 | 0.1264 | 0.128 |
| 315 | 0.2094 | 0.0758 | 0.3058 | 0.081 | 0.0852 |
| 320 | 0.106 | 0.0454 | 0.1506 | 0.042 | 0.044 |
| 325 | 0.0608 | 0.0296 | 0.0886 | 0.0264 | 0.0282 |
| 330 | 0.039 | 0.0228 | 0.0688 | 0.0204 | 0.02 |

Table 5: The Identification Accuracy of BP Neural Network

| Name of vinegar | Zhenjiang spicy vinegar | Xiaoerhei grain spicy vinegar | Shuita mature vinegar | Jiajia mature vinegar | Black Rice spicy vinegar |
|---|---|---|---|---|---|
| Identification accuracy | 100% | 100% | 100% | 100% | 100% |


Author: | Huali, Zhao; Zhixi, Li; Xuemei, Yang; Baoan, Chen |
---|---|

Publication: | Scientia Magna |

Article Type: | Report |

Geographic Code: | 9CHIN |

Date: | Sep 1, 2009 |
