
Multi-dimensional classification method for incomplete data based on a Bayesian network learning algorithm.

1. Introduction

With the rapid development of information technology and the spread of computers and the Internet, the ability of every field to generate and collect data is growing quickly. How to organize and use these data effectively, so that useful information and knowledge can be extracted from them, has become a major issue in today's information society. Machine learning and data mining provide the theory and techniques for studying and solving this problem, and they are now widely applied in science, technology and the economy. Classification is the process of constructing a classification function or model (also called a classifier) from a given training data set and using it to map unlabelled data into one of a set of predefined classes. Bayesian classification methods, which are built on Bayesian statistics and Bayesian networks, can handle incomplete data effectively and yield interpretable models; Bayesian networks also offer high accuracy and are regarded as one of the best classification models.

In-depth study of naive Bayes shows that it can produce good classification results: as long as the estimated class posterior probabilities have the same ordering as the true class posterior probabilities, the correct classification is obtained, regardless of the specific estimated values (Nava et al., 2015). In the semi-naive Bayesian classifier, the attribute variables are divided into several groups; related attributes are placed in the same group, and attributes in different groups are regarded as conditionally independent (De Vries et al., 2015). The K-dependence Bayesian classifier allows each attribute variable to have up to K parent nodes in addition to the class node (Ejarque et al., 2016). In the tree-augmented naive Bayesian classifier (TAN), the class variable has no parent, each attribute variable has the class variable and at most one other attribute variable as parents, and the attribute variables together form a tree structure (Sonmez et al., 2016). BAN and GBN improve the network learning method with conditional independence tests, which improves the classification results (Acosta-Cabronero et al., 2016). The super-parent Bayesian classifier assumes that one attribute variable acts as the common parent (the super parent) of all the other attributes (Hyakusoku et al., 2016). More recently, a restricted double-level Bayesian classification model was proposed, in which the attribute set is split into two subsets and the attributes of one subset serve as parents of the attributes of the other; its classification results are better than those of TAN (Wang et al., 2015). Redundant or irrelevant attributes in incomplete data not only reduce classification efficiency but can also seriously damage the classification results (Mora & Fonseca, 2014). Based on the wrapper method, two selective classifiers for incomplete data are therefore proposed.

First, the incomplete data classifier RBC, whose classification performance is outstanding, is combined with best-first search, which gives good search results at relatively low complexity, to construct the selective incomplete data classifier SRBC. Compared with RBC and DBCI, SRBC not only obtains significantly higher classification accuracy but also significantly reduces the number of redundant and irrelevant attributes.

2. Bayesian Networks

Consider a set U = {X_1, X_2, ..., X_n} of discrete random variables, each of which takes a finite number of values. A Bayesian network is a graphical model that represents the probabilistic dependencies among the variables X_1, X_2, ..., X_n. Formally, a Bayesian network is a pair B = <G, Θ>. Here G is a directed acyclic graph whose nodes correspond to the random variables X_1, X_2, ..., X_n; its arcs express the conditional dependencies among the variables and encode the conditional independence assumption that, given the values of its parent nodes, each variable is independent of its non-descendant nodes. Θ is the set of local conditional probability parameters of the network, Θ = {θ_{x_i|pa_i}}, where θ_{x_i|pa_i} = P(x_i | pa_i) is the conditional probability that X_i takes the value x_i when its parent set Pa_i takes the configuration pa_i (Wang et al., 2016).

[FIGURE 1 OMITTED]

By the chain rule of probability, the joint probability distribution of U determined by the Bayesian network B is (Zhang et al., 2016):

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = \prod_{i=1}^{n} P(X_i \mid Pa_i)    (1)

For a set of specific values <x_1, x_2, ..., x_n> of the variable tuple <X_1, X_2, ..., X_n>, the joint probability value P(x_1, ..., x_n) can be calculated by the following formula:

P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa_i)    (2)
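To make the factorization in formulas (1) and (2) concrete, the following minimal Python sketch evaluates the joint probability of a full assignment by multiplying the local conditional probabilities. The three-node network (A is the parent of B and C) and its probability tables are hypothetical values chosen only for illustration, not data from this paper.

```python
# Minimal sketch: evaluating P(x_1, ..., x_n) = prod_i P(x_i | pa_i)
# for a hypothetical three-node network A -> B, A -> C with binary variables.

# Parent sets (the directed acyclic graph G).
parents = {"A": [], "B": ["A"], "C": ["A"]}

# Local conditional probability tables (the parameter set Theta).
# Keys are (node value, tuple of parent values).
cpt = {
    "A": {(0, ()): 0.6, (1, ()): 0.4},
    "B": {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    "C": {(0, (0,)): 0.9, (1, (0,)): 0.1, (0, (1,)): 0.5, (1, (1,)): 0.5},
}

def joint_probability(assignment):
    """Formula (2): multiply P(x_i | pa_i) over all nodes."""
    p = 1.0
    for node, value in assignment.items():
        pa_values = tuple(assignment[p_] for p_ in parents[node])
        p *= cpt[node][(value, pa_values)]
    return p

print(joint_probability({"A": 1, "B": 0, "C": 1}))  # 0.4 * 0.2 * 0.5 = 0.04
```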

Usually, the Bayesian network is constructed in 3 steps:

The first step is to determine the variables associated with the problem and determine all possible values of each variable according to the observed values.

The second step is to determine the structure of the Bayesian network: using domain knowledge, or by learning from the data, the dependency relationships between the variables are identified and a directed acyclic graph (DAG) depicting the conditional independence relations among the variables is established. For each variable X_i, if there is a subset Π_i ⊆ {X_1, X_2, ..., X_{i-1}} such that X_i and {X_1, X_2, ..., X_{i-1}} \ Π_i are conditionally independent given Π_i, that is (Rossi et al., 2015):

P(X_i \mid X_1, X_2, \ldots, X_{i-1}) = P(X_i \mid \Pi_i), \quad i = 1, 2, \ldots, n    (3)

From formulas (1) and (3),

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \Pi_i)    (4)

If Pa_i is used to represent the parent node set of the variable X_i, then

P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_i)    (5)

Thus, Π_i can be used as the parent node set of X_i. Therefore, in order to determine the structure of the Bayesian network, it is necessary to: (1) sort the variables X_1, X_2, ..., X_n into some fixed order; and (2) find, for each variable, a subset Π_i (i = 1, 2, ..., n) that satisfies formula (3).

The third step is to determine the parameters of the Bayesian network: the set of local conditional probabilities Θ = {P(x_i | pa_i)} is either specified directly or obtained through learning.

Obviously, the second and third steps are the key to constructing a Bayesian network. These steps may need to be repeated alternately, since a single sequential execution is often unable to produce an accurate Bayesian network.

3. Bayesian Network Learning Under the Condition of Incomplete Data

3.1. Parameter Learning Under the Condition of Incomplete Data

In general, it is difficult to solve the parameter learning problem for incomplete data with exact methods, so approximate methods are usually adopted (Zivadinov et al., 2016).

The EM algorithm is one of the most commonly used methods for parameter learning from incomplete data. It obtains progressively better parameters by alternating two processes, "expectation" and "maximization". The expectation step computes, under the current parameters, the expected sufficient statistics of each event in the incomplete sample data; the maximization step then uses these expected sufficient statistics to find the parameters that maximize the likelihood (or the posterior). When the likelihood (or posterior) reaches a local maximum, the iteration stops. The specific steps of the EM algorithm are as follows:

1. Set an initial value for the parameters θ_m (which can be chosen randomly).

2. Calculate the expected sufficient statistics of each event in the incomplete data set:

E[N_{ijk} \mid \theta_m] = \sum_{l=1}^{N} P(x_i^k, pa_i^j \mid y_l, \theta_m)    (6)

Here N is the number of examples in the data set and y_l is the l-th instance, which may contain missing values. When none of the variables in X_i and Pa_i is missing in y_l, P(x_i^k, pa_i^j | y_l, θ_m) is either 1 (the configuration x_i^k, pa_i^j occurs in y_l) or 0 (the configuration does not occur in y_l). Otherwise, this probability is computed from the current parameters.

3. Use the current expected sufficient statistics to compute the maximum likelihood estimate of the parameters:

\theta_{ijk} = \frac{E[N_{ijk} \mid \theta_m]}{\sum_{k'=1}^{r_i} E[N_{ijk'} \mid \theta_m]}    (7)

If the maximum a posteriori estimate is computed instead:

\theta_{ijk} = \frac{\alpha_{ijk} + E[N_{ijk} \mid \theta_m]}{\sum_{k'=1}^{r_i} \left( \alpha_{ijk'} + E[N_{ijk'} \mid \theta_m] \right)}    (8)

Here r_i is the number of values of X_i and the α_{ijk} are prior parameters. By iterating the second and third steps, θ_m converges to a local maximum of the likelihood (or posterior).
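As an illustration of the two iterated steps, the following Python sketch runs EM for the special case of a naive Bayes structure (the class is the only parent of every attribute) with missing attribute values and fully observed class labels; under these assumptions the E-step probabilities reduce to the current parameters P(X_i = k | C = c). The function name and data layout are my own, and this is a toy sketch of the idea rather than a general Bayesian network EM, which would require full probabilistic inference in the E-step.

```python
# Illustrative EM sketch for a naive Bayes structure with missing attribute
# values (None) and fully observed class labels. Toy example, not general BN EM.
import random

def em_naive_bayes(data, classes, n_values, n_attrs, iters=20, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial parameters theta[c][i][k] = P(X_i = k | C = c).
    theta = {c: [[rng.random() for _ in range(n_values)] for _ in range(n_attrs)]
             for c in classes}
    for c in classes:
        for i in range(n_attrs):
            s = sum(theta[c][i])
            theta[c][i] = [v / s for v in theta[c][i]]

    for _ in range(iters):
        # Step 2 (E-step): expected sufficient statistics E[N_cik].
        counts = {c: [[0.0] * n_values for _ in range(n_attrs)] for c in classes}
        for c, attrs in data:
            for i, x in enumerate(attrs):
                if x is not None:                      # observed entry: count 0 or 1
                    counts[c][i][x] += 1.0
                else:                                  # missing entry: use current theta
                    for k in range(n_values):
                        counts[c][i][k] += theta[c][i][k]
        # Step 3 (M-step): maximum likelihood re-estimation, as in formula (7).
        for c in classes:
            for i in range(n_attrs):
                total = sum(counts[c][i])
                if total > 0:
                    theta[c][i] = [n / total for n in counts[c][i]]
    return theta

# Usage: each example is (class label, [attribute values, None when missing]).
data = [(0, [1, None]), (0, [1, 0]), (1, [0, 1]), (1, [None, 1])]
print(em_naive_bayes(data, classes=[0, 1], n_values=2, n_attrs=2))
```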

3.2. Structural Learning Under Incomplete Data

Under incomplete data, learning the network structure is much more complex than with complete data. The scoring function can no longer be decomposed into factors that depend only on local structure, so evaluating a network structure is not as simple as in the complete-data case and requires some inference. Moreover, when the optimal parameters for a given structure are needed, they must be obtained with the EM algorithm or gradient descent. Structure learning under incomplete data therefore has very high computational complexity. The most influential work in this area is Friedman's structural EM (SEM) algorithm. It alternates a greedy search over network structures with parameter estimation by the EM algorithm. The main advantage of this algorithm is that, during the search, only the currently selected network structure has its parameters estimated by EM; EM is not run for the structures that are not selected. The parameters of the selected network are then used to evaluate all candidate structures, so each evaluation of the current network's "neighbour set" requires only one call to the EM algorithm, which greatly reduces the computational complexity. If the score of a candidate structure is higher than that of the current network structure, the candidate replaces the current structure. This process continues until no better network can be found (Eldridge et al., 2016).
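The control flow just described can be sketched as follows. The three callables run_em, neighbors and score_with_expected_counts are assumptions standing in for the EM routine, the neighbour generator and the expected-score computation; the sketch only illustrates that EM is invoked once per accepted structure while all of its neighbours are scored with that one model, not the details of any particular scoring function.

```python
# Control-flow sketch in the spirit of Friedman's structural EM: only the
# currently selected structure receives a full EM parameter fit; all of its
# neighbours are scored with the expected statistics of that single model.

def structural_em(initial_structure, run_em, neighbors, score_with_expected_counts):
    """run_em(structure) -> (params, expected_counts)
    neighbors(structure) -> iterable of candidate structures
    score_with_expected_counts(candidate, expected_counts) -> float (higher is better)
    """
    current = initial_structure
    params, counts = run_em(current)                        # one EM call
    current_score = score_with_expected_counts(current, counts)
    improved = True
    while improved:
        improved = False
        best_cand, best_score = None, current_score
        for cand in neighbors(current):                      # evaluate the neighbour set
            s = score_with_expected_counts(cand, counts)     # no EM call here
            if s > best_score:
                best_cand, best_score = cand, s
        if best_cand is not None:                            # accept the best improvement
            current = best_cand
            params, counts = run_em(current)                 # one EM call per accepted move
            current_score = score_with_expected_counts(current, counts)
            improved = True
    return current, params
```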

4. Wrapper Method

The wrapper method is one of the main methods of attribute selection. Figure 2 describes the process of selecting attributes with a wrapper.

An attribute selection algorithm based on the wrapper method uses a given classification algorithm (such as the naive Bayesian classifier) to evaluate attribute subsets while searching the attribute space (the set of all attribute subsets). For each candidate attribute subset S, the classification accuracy F(S) of the classifier built on S is used as the evaluation value of S. When searching the attribute space, the following aspects need to be determined: the search direction, the initial attribute subset, the initial best evaluation value, the search strategy and the termination condition (Soliman et al., 2016).

[FIGURE 2 OMITTED]

There are, in general, three search directions. The first is forward search: according to some evaluation criterion, the best attribute is repeatedly added to the set of selected attributes, so the selected set keeps growing. The second is backward search: according to some evaluation criterion, attributes are repeatedly deleted from the set of temporarily retained attributes, so the retained set keeps shrinking. The third is bidirectional search: both operations are applied to the current attribute subset at the same time, adding attributes that have not yet been selected and deleting attributes that are only temporarily retained.

There are three common ways to set the initial attribute subset. The first is to set it to the empty set, which is generally used in forward search; the second is to set it to the whole attribute set, which is often used in backward search; the third is to assign a random attribute subset, which is often used in bidirectional search. So that the search process can get started, the initial best evaluation value should not be set too high; it is generally set to 0 or to the accuracy obtained by always predicting the largest class of the training data (Li et al., 2015).

There are several search strategies; in general they are exhaustive search, heuristic search and random search. Exhaustive search evaluates all attribute subsets and generally needs no termination condition, but it is only applicable when the number of attributes is very small. Heuristic search is used in the vast majority of cases and usually needs a termination condition so that it does not degenerate into exhaustive search. Random search is a relatively new strategy in which the candidate attribute subsets are generated randomly; to obtain good results, some limiting parameters are sometimes imposed on the candidate subsets, and a maximum number of iterations is often set to avoid degenerating into exhaustive search.
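As a sketch of the wrapper evaluation F(S), the following Python function estimates the accuracy of a classifier restricted to an attribute subset S by simple k-fold cross validation. The build_classifier argument is an assumed callable that trains any chosen classifier on the projected rows and returns a prediction function; it is a generic stand-in, not a specific classifier from this paper.

```python
# Sketch of the wrapper evaluation F(S): the accuracy of a classifier built
# only on the attribute subset S, estimated by k-fold cross validation.
# `build_classifier(train)` takes a list of (projected_row, label) pairs and
# returns a predict(projected_row) function; it is an assumed callable.

def wrapper_score(rows, labels, subset, build_classifier, k=5):
    subset = sorted(subset)
    project = lambda row: [row[i] for i in subset]      # keep only attributes in S
    n, correct = len(rows), 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))               # simple interleaved folds
        train = [(project(rows[i]), labels[i]) for i in range(n) if i not in test_idx]
        predict = build_classifier(train)
        for i in test_idx:
            if predict(project(rows[i])) == labels[i]:
                correct += 1
    return correct / n                                  # F(S): overall accuracy
```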

5. Selective Incomplete Data Classifier SRBC

To construct a selective classifier for incomplete data based on the wrapper method, the following aspects must be determined: the incomplete data classifier, the search strategy, the search direction, the initial attribute subset, the initial best evaluation value and the termination condition.

5.1. Selection of Incomplete Data Classifier

The classification accuracy of the classifier constructed on an attribute subset is used as the evaluation index of that subset. A classifier therefore has to be built for every candidate attribute subset, which makes the computational complexity of the whole process very high. Consequently, when building a selective incomplete data classifier based on the wrapper method, the incomplete data classifier itself should be relatively efficient; and, of course, the higher its classification accuracy the better (Ciancanelli et al., 2016).

Existing classifiers that can handle incomplete data, such as the naive Bayes classifier and the C4.5 decision tree, usually either simply discard the data items that contain missing values or fill each variable with some specific value. With either of these treatments of incomplete data it is often difficult to obtain ideal classification results. Classifiers based on such treatments are therefore not an ideal choice for constructing a selective incomplete data classifier with the wrapper method.

When incomplete data classifiers are constructed by probabilistic parameter optimization, approximate optimization algorithms such as the EM algorithm and Gibbs sampling are used, and these algorithms require the MAR (Missing At Random) assumption. Unfortunately, there is no way to tell whether a particular data set satisfies the MAR assumption, and when it does not, the accuracy of these approximate optimization algorithms decreases significantly, so the accuracy of the resulting classifier also declines. In addition, algorithms such as EM and Gibbs sampling estimate the parameters by iterative loops, so their computational complexity is generally high. This approach is therefore not suitable for constructing a selective incomplete data classifier, from the point of view of either efficiency or classification quality.

In contrast, the RBC classifier proposed by Ramoni and Sebastiani is a Bayesian classifier constructed directly from the incomplete data set. It does not require the missing data to satisfy the MAR assumption, and it offers both high classification efficiency and good classification performance. RBC is therefore an ideal choice for constructing a selective classifier based on the wrapper method.

5.2. Determination of Other Factors

When choosing the search strategy, note that the computational complexity of the wrapper method is very high, so a heuristic search strategy is preferable. The two main heuristic search strategies are hill climbing and best-first search. Hill climbing is slightly more efficient, but its stability and search results are not as good as those of best-first search, because hill climbing easily gets trapped in local optima. We therefore use best-first search to construct the selective classifier for incomplete data.

The wrapper method uses the classification accuracy of the classifier built on an attribute subset as the evaluation index of that subset, and the fewer attributes a subset contains, the less time it takes to build a classifier on it. From the point of view of search direction, forward search generally starts from the empty set as the initial attribute subset and then keeps adding attributes, so the number of attributes in the evaluated subsets grows only gradually. Forward search is therefore more efficient than backward and bidirectional search, and we decide to use a forward best-first search. Correspondingly, the termination condition is the default one of best-first search: if t consecutive expansions of previously best attribute subsets (where t is a parameter given in advance) fail to further improve the highest classification accuracy, the search process ends.

To make the search process more compact, the initial attribute subset is set to a single-attribute subset (a subset consisting of one attribute), namely the one on which the constructed RBC classifier achieves higher classification accuracy than on any other single-attribute subset. The corresponding classification accuracy is used as the initial best evaluation value.

In conclusion, based on the above analysis, we choose the RBC (Robust Bayes Classifier) as the base classifier and construct from it, using the wrapper method, the selective incomplete data classifier SRBC.

5.3. SRBC Algorithm Description

Let the incomplete data set D be the training set and assume that D has n attributes in total. Ω = {A_1, A_2, ..., A_n} is the entire attribute set and S_b denotes the current best subset. Q is a queue used to store the attribute subsets that were once the best, together with their corresponding classification accuracies, and S_h is the attribute subset corresponding to the head node of Q. F(S) denotes the classification accuracy of RBC on the attribute subset S, and F_max is the current highest classification accuracy. The threshold T is the parameter that controls when the search stops: if t consecutive expansions of the head node of Q fail to further improve the highest classification accuracy, the search ends. Figure 3 describes the construction process of SRBC.

Algorithm SRBC can be described as follows:

1. Initialization: set the parameter T; let the integer t ← 0; select the attribute A_s ∈ Ω for which F({A_s}) is highest among all single-attribute subsets; set the current highest classification accuracy F_max ← F({A_s}); add the attribute subset {A_s} together with F({A_s}) to the queue Q as a node.

[FIGURE 3 OMITTED]

2. While t < T, execute steps (3), (4) and (5); otherwise, execute step (6).

3. Take the head node of Q and denote its attribute subset by S_h; set added = false (added marks whether a new node is added to Q while the head node is being expanded). For each attribute A ∈ Ω - S_h, if S_h ∪ {A} has not yet been evaluated and F(S_h ∪ {A}) > F_max, then set added = true, S_b = S_h ∪ {A}, F_max = F(S_h ∪ {A}) and t = 0, and insert S_b together with F(S_b) into the queue Q as a new node, keeping Q ordered by classification accuracy from high to low.

4. If added = false, then t ← t + 1.

5. Go to step (2) to continue.

6. Construct an RBC classifier on the final attribute subset S_b.

It should be pointed out that, since the RBC classifier only handles nominal (categorical) attribute variables, the selective classifier SRBC built on top of it can also only deal with attributes of this type; data sets that contain numeric attributes must first be discretized.
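The following Python sketch mirrors the search loop of steps 1 to 6 above, with the RBC evaluation abstracted as a callable F that maps an attribute subset to its classification accuracy. F and the attribute representation are assumptions; this is an illustrative reading of the algorithm, not the authors' implementation.

```python
# Sketch of the SRBC best-first forward search (steps 1-6 above).
# F(frozenset_of_attributes) -> classification accuracy is an assumed callable.
import heapq

def srbc_search(all_attributes, F, T=5):
    # Step 1: initialise with the best single-attribute subset.
    best = max((frozenset({a}) for a in all_attributes), key=F)
    f_max = F(best)
    evaluated = {best}
    Q = [(-f_max, sorted(best))]              # head of Q = subset with highest accuracy
    t = 0
    while t < T and Q:                        # Step 2
        _, head = heapq.heappop(Q)            # Step 3: expand the head node S_h
        S_h = frozenset(head)
        added = False
        for a in set(all_attributes) - S_h:
            cand = S_h | {a}
            if cand in evaluated:
                continue
            evaluated.add(cand)
            f = F(cand)
            if f > f_max:                     # only improving subsets are queued,
                added, best, f_max, t = True, cand, f, 0   # as in the text above
                heapq.heappush(Q, (-f, sorted(cand)))
        if not added:
            t += 1                            # Step 4: count fruitless expansions
    return best, f_max                        # Step 6: build the final RBC on `best`
```

In SRBC itself, F(S) would be obtained by building an RBC classifier on the subset S and estimating its accuracy by cross validation, as described in Section 5.1.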

6. Comparison Test

This experiment compares the classification accuracy and efficiency of SRBC with those of the incomplete data classifiers RBC and DBCI. The classification accuracy is compared on 12 standard incomplete data sets, and the screening effect of SRBC on redundant and irrelevant attributes is investigated, that is, the attribute set selected by SRBC on each data set is examined.

The test data sets are incomplete data sets from the UCI machine learning repository. In the implementation of the SRBC algorithm, the parameter T takes the default value of the Weka system, T = 5. When evaluating each attribute subset, 5-fold cross validation from the Weka system is used, and numeric attributes are discretized with Weka's numeric data discretization procedure. To compare the classification accuracy of SRBC, RBC and DBCI, the average classification accuracy of the three classifiers on each data set is compared in the form of a graph.

[FIGURE 4 OMITTED]

In addition, Table 1 lists, for RBC, DBCI and SRBC, the average classification accuracy and the corresponding standard deviation over 10 repeated cross-validation runs on each data set. To compare the overall classification results, the overall average and standard deviation of the classification accuracy over the 12 data sets are listed at the bottom of the table. For each data set, the highest classification accuracy is shown in bold. To investigate the effect of SRBC on the selection of redundant and irrelevant attributes, Table 2 lists the number of attributes selected by SRBC on each data set.

As can be seen from Figure 4 and Table 1, on 10 of the 12 experimental data sets the classification accuracy of SRBC is significantly higher than that of RBC and DBCI. The average accuracy of SRBC over the 12 data sets is 4.34% and 4.25% higher than that of RBC and DBCI, respectively. In particular, on the data set L.cancer, the classification accuracy of SRBC is 24.19% higher than that of RBC and DBCI.

The reason the classification accuracy on the data set L.cancer improves so much is related not only to the SRBC algorithm itself but also to the characteristics of the data set. L.cancer has only 32 examples but as many as 56 attributes. In general, when the ratio of the number of instances to the number of attributes is too small, the estimates of the class-conditional probabilities of the attribute variables and of the class variable probabilities become very imprecise, and the classification results obtained from these estimates are very inaccurate. When SRBC deletes part of the attributes and reduces their number to a few, this is, relatively speaking, equivalent to increasing the number of instances, so the probability estimates become more precise and the classification accuracy can improve. In such cases the superiority of SRBC is more significant.

Comparing the standard deviations of the three classifiers on each data set, it can also be seen that on nine of the 12 data sets the standard deviation of SRBC is lower than that of RBC, and likewise on nine data sets it is lower than that of DBCI. The average standard deviation of SRBC over the 12 data sets is also lower than that of RBC and DBCI. This shows that the classification performance of SRBC is more stable than that of RBC and DBCI.

Examining the attributes selected by SRBC in Table 2, we find that SRBC significantly reduces the number of attributes on almost every one of the 12 data sets. In particular, the data set Arrhythmia contains 279 attributes, of which SRBC selects only 11. Overall, the 12 data sets contain 602 attributes in total, while the total number of attributes selected by SRBC is 83. SRBC can therefore simplify the data sets to a large extent, which can significantly improve the efficiency of the classifier.

7. Conclusions

With the development of modern information technology, especially Internet technology, large amounts of high-dimensional data, typified by text data, are emerging. Naive Bayes is simple and efficient and well suited to such high-dimensional data, but it is very sensitive to attribute selection, so the study of selective Bayesian classification algorithms for high-dimensional data is an important research topic. Although many researchers have studied attribute selection for high-dimensional data, especially text data, very little of this work has addressed Bayesian classifiers. Redundant or irrelevant attributes in incomplete data not only reduce classification efficiency but can also seriously damage the classification results. Based on the wrapper method, two selective incomplete data classifiers are proposed. First, the incomplete data classifier RBC, whose classification performance is outstanding, is combined with best-first search, which gives good search results at relatively low complexity, to construct the selective incomplete data classifier SRBC. Compared with RBC and DBCI, SRBC not only obtains significantly higher classification accuracy but also significantly reduces the number of redundant and irrelevant attributes. The incomplete data classifier SRBC constructed in this paper is therefore of great significance for the classification of incomplete high-dimensional data.

Recebido/Submission: 11/9/2015

Aceitacao/Acceptance: 1/11/2015

References

Acosta-Cabronero, J., Betts, M. J., Cardenas-Blanco, A., Yang, S., & Nestor, P. J. (2016). In Vivo MRI Mapping of Brain Iron Deposition across the Adult Lifespan. The Journal of Neuroscience, 36(2), 364-374.

Ciancanelli, M. J., Abel, L., Zhang, S. Y., & Casanova, J. L. (2016). Host genetics of severe influenza: from mouse Mx1 to human IRF7. Current Opinion in Immunology, 38, 109-120.

De Vries, J. W., Hoogmoed, W. B., Groenestein, C. M., Schroder, J. J., Sukkel, W., De Boer, I. J. M., & Koerkamp, P. G. (2015). Integrated manure management to reduce environmental impact: I. Structured design of strategies. Agricultural Systems, 139, 29-37.

Ejarque, M., Mir-Coll, J., Gomis, R., German, M. S., Lynn, F. C., & Gasa, R. (2016). Generation of a Conditional Allele of the Transcription Factor Atonal Homolog 8 (Atoh8). PloS one, 11(1), e0146273.

Eldridge, W. J., Sheinfeld, A., Rinehart, M. T., & Wax, A. (2016). Imaging deformation of adherent cells due to shear stress using quantitative phase imaging. Optics Letters, 41(2), 352-355.

Hyakusoku, H., Sano, D., Takahashi, H., Hatano, T., Isono, Y., Shimada, S.,... & Oridate, N. (2016). JunB promotes cell invasion, migration and distant metastasis of head and neck squamous cell carcinoma. Journal of Experimental & Clinical Cancer Research, 35(1), 1.

Li, J., Li, W., Yin, H., Zhang, B., & Zhu, W. (2015). Effect of cadmium on TET enzymes and DNA methylation changes in human embryonic kidney cell. Zhonghua yu fang yi xue za zhi Chinese journal of preventive medicine, 49(9), 822-827.

Mora, A. D., & Fonseca, J. M. (2014). Metodologia para a detecao de artefactos luminosos em imagens de retinografia com aplicacao em rastreio oftalmologico. RISTI Revista Iberica de Sistemas e Tecnologias de Informacao, 2014(13), 51-63.

Nava, M., Quhe, R., Palazzesi, F., Tiwary, P., & Parrinello, M. (2015). De Broglie swapping metadynamics for quantum and classical sampling. Journal of Chemical Theory and Computation, 11(11), 5114-5119.

Rossi, A., Pederiva, F., Santos, R., Wood, S., & Humphrey, G. (2015). Angiomyxoma in accessory hepatic lobe. APSP Journal of Case Reports, 7(1), 10.

Soliman, R., Fouad, E., Belghith, M., & Abdelmageed, T. (2016). Conventional hemofiltration during cardiopulmonary bypass increases the serum lactate level in adult cardiac surgery. Annals of Cardiac Anaesthesia, 19(1), 45.

Sonmez, E., Aydin, E., Turkez, H., Ozbek, E., Togar, B., Meral, K.,... & Cacciatore, I. (2016). Cytotoxicity and genotoxicity of iron oxide nanoparticles: An in vitro biosafety study. Archives of Biological Sciences, (00), 6-6.

Wang, D., Wright, M., Elumalai, N. K., & Uddin, A. (2016). Stability of perovskite solar cells. Solar Energy Materials and Solar Cells, 147, 255-275.

Wang, L., Duan, Q., Yang, F., & Wen, S. (2015). The origin and onset of acute venous thrombus. Int J Clin Exp Med, 8(11), 19804-19814.

Zhang, Y., Hu, X., Chen, L., Huang, Z., Fu, Q., Liu, Y., ... & Chen, Y. (2016). Flexible, hole transporting layer-free and stable CH3NH3PbI3/PC61BM planar heterojunction perovskite solar cells. Organic Electronics, 30, 281-288.

Zivadinov, R., Cerza, N., Hagemeier, J., Carl, E., Badgett, D., Ramasamy, D. P., ... & Ramanathan, M. (2016). Humoral response to EBV is associated with cortical atrophy and lesion burden in patients with MS. Neurology-Neuroimmunology & Neuroinflammation, 3(1), e190.

Songjuan Zhang (1)

zsj120481@126.com

(1) College of Computer and Information Engineering, Nanyang Institute of Technology, 473004, Nanyang, China.

DOI: 10.17013/risti.17B.27-39
Table 1--Classification Accuracy of RBC, DBCI and SRBC

Classification accuracy (mean ± standard deviation)

Serial number   Data set         RBC              DBCI             SRBC
1               Annealing        95.96 ± 0.31     92.74 ± 0.30     91.59 ± 0.12
2               Arrhythmia       72.77 ± 0.89     73.13 ± 0.70     75.01 ± 0.62
3               Audiology        67.99 ± 0.79     68.64 ± 0.89     76.53 ± 0.42
4               B.cancer         97.11 ± 0.11     97.02 ± 0.06     97.31 ± 0.11
5               Bridges          61.62 ± 2.20     64.00 ± 1.61     66.10 ± 1.04
6               Credit           86.18 ± 0.40     85.70 ± 0.37     86.65 ± 0.30
7               Cylinder         71.36 ± 0.46     75.37 ± 1.19     76.02 ± 0.55
8               Echocardiogram   98.36 ± 0.87     97.26 ± 0.02     97.26 ± 0.01
9               Horse-colic      85.20 ± 0.59     83.71 ± 0.54     88.09 ± 0.39
10              L.cancer         56.13 ± 1.60     56.13 ± 1.62     80.32 ± 3.86
11              Mushroom         95.96 ± 0.02     95.93 ± 0.02     99.68 ± 0.04
12              Vote             90.25 ± 0.19     90.18 ± 0.25     96.31 ± 0.00
Total average                    81.57 ± 0.72     81.65 ± 0.63     85.91 ± 0.62

Table 2--Numbers of Selected Attributes Of SRBC

Data set         Original attribute number   Number of selected attributes
Annealing        38                          8
Arrhythmia       279                         11
Audiology        70                          12
B.cancer         10                          9
Bridges          12                          6
Credit           15                          10
Cylinder         39                          9
Echocardiogram   12                          3
Horse-colic      27                          5
L.cancer         56                          5
Mushroom         22                          3
Vote             16                          3