
Automatic ontology construction through decision tree classification techniques.

INTRODUCTION

Hepatitis, the fifth leading cause of death after heart disease, stroke, chest disease and cancer (Mougiakakou, Valavanis and Nikita, et al., 2009), causes 1.5 million deaths worldwide each year. Risk factors for hepatitis include blood transfusions, tattoos and piercings, drug abuse, haemodialysis, occupational exposure among health workers, and sexual contact with a hepatitis carrier (Shankaracharya, Kumari and Vidyarthi, 2012). Early-stage diagnosis is very difficult in the general population because of the lack of routine check-ups and of awareness. Diagnosis therefore depends entirely on visual assessment by expert doctors, based on their expertise. Constructing an ontology for hepatitis diagnosis can help in this situation.

An ontology is defined as a formal, explicit specification of the conceptualization of a domain and its relationships (Gomez-Perez, Fernandez-Lopez and Corcho, 2004). Though ontologies have many applications, constructing an ontology structure remains a complex task. The first step in ontology construction is building a framework. The core components of building an ontology structure are domain identification, and identification of the concepts and relationships in the domain of interest. A taxonomical hierarchy is then created, organized by super-sub concept relationships, which includes classes, sub-classes, super-classes, category definitions and inter-relationship definitions. It would therefore be highly useful if the process of ontology building could be automated.

In the proposed method, automatic ontology construction is attempted through data mining techniques and subsequently applied to diagnosing hepatitis. This paper is organized as follows: Section 1 presents related work. Section 2 details the proposed methodology for automatic ontology construction using classification techniques. Section 3 concludes the paper and gives directions for future research in this area.

1. Related Work:

Only a few attempts have been made to automatically group concepts, some of which are summarized below. The COBWEB clustering algorithm was adopted by software agents to automatically generate concepts for the music domain (Clerkin, Cunningham and Hayes, 2002). Structured knowledge was created for gene products using iterative statistical information extraction in combination with nearest neighbour clustering (Blaschke and Valencia, 2002). Formal Concept Analysis was used to formally abstract data as conceptual structures (Ganter, Stumme and Wille, 2005). A further refinement to Formal Concept Analysis incorporated fuzzy logic to deal with uncertainties in data and to interpret the concept hierarchy; this fuzzy formal concept analysis was used in the automatic generation of an ontology for the scholarly semantic web (Quan, Hui, Fong and Cao, 2004). TextOntoEx constructs an ontology from natural domain text using a semantic pattern-based approach: it analyzes the text to extract candidate relations and maps them into a meaning representation to facilitate ontology representation (Dahab, Hassan and Rafea, 2007). Ontologies have also been built automatically from data mining outputs, namely rule sets and decision trees, with RDF, RDF-S and DAML+OIL used for defining the ontologies (Wuermli, Wrobel, Hui and Joller, 2003).

In this work automatic ontology building is attempted through the rules generated by the classification algorithm. The following section discusses the Hepatitis dataset and the proposed methodology.

2. Proposed Method of Automatic ontology construction:

The dataset used to evaluate the proposed methodology is detailed here. The hepatitis disease dataset records whether patients with hepatitis live or die. The data source used in this study was taken from the UCI machine learning repository. The task is to predict this class from the results of various medical tests carried out on a patient; this work treats the class as the presence or absence of the disease.

The proposed framework, given in Fig 1, consists of domain identification, data pre-processing, building the decision tree and OWL construction.

[FIGURE 1 OMITTED]

Domain Identification:

For the construction of the current ontology framework, the chosen domain is the hepatitis dataset obtained from the UCI machine learning repository [http://archive.ics.uci.edu/ml/datasets/Hepatitis]. The dataset contains 155 instances distributed between two classes: die, with 32 instances, and live, with 123 instances. There are 19 features or attributes (Table 1): 13 are binary, while the other 6 take 6-8 discrete values. Some values are missing. The goal is to predict the class (die or live) of each patient, which this work treats as the presence or absence of the disease.

Data Pre-Processing:

The hepatitis domain data is given as input to the data pre-processing step. To make data processing interoperable between WEKA and OWL, the blank spaces in the attribute names are removed. The dataset contains some missing values, which the classifiers themselves handle. In the case of the J48 classifier, an instance with a missing value for the split attribute is passed down every branch with weights proportional to the frequencies of the observed non-missing values (Witten and Frank, 2005). The pre-processed data is then used for classification.
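
The weighting of instances with missing values can be illustrated with a minimal Python sketch. It is not WEKA's implementation; the function names and the example column are hypothetical and only show how branch weights proportional to the observed frequencies would be derived.

```python
from collections import Counter

def split_weights(observed_values):
    """Branch weights proportional to the frequencies of non-missing values."""
    counts = Counter(v for v in observed_values if v is not None)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def distribute(instance_weight, weights):
    """Send a fraction of an instance with a missing value down every branch."""
    return {branch: instance_weight * w for branch, w in weights.items()}

# Hypothetical attribute column: 6 "no", 2 "yes" and 2 missing values (None).
ascites = ["no"] * 6 + ["yes"] * 2 + [None] * 2
weights = split_weights(ascites)
fractions = distribute(1.0, weights)
print(fractions)
```

In this example an instance whose value is missing contributes three quarters of its weight to the "no" branch and one quarter to the "yes" branch.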

Classification:

Classification assigns data to predefined class labels. The class is the attribute or feature of a data set in which users are most interested. Classification can be used for the diagnosis and prognosis of hepatitis based on symptoms and health conditions (Shomona Gracia Jacob et al., 2012). Decision tree learning is one of the most widely used techniques for classification (Geetha Ramani and Jacob, 2013). In the present study, three state-of-the-art supervised machine learning algorithms, namely J48, Rep Tree and Random Tree, were analyzed. J48 implements the C4.5 decision tree learning algorithm (Quinlan, 1993; Esposito, Malerba and Semeraro, 1997). J48 proved the best of the three, with the highest accuracy of 89.13% under percentage-split validation in a 70-30 proportion (70% of the data used for training and 30% for testing). The results are tabulated in Table 2. Hence the rules (in the form of a decision tree) generated by the J48 classification algorithm (Shomona Gracia Jacob and Geetha Ramani, 2012) were used for building the ontology structure.
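
The arithmetic behind a 70-30 percentage split on 155 instances can be sketched as follows. The rounding rule and the numbers of correctly classified test instances are assumptions, not values stated in this paper; the counts were chosen only because they are consistent with the accuracies reported in Table 2.

```python
def split_sizes(n_instances, train_percent):
    """Sizes of a percentage split; assumes the training size is rounded half up."""
    train = int(n_instances * train_percent / 100 + 0.5)
    return train, n_instances - train

def accuracy_percent(correct, total):
    """Accuracy as the percentage of correctly classified test instances."""
    return 100.0 * correct / total

# 155 instances in the hepatitis dataset, 70% used for training.
train_size, test_size = split_sizes(155, 70)

# Assumed counts of correct predictions (hypothetical, inferred from Table 2).
assumed_correct = {"J48": 41, "Rep Tree": 38, "Random Tree": 36}
for name, correct in assumed_correct.items():
    print(f"{name}: {correct}/{test_size} = {accuracy_percent(correct, test_size):.2f}%")
```

Under these assumptions the test set holds 46 instances, so each additional correct prediction changes the accuracy by roughly 2.2 percentage points.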

WEKA Decision Tree:

A WEKA decision tree was evolved and serialized into dot format using the WEKA API. The dot document is then read, and the ontology is created using the OWL API. We used the J48 decision tree algorithm to discover and extract knowledge from structured data, and then built the ontology from the generated decision tree.
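
Reading a dot-format tree into nodes and edges can be sketched in Python as below. The sample text is a hypothetical tree written in the style of J48's dot output (node lines such as `N0 [label="..."]` and edge lines such as `N0->N1 [label="..."]`); it is not the tree obtained in this work, and the regular expressions assume that layout.

```python
import re

# Assumed line shapes: N0 [label="ASCITES" ]  and  N0->N1 [label="= no"]
NODE_RE = re.compile(r'^(N\d+) \[label="([^"]*)"')
EDGE_RE = re.compile(r'^(N\d+)->(N\d+) \[label="([^"]*)"')

def parse_dot(dot_text):
    """Return ({node_id: label}, [(tail_id, head_id, test_label), ...])."""
    nodes, edges = {}, []
    for raw_line in dot_text.splitlines():
        line = raw_line.strip()
        edge = EDGE_RE.match(line)
        if edge:
            edges.append(edge.groups())
            continue
        node = NODE_RE.match(line)
        if node:
            nodes[node.group(1)] = node.group(2)
    return nodes, edges

# Hypothetical sample tree, for illustration only.
SAMPLE_DOT = '''digraph J48Tree {
N0 [label="ASCITES" ]
N0->N1 [label="= no"]
N1 [label="SPIDERS" ]
N1->N2 [label="= no"]
N2 [label="LIVE (10.0)" shape=box style=filled ]
N1->N3 [label="= yes"]
N3 [label="DIE (4.0/1.0)" shape=box style=filled ]
N0->N4 [label="= yes"]
N4 [label="DIE (6.0)" shape=box style=filled ]
}'''

nodes, edges = parse_dot(SAMPLE_DOT)
print(nodes)
print(edges)
```

Each edge is kept as a (tail, head, test) triple, where the tail is the parent node and the head is the child node.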

OWL Construction:

OWL construction is carried out in Java using Eclipse, with Java integrated with the OWL API. The implementation used WEKA 3.6.10, an open source data mining tool, and Protege 5, an open source tool for ontology framework creation. The decision tree is used to automatically construct the extend branches and terminal branches of the ontology. The construction process is given as pseudo code.
Pseudo code:

Begin
  for each (object e of edges-array)
    Edge e1 = e as Edge
    Node head = Nodes(e1.head) as Node
    Node tail = Nodes(e1.tail) as Node
    if (head.degree is greater than 1)
      // extend branch
      owl:extendBranch(head, tail, e1);
    else
      // build terminal branch
      owl:terminateBranch(head, tail, e1);
    end if
  end for
End

// Generating OWL structure -- extending at head node
extendBranch(head, tail, e1):
  superCls = tail.label with tail.ID;
  subCls = head.label with head.ID;
  // node-specific descriptor
  node_descriptor = tail.label with tail.ID;
  // generalized descriptor
  descriptor = tail.label;
  // apply label annotations
  RDFSLabel(superCls, tail.label with tail.ID);
  RDFSLabel(subCls, head.label with head.ID);
  // make tail node parent of head node
  SubClass(superCls, subCls);
  // make descriptor child of Descriptor class
  SubClass(new owl:Class("Descriptor"), node_descriptor);

// Generating OWL structure -- terminating at leaf node
terminateBranch(head, tail, e1):
  superCls = tail.label with tail.ID;
  node_descriptor = tail.label with tail.ID;
  descriptor = tail.label;
  // target class taken from the leaf (head) node; naming assumed from context
  category = head.label;
  scategory = head.label with head.ID;
  generalcategory = new owl:Class("Category");
  // apply label annotations
  RDFSLabel(superCls, tail.label with tail.ID);
  RDFSLabel(node_descriptor, tail.label with tail.ID);
  RDFSLabel(descriptor, tail.label);
  // make tail node parent of head node
  SubClass(superCls, scategory);
  // make descriptor child of Descriptor class
  SubClass(new owl:Class("Descriptor"), node_descriptor);
  SubClass(node_descriptor, descriptor);
  SubClass(generalcategory, category);
  SubClass(category, scategory);
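
The edge loop can also be sketched in runnable Python. This is a simplified reading of the pseudo code and not the Java/OWL API implementation: it only collects (super class, sub class) pairs, omits the label annotations and the generalized descriptor, and uses a small hypothetical tree. The function name and the data are assumptions.

```python
from collections import Counter

def build_subclass_pairs(nodes, edges):
    """Collect (super class, sub class) pairs from decision tree edges."""
    # Degree of a node = number of edges touching it; leaves have degree 1.
    degree = Counter()
    for tail, head, _test in edges:
        degree[tail] += 1
        degree[head] += 1

    pairs = []

    def add(parent, child):
        if (parent, child) not in pairs:
            pairs.append((parent, child))

    for tail, head, _test in edges:
        super_cls = f"{nodes[tail]}_{tail}"
        add("Descriptor", super_cls)
        if degree[head] > 1:
            # extend branch: the head node is another internal node
            add(super_cls, f"{nodes[head]}_{head}")
        else:
            # terminal branch: the head node is a leaf holding the target class
            category = nodes[head].split()[0]   # drop instance counts like "(10.0)"
            s_category = f"{category}_{head}"
            add(super_cls, s_category)
            add("Category", category)
            add(category, s_category)
    return pairs

# Hypothetical tree, for illustration only.
nodes = {"N0": "ASCITES", "N1": "SPIDERS", "N2": "LIVE (10.0)",
         "N3": "DIE (4.0/1.0)", "N4": "DIE (6.0)"}
edges = [("N0", "N1", "= no"), ("N1", "N2", "= no"),
         ("N1", "N3", "= yes"), ("N0", "N4", "= yes")]

pairs = build_subclass_pairs(nodes, edges)
for parent, child in pairs:
    print(f"{child} is-a {parent}")
```

Leaf labels are reduced to their first word so that all leaves predicting the same class share one Category sub class.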


The automatically constructed ontology is a hierarchical structure consisting of a set of classes organized to represent the domain's salient concepts, a set of slots associated with classes to describe their properties and relationships, and a set of instances of those classes. In OWL, classes are interpreted as sets of individuals, and a subclass as a subset of its super class. The hierarchy structure for hepatitis is depicted in Fig 2. The automatically generated OWL structure consists of nodes.

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

An extend branch leads to an internal node, denoted as Descriptor, and a terminal branch leads to a leaf node, denoted as Category. Every node is assigned as a super class (superCls) or sub class (subCls) together with its node identification (ID) value, such as N0, N1 and so on. In an extend branch, the super class is named from the tail label (tail.label) and tail identification (tail.ID), and the sub class from the head label (head.label) and head identification (head.ID). For the hepatitis domain, Ascites_N0 is the root node, with the label Ascites and the ID N0. The super class Ascites_N0 has two subclasses, Spiders_N1 and Albumin_N24. Sub classes are assigned to the other super classes in the same way.

In a terminal branch, the presence (Die) or absence (Live) of the disease is predicted. Each branch in the decision tree may have a set of leaves. Each leaf represents a classification rule as well as a target class (Category). The ontology constructed automatically from these two kinds of branch, using the knowledge extracted in the decision tree, is shown in Fig 2.

Every node has an is-a hierarchy relationship between super class and subclass. The resulting ontology structure can be visualized in OWLViz; Fig 3 shows the OWLViz tree. The automatic ontology construction for hepatitis through J48 would assist medical practitioners to a great extent. The automatically constructed ontology provided an optimal solution and could provide efficient inference.
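
As an illustration of the is-a hierarchy, the sketch below writes super class and sub class pairs as OWL 2 functional-syntax axioms, a text format that ontology editors such as Protege can open. This work builds the ontology with the OWL API in Java; the Python function, the base IRI and the two pairs (taken from the class names mentioned above) are for illustration only.

```python
def to_functional_syntax(pairs, base_iri="http://example.org/hepatitis"):
    """Serialize (super class, sub class) pairs as OWL 2 functional syntax."""
    classes = sorted({name for pair in pairs for name in pair})
    lines = [f"Prefix(:=<{base_iri}#>)", f"Ontology(<{base_iri}>"]
    lines += [f"Declaration(Class(:{name}))" for name in classes]
    # SubClassOf lists the sub class first and the super class second.
    lines += [f"SubClassOf(:{child} :{parent})" for parent, child in pairs]
    lines.append(")")
    return "\n".join(lines)

# Pairs built from the class names given in the text; the IRI is hypothetical.
pairs = [("Ascites_N0", "Spiders_N1"), ("Ascites_N0", "Albumin_N24")]
owl_text = to_functional_syntax(pairs)
print(owl_text)
```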

3. Conclusion:

Hepatitis is one of the leading causes of death, and its diagnosis remains challenging for medical practitioners. Computational approaches for hepatitis prognosis are therefore in great demand. Since manual ontology construction is a complex task, automatic ontology construction techniques are sought. In this paper, automatic ontology construction is attempted through the decision trees generated by classification techniques. To obtain the optimal decision tree, different classification algorithms were investigated, of which J48 performed best, yielding an accuracy of 89.13%. The generated rules are used for the automatic construction of the ontology. The evolved ontology has an optimal set of important Descriptors and Categories that will aid the diagnosis of hepatitis. This method of ontology structure construction would be of great help to the medical community as well as to other domain areas.

ARTICLE INFO

Article history:

Received 12 October 2014

Received in revised form 26 December 2014

Accepted 1 January 2015

Available online 25 February 2015

REFERENCES

Gomez-Perez, A., M. Fernandez-Lopez and O. Corcho, 2004. Ontological Engineering: With Examples from the Areas of Knowledge Management, E-commerce and the Semantic Web, Springer, 1: 5-10.

Blaschke, C. and A. Valencia, 2002. Automatic Ontology Construction from the Literature, Genome Informatics, 13: 201-213.

Clerkin, P., P. Cunningham and C. Hayes, 2002. Ontology Discovery for the Semantic Web Using Hierarchical Clustering, Trinity College Dublin, Ireland.

Dahab, M.Y., H. Hassan and A. Rafea, 2007. TextOntoEx: Automatic ontology construction from natural English text, Expert Systems with Applications, Elsevier, 34: 1474-1480.

Esposito, F., D. Malerba and G. Semeraro, 1997. A Comparative Analysis of Methods for Pruning Decision Trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5): 476-491.

Ganter, B., G. Stumme and R. Wille (Eds.), 2005. Formal Concept Analysis: Foundations and Applications. Lecture Notes in Artificial Intelligence, Springer-Verlag, 3626.

Geetha Ramani, R. and S.G. Jacob, 2013. Improved classification of Lung cancer tumors based on structural and physicochemical properties of proteins using data mining models. PLoS ONE, 8(3): e58772.

Witten, I.H. and E. Frank, 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Ed.

Mougiakakou, G.S., K.I. Valavanis, A. Nikita, et al., 2009. Diagnostic Support Systems and Computational Intelligence: Differential Diagnosis of Hepatic Lesions from Computed Tomography Images, IGI.

Quan, T.T., S.C. Hui, A.C.M. Fong and T.H. Cao, 2004. Automatic generation of ontology for scholarly semantic Web. In: Lecture Notes in Computer Science, 3298: 726-740.

Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., 302.

Shankaracharya, S. Kumari and S.A. Vidyarthi, 2012. Development of Java based graphical user interface for diagnosis of hepatitis using mixture of expert, Nature Precedings. DOI:10.1038/npre.2012.7093.1.

Shomona Gracia Jacob and R. Geetha Ramani, 2012. Evolving efficient classification rules from Cardiotocography data through data mining methods and techniques. European Journal of Scientific Research, 78(3): 468-480.

Shomona Gracia Jacob, R. Geetha Ramani and P. Nancy, 2012. Efficient classifier for classification of Hepatitis C Virus clinical data through data mining algorithms and techniques. Proceedings of the International Conference on Computer Applications, Techno Forum Group, Pondicherry, India, 27-31.

UCI Machine Learning Repository, [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Hepatitis.

Wuermli, O., A. Wrobel, S.C. Hui and J.M. Joller, 2003. Data Mining For Ontology Building: Semantic Web Overview, Diploma Thesis, Dep. of Computer Science, Nanyang Technological University.

(1) R. Geetha Ramani and (2) S. Siva Sankari

(1) Associate Professor, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.

(2) Research Scholar, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.

Corresponding Author: R. Geetha Ramani, Associate Professor, Department of Information Science and Technology, College of Engineering, Anna University, Chennai-600025, India.

Table 1: Features of Hepatitis Dataset.

No   Feature Name       Domain values of Feature

1    Age                10, 20, 30, 40, 50, 60, 70, 80
2    Sex                Male, Female
3    Steroid            Yes, No
4    Antivirals         Yes, No
5    Fatigue            Yes, No
6    Malaise            Yes, No
7    Anorexia           Yes, No
8    Liver big          Yes, No
9    Liver firm         Yes, No
10   Spleen palpable    Yes, No
11   Spiders            Yes, No
12   Ascites            Yes, No
13   Varices            Yes, No
14   Bilirubin          0.39, 0.80, 1.20, 2.00, 3.00, 4.00
15   Alk phosphate      33, 80, 120, 160, 200, 250
16   SGOT               13, 100, 200, 300, 400, 500
17   Albumin            2.1, 3.0, 3.8, 4.5, 5.0, 6.0
18   Protime            10, 20, 30, 40, 50, 60, 70, 80, 90
19   Histology          Yes, No

Table 2: Performance comparison of various classifiers.

Classifiers    Accuracy

J48             89.13%
Rep Tree        82.6%
Random Tree     78.2%
COPYRIGHT 2015 American-Eurasian Network for Scientific Information

Article Details
Author:Ramani, R. Geetha; Sankari, S. Siva
Publication:Advances in Natural and Applied Sciences
Article Type:Report
Date:Jun 1, 2015