7DO: a model for ontology complexity evaluation.

1. Introduction

Ontologies are becoming increasingly important in artificial intelligence, software engineering, bioinformatics, library science, information system architecture, software agents, e-commerce, natural language processing, information query systems, knowledge management and Semantic Web applications as a form of knowledge representation about the world or some part of it. Ontologies are often defined as an explicit specification of a conceptualization [1]. A more technical definition describes an ontology as an engineering artefact (abstract model) that provides a simplified view of a particular domain of concern and formally defines the concepts, relations, and constraints on their use [2]. The advantages of developing and using ontologies include more effective information retrieval and analysis, unambiguous communication and knowledge sharing over a domain of interest, and encouragement of knowledge reuse.

Though several knowledge representation languages are available for modelling domain ontologies, the Web Ontology Language (OWL) [3] is already used as a de facto standard ontology description language. Available ontologies are very diverse in size, quality, coverage, level of detail and complexity. It is therefore important to evaluate the key characteristics of ontologies, which would help ontology developers to design and maintain ontologies and help ontology users to choose the ontologies that best meet their needs [4]. As more ontologies are developed and maintained, the issues of ontology evolution [5] also become important.

Several authors have proposed using technical ontology characteristics for ontology evaluation. The OntoMetric [6] framework for ontology evaluation consists of 160 characteristics spread across five dimensions: content of the ontology, language, development methodology, building tools, and usage costs. A framework for comparing ontology schemas described in [7] is based on the following groups of ontology characteristics: design process, taxonomy, internal concept structure and relations between concepts, axioms, inference mechanism, applications, and contribution. The OntoQA [8] approach assesses the quality of both ontology schemas and populated ontologies (knowledge bases) through a set of metrics. These metrics can highlight key characteristics of an ontology schema as well as its population. A set of ontology cohesion metrics has also been proposed in [4].

The novelty of this paper is a model and a collection of technical metrics (adopted or newly proposed) for evaluation of the structural complexity of ontologies. The structure of the paper is as follows. Section 2 presents a 7DO model for evaluation of complexity of OWL ontologies. Section 3 describes the complexity metrics used at different dimensions of the 7DO model. Section 4 describes the application of the proposed metrics for ontology evolution research. Finally, Section 5 presents conclusions.

2. 7DO Model for Ontology Evaluation

Domain ontologies, especially those specified in OWL, are increasingly used in developing information system architectures, e-Learning software, Semantic Web applications, and web services [9]. OWL ontologies are therefore not only documents but also software development artefacts. OWL ontologies are based on XML schemas. The fact that XML schemas are software artefacts that claim an increasingly central role in software construction projects has been noted in [10].

Ontologies are complex artefacts that combine structural information about domain concepts, different kinds of relationships between them, classification of concepts into different hierarchies, and logic reasoning on the properties and restrictions of concepts and their relationships. Therefore, we need not a single measure but a collection of complexity measures to evaluate the complexity of ontology description artefacts at different ontology dimensions.

Here we distinguish between:

1) first-order properties, or characteristics, which are derived directly from the ontology description itself using simple mathematical actions such as counting, e.g., file size (count of symbols in a file) or the number of tags in an XML document; and

2) second-order properties, or metrics, which cannot be derived directly from artefacts, but are calculated from first-order properties.

Complexity is one such metric. Complexity metrics may be helpful for reasoning about ontology structure, understanding the relationships between different parts of ontologies, and comparing and evaluating ontologies. There are many definitions of complexity, so there can be many different complexity metrics. The selection of a particular complexity metric is therefore always a subjective matter.

The common approach to measuring the complexity of XML schema documents is to count the number of schema elements. Certainly, the complexity of an ontology can be measured by its size (expressed as file size in KB or in Lines of Code), the number of concepts in the ontology, or the number of markup elements required to describe it. However, we do not consider size a definitive metric of ontology complexity. First, small things can be complex, too. Second, size does not indicate the quality of an ontology, but rather the scope of its domain, because a complex domain requires a larger number of concepts and relationships to describe domain knowledge than a simple one. Metrics that measure a schema's complexity by counting the number of each component do not give sufficient information about the complexity of the schema as a whole or of each independent component. Therefore, we focus on adopting or proposing complexity metrics for ontology evaluation that are scale-free, i.e., independent of the size of the ontology.

Based on these considerations we propose a Seven Dimension Ontology (7DO) model for the evaluation of OWL ontologies. The model has the following dimensions, which represent different views on ontology complexity:

1) Text: Ontology as text (sequence of symbols) with unknown syntax and structure. The only thing known is that this text describes a domain of our interest.

2) Metadata: Ontology as annotated domain knowledge. Domain knowledge is represented as a collection of domain artefacts with attached annotation metadata (labels, names, comments). Such separation of data and metadata is a first step towards creation of ontology.

3) Structure: Ontology as a structured document specified in a markup language (XML). Such document describes different domain entities as elements and properties of these entities as attributes. Separation of entities from their properties is a first analytical step towards understanding of a domain.

4) Algorithm: Ontology as a high-level program specification (algorithm), which describes a sequence of specific reasoning steps over domain knowledge. The transition from one step to the next is a functional operation specified as an XML element. An operation may have one or more operands specified as XML attributes. (The view of a markup document as a program specification is not new, e.g., XSLT is an XML-based functional programming language for XML document transformation.)

5) Hierarchy: Ontology as a taxonomy of things (domain concepts) arranged in a hierarchical structure. Such a structure consists of classes related by subtype/supertype (inheritance/generalization) relationships. A hierarchy can be modelled in an object-oriented way using UML class diagrams, which can be used to represent an ontology [11]. However, such an ontology has no semantics.

6) Metamodel: Ontology as a domain data metamodel described using Resource Description Framework (RDF) Schema. The RDF data model describes domain knowledge in terms of subject-predicate-object expressions. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. Such expressions describe domain knowledge formally using first-order logic.

7) Logic: Ontology as a domain knowledge representation specified using OWL. Domain knowledge is expressed in terms of a set of individuals (classes), a set of property assertions which relate these individuals to each other, a set of axioms which place constraints on sets of individuals, and the types of relationships permitted between them. Axioms provide semantics by allowing systems to infer additional information based on the data explicitly provided using Description Logics (DL). DL are decidable fragments of first-order logic, which are used to represent the domain concept definitions in a structured and formally well-understood way.

The 7DO model is summarized in Table 1.

3. Complexity Metrics at Different Dimensions of 7DO Model

We propose using the following complexity metrics for evaluating complexity at different dimensions of ontology in the 7DO model:

1) Text dimension: Relative Kolmogorov Complexity

Kolmogorov Complexity [12] measures the complexity of an object by the length of the smallest program that generates it. We have an object x and a description system $\varphi$ that maps from a description w to this object. Kolmogorov Complexity $K_{\varphi}(x)$ of an object x is the size of the shortest program in the description system $\varphi$ capable of producing x on a universal computer:

$$K_{\varphi}(x) = \min_{w} \{ \|w\| : \varphi(w) = x \} \tag{1}$$

Kolmogorov Complexity $K_{\varphi}(x)$ is the minimal size of information required to generate x by an algorithm. Unfortunately, it cannot be computed in the general case and must be approximated. Usually, compression algorithms are used to give an upper bound to Kolmogorov Complexity.

Suppose that we have a compression algorithm $C_i$. Then, a shortest compression of w in the description system $\varphi$ will give the upper bound to information content in x:

$$K_{\varphi}(x) \le \min_{i} \|C_i(w)\| \tag{2}$$

The semantics-free complexity of OWL ontology O can be evaluated using the Relative Kolmogorov Complexity (RKC) metric, which can be calculated using a compression algorithm C as follows:

$$RKC = \frac{\|C(O)\|}{\|O\|} \tag{3}$$

where $\|O\|$ is the size of ontology O, and $\|C(O)\|$ is the size of compressed ontology O.

A high value of RKC means that there is a high variability of text content, i.e., high complexity. A low value of RKC means high redundancy, i.e., the abundance of repeating fragments in text.
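The ratio in Eq. (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib (DEFLATE) stands in for the compressor C, whereas the case study in Section 4 used a PHP ZIP library, and the two byte strings are hypothetical inputs, not real ontologies.

```python
import hashlib
import zlib


def relative_kolmogorov_complexity(data: bytes, level: int = 9) -> float:
    """Return |C(O)| / |O|, with zlib standing in for the compressor C."""
    if not data:
        raise ValueError("cannot compute RKC of an empty document")
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical inputs: a highly repetitive fragment and a high-variability one.
redundant = b'<owl:Class rdf:ID="A"/>\n' * 200
varied = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(150))

print(f"redundant: {relative_kolmogorov_complexity(redundant):.3f}")
print(f"varied:    {relative_kolmogorov_complexity(varied):.3f}")
```

The repetitive input yields a ratio close to 0 and the high-variability input a ratio close to 1. With a real compressor the ratio can slightly exceed 1 for incompressible or very small inputs, because the compressed container adds a few bytes of overhead.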

2) Metadata dimension: Annotation Richness

Ontology O can be defined as a collection of statements on domain concepts with corresponding annotations (metadata) expressed symbolically: $O = \langle (s,m) \mid s,m \in \Sigma^{*} \rangle$, where s is a statement, m is the metadata of s, and $\Sigma^{*}$ is the set of strings of symbols from alphabet $\Sigma$. For the evaluation of ontology complexity at the metadata dimension, we propose using the Annotation Richness (AR) metric:

$$AR = \frac{\|m\|}{\|O\|} \tag{4}$$

where $\|O\|$ is the size of ontology O, and $\|m\|$ is the size of metadata in ontology O.

A higher value of the AR metric means that ontology contains more metadata and its description is more complex.
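A minimal sketch of how the AR ratio might be computed is given below. It rests on assumptions that are not fixed by the text: metadata is taken to be only the text content of rdfs:label and rdfs:comment elements, sizes are measured in characters, and the sample document and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumption: only rdfs:label and rdfs:comment text counts as metadata.
ANNOTATION_TAGS = {f"{{{RDFS}}}label", f"{{{RDFS}}}comment"}

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Person">
    <rdfs:label>Person</rdfs:label>
    <rdfs:comment>A human being.</rdfs:comment>
  </owl:Class>
</rdf:RDF>"""


def annotation_richness(owl_text: str) -> float:
    """Return size of annotation text divided by size of the whole document."""
    root = ET.fromstring(owl_text)
    metadata_size = sum(
        len(el.text or "") for el in root.iter() if el.tag in ANNOTATION_TAGS
    )
    return metadata_size / len(owl_text)


print(f"AR = {annotation_richness(SAMPLE):.3f}")
```

Counting attribute values as metadata as well, as Table 2 suggests, would only change which strings are summed in the numerator.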

3) Structure dimension: Structural Nesting Depth

An XML document D can be defined as a collection of elements $D = (e \mid e \in E)$. Each element e is a 3-tuple $e = (l, A, E)$, where l is the label of the element, A is the set of the attributes of the element, and E is a set of the nested elements. The complexity of an XML document can be evaluated using the depth of the document's structure tree. For characterizing complexity of the XML document's structure, we propose the Structural Nesting Depth (SND) metric:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)

where d is the largest depth of the XML document, $N_e$ is the total number of elements in the XML document, and $n_e(i)$ is the number of elements at document depth i.

The SND metric is a combination of breadth and depth measures [13] for XML documents, and indicates the depth of the broadest part of the XML document tree.
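The first-order quantities that SND is built from (the largest depth d, the total element count, and the number of elements at each depth i) can be collected with a simple tree walk. The sketch below stops at these inputs and does not attempt the SND formula itself; the function name, the sample document, and the convention that the root sits at depth 1 are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def depth_profile(xml_text: str):
    """Return (d, N_e, {depth i: n_e(i)}) for an XML document.

    Assumption: the root element is at depth 1.
    """
    root = ET.fromstring(xml_text)
    per_depth = Counter()
    stack = [(root, 1)]
    while stack:
        element, depth = stack.pop()
        per_depth[depth] += 1
        # Iterating an Element yields its direct children.
        stack.extend((child, depth + 1) for child in element)
    return max(per_depth), sum(per_depth.values()), dict(per_depth)


d, total, per_depth = depth_profile("<a><b><c/><c/></b><b/></a>")
print(d, total, per_depth)
```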

4) Algorithm dimension: Normalized Difficulty

A functional program specification S is a sequence of functions $S = (f \mid f \in F)$, where $f : (a, a \in A) \to A$ is a specific function (operator) that may have a sequence of operands as its arguments, and A is a set of function arguments (operands). For XML documents we accept that operations are specified as XML elements, and operands are specified as XML attributes.

We derive the number of distinct operators $n_1 = |F|$, the number of distinct operands $n_2 = |A|$, the total number of operators $N_1 = |S|$, and the total number of operands $N_2 = \sum_{f \in S} |A|$.

For evaluating ontology complexity at the algorithm dimension we introduce the Normalized Difficulty (ND) metric, which is a normalized ratio of Halstead Difficulty and Volume metrics [14]:

$$ND = \frac{n_1 N_2}{(N_1 + N_2)(n_1 + n_2)} \tag{6}$$

A high value of the ND metric means that ontology is highly complex with many distinct classes and relationships between them.
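As a rough illustration of Eq. (6), the sketch below treats every XML element as an operator occurrence and every attribute as an operand occurrence. One detail is an assumption rather than something the text fixes: distinct operands are identified by attribute name, not by attribute value. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalized_difficulty(xml_text: str) -> float:
    """ND = n1 * N2 / ((N1 + N2) * (n1 + n2)) over tags and attributes."""
    elements = list(ET.fromstring(xml_text).iter())
    operators = [el.tag for el in elements]                       # XML tags
    operands = [name for el in elements for name in el.attrib]    # attribute names
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total counts
    return (n1 * N2) / ((N1 + N2) * (n1 + n2))


# Tags: a, b, b -> n1 = 2, N1 = 3; attributes: x, x, y -> n2 = 2, N2 = 3.
print(normalized_difficulty('<a x="1"><b x="2" y="3"/><b/></a>'))
```

For the sample, ND = (2 * 3) / ((3 + 3) * (2 + 2)) = 0.25.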

5) Hierarchy dimension: Subclassing Richness

Concept hierarchy (taxonomy) H is a 4-tuple H=(V,E,L,R), where V is a set of nodes (vertices) representing domain concepts, E is set of directed edges representing semantic relationships between concepts, L is a set of labels denoting different types of semantic relationships such as aggregation, generalization etc., and R is a set of constraints defined over nodes and edges to constrain these relationships.

Concept hierarchies provide a static modelling capability that is well suited for representing ontologies, so the structural complexity of a concept hierarchy (such as described using UML class diagram) is one of the most important measures to evaluate the quality of ontologies [15]. Here we assume that concept hierarchy is described using RDF schema. To evaluate the complexity of taxonomical relationships in ontology, the Subclassing Richness (SR) metric is used:

$$SR = \frac{n_{SC}}{n_C + n_{SC}} \tag{7}$$

where $n_{SC}$ is the number of sub-class (SC) relationships {rdfs:subClassOf}, and $n_C$ is the number of classes (C) {Class, Thing, Nothing} in the concept hierarchy.

The SR metric reflects the distribution of information across different levels of the ontology. A low SR value indicates a vertical ontology, which might reflect a very detailed type of knowledge that the ontology represents. A high SR value indicates a horizontal (flat) ontology, which means that ontology represents a wide range of general knowledge.
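Eq. (7) can be sketched for an RDF/XML serialization as follows. The counting rule is an assumption: classes are counted as occurrences of owl:Class, owl:Thing and owl:Nothing elements, and sub-class relationships as occurrences of rdfs:subClassOf elements; references made only through attribute values are not counted. The sample ontology is hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
CLASS_TAGS = {f"{{{OWL}}}{name}" for name in ("Class", "Thing", "Nothing")}
SUBCLASS_TAG = f"{{{RDFS}}}subClassOf"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Animal"/>
  <owl:Class rdf:ID="Dog"><rdfs:subClassOf rdf:resource="#Animal"/></owl:Class>
  <owl:Class rdf:ID="Cat"><rdfs:subClassOf rdf:resource="#Animal"/></owl:Class>
</rdf:RDF>"""


def subclassing_richness(owl_text: str) -> float:
    """SR = n_SC / (n_C + n_SC), counted over element tags."""
    tags = [el.tag for el in ET.fromstring(owl_text).iter()]
    n_c = sum(tag in CLASS_TAGS for tag in tags)
    n_sc = tags.count(SUBCLASS_TAG)
    return n_sc / (n_c + n_sc) if (n_c + n_sc) else 0.0


print(subclassing_richness(SAMPLE))  # 3 classes, 2 sub-class links
```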

6) Metamodel dimension: Relationship Richness

Ontology described using an RDF schema is a graph $G = \langle C \cup L, P, S^{C}, S^{P} \rangle$, where C is a set of nodes labelled with a class name, L is a set of nodes labelled with a data type (literals), $S^{C}$ is a subsumption between classes C, P is a set of arcs of the form $\langle c_1, p, c_2 \rangle$, where $c_1 \in C$, $c_2 \in C \cup L$, p is a property name, and $S^{P}$ is a subsumption between properties P.

The main RDFS constructs for describing ontologies cover resource class hierarchies {rdfs:subClassOf} and resource property relationships {rdfs:subPropertyOf, rdfs:domain, rdfs:range}. To evaluate the complexity of relationships defined by the RDF schema constructs of the OWL ontology, the Relationship Richness (RR) metric is adopted from the OntoQA metric collection [8]:

$$RR = \frac{n_P}{n_P + n_{SC}} \tag{8}$$

where $n_P$ is the number of relationships (P) defined in the schema, and $n_{SC}$ is the number of subclasses (SC) (i.e., inheritance relationships).

The RR metric reflects the diversity of relationships in the ontology. An ontology that contains many relations other than class-subclass relations is richer than taxonomy with only sub-classing relationships.
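The behaviour of Eq. (8) at its extremes is easy to check numerically. The helper below is a hypothetical illustration that takes the two counts directly; the numbers are made up for the example and do not come from any measured ontology.

```python
def relationship_richness(n_p: int, n_sc: int) -> float:
    """RR = n_P / (n_P + n_SC); defined as 0.0 when both counts are zero."""
    if n_p < 0 or n_sc < 0:
        raise ValueError("counts must be non-negative")
    total = n_p + n_sc
    return n_p / total if total else 0.0


# A pure taxonomy: only sub-class links, no other relationships.
print(relationship_richness(n_p=0, n_sc=10))
# A richer schema: six other relationships next to two sub-class links.
print(relationship_richness(n_p=6, n_sc=2))
```

The pure taxonomy scores 0.0 and the richer schema 0.75, which matches the reading that RR grows with the share of relationships that are not class-subclass relations.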

7) Logic dimension: Logic Richness

The ontology structure O, proposed by [8], can be described by a 6-tuple $O := \{C, P, A, H^{C}, prop, att\}$, where C is a set of concepts (classes), P is a set of relationships, A is a set of attributes, $H^{C} \subseteq C \times C$ is a concept hierarchy (taxonomy), $prop: P \to C \times C$ is a function that relates concepts non-taxonomically, and $att: A \to C$ is a function that relates concepts with literal values.

OWL language syntax has the following groups of constructs for describing non-taxonomic relationships between domain concepts:

-- classes (C) {Class, Thing, Nothing}, and

-- properties (P) {rdf:Property, DatatypeProperty, ObjectProperty, FunctionalProperty, SymmetricProperty, AnnotationProperty, TransitiveProperty, InverseFunctionalProperty, OntologyProperty}.

The non-taxonomic relationships are:

-- class restrictions (CR) {Restriction},

-- property restrictions (PR) {rdfs:domain, rdfs:range},

-- equalities (E) {differentFrom, distinctMembers, equivalentClass, equivalentProperty, sameAs},

-- class axioms (CA) {oneOf, dataRange, disjointWith},

-- class expressions (CE) {complementOf, intersectionOf, unionOf}.

Class restrictions are used to restrict individuals that belong to a class. Property restrictions identify restrictions to be placed on how properties can be used by instances of a class. Equalities identify equalities/inequalities between classes and properties. Axioms are used to associate class and property identifiers with either partial or complete specifications of their characteristics, and to give other information about classes and properties. Class expressions are used to perform Boolean logic operations over class hierarchies.

The complexity of taxonomical relationships is defined at the hierarchy and metamodel dimensions of the 7DO model. For complexity of first-order logic relationships between concepts and properties we propose using a Logic Richness (LR) metric defined as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (9)

where $n_x$ is the number of objects of type x in ontology O. The LR metric reflects the diversity and complexity of logic relationships in the ontology.
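The per-group counts that feed the LR metric can be gathered by tallying element names against the construct groups listed above. The sketch below only produces these counts and does not attempt the LR formula. It makes simplifying assumptions: constructs are matched by local element name, ignoring namespace and letter case, and constructs that appear only inside attribute values are not counted. The sample ontology and function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Construct groups as listed in the text (CR, PR, E, CA, CE).
GROUPS = {
    "CR": {"Restriction"},
    "PR": {"domain", "range"},
    "E": {"differentFrom", "distinctMembers", "equivalentClass",
          "equivalentProperty", "sameAs"},
    "CA": {"oneOf", "dataRange", "disjointWith"},
    "CE": {"complementOf", "intersectionOf", "unionOf"},
}

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Parent">
    <rdfs:subClassOf>
      <owl:Restriction><owl:onProperty rdf:resource="#hasChild"/></owl:Restriction>
    </rdfs:subClassOf>
    <owl:disjointWith rdf:resource="#Childless"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="hasChild">
    <rdfs:domain rdf:resource="#Parent"/>
    <rdfs:range rdf:resource="#Person"/>
  </owl:ObjectProperty>
</rdf:RDF>"""


def construct_counts(owl_text: str) -> dict:
    """Return n_x for each construct group x, matched by local element name."""
    local_names = Counter(
        el.tag.rsplit("}", 1)[-1].lower() for el in ET.fromstring(owl_text).iter()
    )
    return {
        group: sum(local_names[name.lower()] for name in names)
        for group, names in GROUPS.items()
    }


print(construct_counts(SAMPLE))
```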

8) Cumulative complexity of ontology

The Cumulative Complexity (CC) of ontology in the 7DO model is calculated as an arithmetic mean of dimensions' complexities:

$$CC = \frac{RKC + AR + SND + ND + SR + RR + LR}{7} \cdot 100\% \tag{10}$$

All complexity metrics of the 7DO model satisfy the Non-negativity, Null Value, Symmetry, Module Monotonicity, and Disjoint Module Additivity properties of complexity metrics defined by [16]. Furthermore, all metric values are scaled to the (0,1) range, which is convenient for comparison and aggregation of metric values. The 7DO model metrics are summarized in Table 2.
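The aggregation into a cumulative score is a plain arithmetic mean expressed as a percentage. The sketch below takes the hierarchy-dimension term to be the Subclassing Richness (SR) defined earlier; the function name is hypothetical and the metric values are placeholders chosen for the example, not measurements.

```python
DIMENSION_METRICS = ("RKC", "AR", "SND", "ND", "SR", "RR", "LR")


def cumulative_complexity(metrics: dict) -> float:
    """Arithmetic mean of the seven dimension metrics, as a percentage."""
    missing = [name for name in DIMENSION_METRICS if name not in metrics]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    for name in DIMENSION_METRICS:
        if not 0.0 <= metrics[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {metrics[name]}")
    total = sum(metrics[name] for name in DIMENSION_METRICS)
    return 100.0 * total / len(DIMENSION_METRICS)


# Placeholder values only, to show the call shape.
example = {name: 0.5 for name in DIMENSION_METRICS}
print(f"CC = {cumulative_complexity(example):.1f}%")
```

Because every term is weighted equally, a single dimension can move CC by at most 100/7 percentage points, which is worth keeping in mind when comparing ontologies that differ in only one dimension.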

4. Case Study in Ontology Evolution

We performed complexity analysis of the SWETO [17] ontology. SWETO is a general purpose ontology that covers domains including publications, affiliations, geography and terrorism. We analyzed 5 versions of the SWETO ontology developed in 2003-2004. The size of the SWETO ontology was measured using Lines of Code (LOC) metric (Figure 1) and the number of classes (Figure 2).

The measurement of complexity metrics was performed by a PHP script that parses the XML-based OWL ontology and computes the complexity metrics based on the predefined XML, RDF and OWL primitives. The Relative Kolmogorov Complexity metric was calculated using the standard PHP ARCHIVE_ZIP library. The results are presented in Figure 3.

From the results we can see that the SWETO ontology has grown linearly from version to version (size: R = 0.97; number of classes: R = 0.95). The complexity metric values (except RR) have remained flat. The value of the RR metric has decreased, pointing to the introduction of relationships other than sub-classing relationships, such as restrictions on class properties.

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

From the results shown in Figures 1-3 we can draw two conclusions:

1) The size of an evolved ontology tends to grow linearly;

2) The complexity of an evolved ontology tends to remain constant.

Note that the Second Law of Lehman [18] for software evolution claims that the complexity of an evolved software program tends to increase. The claim has been supported empirically for numerous software development projects [19]. It seems that the complexity of an ontology as a domain description artefact depends only upon the domain of the ontology and stays flat as the ontology evolves. However, more research on ontologies from different domains is needed to confirm this observation. This research can be considered a first step towards the discovery and formulation of the Ontology Evolution Laws.

5. Conclusions

The presented 7DO model for evaluation of the structural complexity of ontology descriptions can be used for comparison and ranking of ontologies within the same domain, as well as for investigating ontology evolution issues. The proposed set of complexity metrics can be used by knowledge engineers, ontology designers, and ontology users.

The advantages of ontology evaluation using the proposed metrics of the 7DO model are as follows: 1) Computation is easy and straightforward; only an XML parser is required. 2) The 7DO model is independent of ontology content. 3) Metrics are reusable and domain-independent. 4) Metrics are scale-free, i.e., independent of ontology size.

However, for deeper ontology analysis, the metric-based evaluation should be combined with expert-based evaluation of non-technical and content-related ontology characteristics such as completeness or consistency. A suite of benchmark ontologies should be developed (or gathered), against which the results of ontology evaluation could be compared.

References

[1] T.R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition, 5:199-220, 1993.

[2] N. Guarino. Formal Ontology and Information Systems. In Proc. of First Int. Conf. on Formal Ontologies in Information Systems (FOIS), Trento, Italy, pp. 3-15, 1998.

[3] World Wide Web Consortium. OWL Web Ontology Language Reference. W3C Recommendation 10 Feb, 2004.

[4] H. Yao, A.M. Orme, and L. Etzkorn. Cohesion Metrics for Ontology Design and Application. Journal of Computer Science 1(1): 107-113. Science Publications, 2005.

[5] L. Stojanovic. Methods and Tools for Ontology Evolution. PhD thesis, University of Karlsruhe, 2004.

[6] A. Lozano-Tello and A. Gomez-Perez. ONTOMETRIC: a method to choose the appropriate ontology. Journal of Database Management 15, 1-18, 2004.

[7] N. Noy and C. Hafner. The state of the art in ontology design: A survey and comparative review. AI Magazine, 18(3):53-74, 1997.

[8] S. Tartir, I.B. Arpinar, M. Moore, A. Sheth, B. Aleman-Meza. OntoQA: Metric-Based Ontology Quality Analysis. IEEE ICDM 2005 Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, Houston, TX, USA, November 27, 2005, pp. 45-53.

[9] M. Hepp, P. De Leenheer, A. de Moor, and Y. Sure (Eds.). Ontology Management: Semantic Web, Semantic Web Services, and Business Applications. Springer, 2007.

[10] J. Visser. Structure metrics for XML Schema. In J.C. Ramalho et al. (eds.). Proc. of XATA 2006. Univ. of Minho, 2006.

[11] S. Cranfield. UML and the Semantic Web. Proc. of the International Semantic Web Working Symposium, Palo Alto, USA, 2001.

[12] M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer Verlag, 1997.

[13] R. Lammel, S.D. Kitsis, and D. Remy. Analysis of XML Schema Usage. Proc. of XML 2005, International Digital Enterprise Alliance, Atlanta, November 2005.

[14] M.H. Halstead. Elements of Software Science. New York, NY: Elsevier, 1977.

[15] D. Kang, B. Xu, J. Lu, W.C. Chu. A complexity measure for ontology based on UML. Proc. of 10th IEEE International Workshop on Future Trends of Distributed Computing Systems FTDCS 2004, 26-28 May 2004, pp. 222-228.

[16] L.C. Briand, S. Morasca, V.R. Basili. Property-Based Software Engineering Measurement. IEEE Trans. Software Eng. 22(1): 68-86, 1996.

[17] B. Aleman-Meza, C. Halaschek, A. Sheth, I.B. Arpinar, and G. Sannapareddy, SWETO: Large-Scale Semantic Web Test-bed. Proc. of 16th Int. Conf. on Software Engineering & Knowledge Engineering, Banff, Canada, pp. 490-493, 2004.

[18] M.M. Lehman, J.F. Ramil, P. Wernick, D.E. Perry, and W.M. Turski. Metrics and Laws of Software Evolution - The Nineties View. IEEE METRICS 1997, p. 20.

[19] W. Scacchi. Understanding Open Source Software Evolution. In N.H. Madhavji, M.M. Lehman, J.F. Ramil, and D. Perry, (eds.), Software Evolution and Feedback, John Wiley and Sons, New York, 2006.

Robertas Damasevicius

Software Engineering Department,

Kaunas University of Technology,

Studentu 50-415, LT-51368, Kaunas, Lithuania

email: robertas.damasevicius@ktu.lt

Table 1: Summary of the 7DO model

Dimension   Analyzed artefacts                  Format   Reasoning

Text        Symbols                             TXT      Syntax-free
Metadata    Data, metadata                      XML      Semantics-free
Structure   Elements, attributes                XML
Algorithm   Operators, operands                 XML
Hierarchy   Classes, relationships              RDF
Metamodel   Subjects, objects, predicates       RDF      First-order logic
Logic       Class and property restrictions,    OWL
            class expressions, axioms

Table 2: Summary of ontology dimension complexity metrics

Dimension   Metric                    Subjects of measurement              Meaning for ontology

Text        Relative Kolmogorov       Object: OWL file                     High variability of content
            Complexity                Program: compressed OWL file

Metadata    Annotation Richness       Data: XML elements, attributes       Provision of human-readable
                                      Metadata: attribute values,          information on domain concepts
                                      labels, comments

Structure   Structural Nesting        Depth: level of XML document         Complexity of document's
            Depth                     Elements: number of tags at          structure
                                      different document levels

Algorithm   Normalized Difficulty     Operators: XML tags                  Uniqueness of classes and
                                      Operands: attributes of XML tags     relationships between them

Hierarchy   Subclassing Richness      Concepts: classes                    Level of detail of domain
                                      Relationships: subclass              knowledge
                                      relationships

Metamodel   Relationship Richness     Subclass relationships, other        Complexity of relationships
                                      relationships                        between domain concepts

Logic       Logic Richness            Class and property restrictions,     Complexity of logic
                                      equalities, class axioms,
                                      class expressions

COPYRIGHT 2009 University of the West of Scotland, School of Computing
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2009 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Damasevicius, Robertas
Publication:Computing and Information Systems
Date:Feb 1, 2009