Advanced technologies for contents sharing, exchanging, and searching in e-learning systems.


From our experience in the e-learning field (De Pietro, 2002), we can say that a large part of developers' and researchers' effort is focused on building web-based information systems (Fraternali, 1999) that guarantee asymmetric and symmetric hypermedia communication. From this point of view, e-learning systems can be considered a particular subset of a larger category of systems created for storing, managing, and producing data and contents.


What we generally observe is that the elaboration pipeline is based on a multi-tier model made up of a storage domain (a database and/or multimedia data warehouse), an elaboration and presentation domain (a web server), and a client application (generally a web browser). The weak point we have observed is that data are stored in the system and can be output only in HTML format. This is a serious limitation both for intersystem communication and collaboration and for the integration of the Internet with other communication networks, for example 2.5G and 3G mobile networks (Huber, Weiler & Brand, 2000). In the usual client/server communication, the only thing the system has to do is provide an interface that allows the request/response flow from and to the client application. The border of the system is the web server, where data are bound to the presentation form for the final user. Only inside the system, where the metadata are known, is it possible to build data aggregations, or to extract part of the contents from an existing aggregation, while respecting semantic correctness.

E-learning systems are very large knowledge stores, so it is useful for them to share and exchange data with one another. Something similar happened in the database field, where Online Analytical Processing (OLAP) systems and data warehouses were built for managing and integrating nonhomogeneous data sources. Web information systems likewise need architectures and technologies that allow data and contents to be shared among different platforms. To achieve this target, we need both the binding between data and the presentation form and a way of transporting data and metadata through network protocols. In this context we can see two levels of information management and elaboration: to the usual intra-system data processing is added inter-system data exchanging, sharing, and processing. The Internet can become a free space, developing an infinite series of clustering layers among the systems connected through it.

According to this idea, we think that research activity on e-learning systems must focus on protocols and technologies that allow content sharing among systems. It therefore becomes fundamental to define a paradigm for expressing the semantics and relationships of data using metadata, and to develop procedures and algorithms for semantic analysis and classification. For example, it can be very useful to identify the subject that a lesson deals with, using the learning object's metadata as its semantic fingerprint. We can use classification technologies not only for analyzing a single document while it is transferred from one system to another, but also for matching the search pattern of a query sent by a client or by a remote system; for example, we can search for and calculate the set of documents that best matches a query pattern in the least redundant way. Traditional client/server architectures can also benefit from technologies and algorithms for semantic analysis and classification, because these can be very useful for developing advanced search engines or advanced knowledge management tools inside an e-learning system.

For the reasons explained above, in this article we discuss how the algorithms and knowledge developed in the field of pattern classification can be used to analyze specific meta-information of learning objects and to allow content exchanging and sharing among e-learning systems. The basic idea is to use keyword labels bound to documents for classifying and aggregating learning objects. First we deal with the IMS metadata model and the problem of classification, from fuzzy set theory to neural network classifiers; in the last part we discuss some classification algorithms and some methodologies for pattern matching during search operations.

METADATA

The problem of binding data and metadata has always been present, especially on the Internet, where information is exchanged very frequently and interoperability must be guaranteed. In HyperText Markup Language (HTML), one way of helping search engines classify hypertexts is the <META> tag, which can carry a set of keywords that specify what the document deals with. When Extensible Markup Language (XML) was developed, the possibility of building a set of tags and collecting them in a namespace led developers of web contents to create descriptors that could express the structure, the organization, and the semantics of documents. In the e-learning field, researchers and developers understood very early the need for a common language for specifying meta-information, so many independent working groups and consortia have produced a series of paradigms. For example, the Resource Description Framework (RDF) (Beckett, 2003) is a language for expressing metadata of web resources; there are many RDF extensions that add XML elements for describing details that characterize documents specifically (e.g., Dublin Core, DC) (Powell, 2003). Today, a large part of this research activity has converged in the IMS Project (by the IMS Global Learning Consortium), which produced the Learning Resource Metadata Information Model; this model uses XML to build a structure in which contents are labeled by tags that have a unique and precisely understandable meaning.

Knowledge of this metadata standard is then enough for understanding, with precision and without interpretation errors, the semantic information about any kind of learning object. Metadata tags can be used either in XML structured text files or in flat table form, where they constitute the attributes of a database table. The most important role of IMS metadata becomes visible with communication protocols such as Extensible Markup Language-Remote Procedure Call (XML-RPC), through which we can transmit a descriptor of the learning object to other systems and platforms: the common agreement about the meaning of meta-tags makes the data fully understandable and exportable.
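The transmission of a descriptor over XML-RPC can be illustrated with a minimal sketch. It uses only the Python standard library and performs no network call: the descriptor is encoded as an XML-RPC request body and decoded again, as a receiving system would do. The field names and the method name `lom.submitDescriptor` are hypothetical and are not taken from the IMS specification.

```python
import xmlrpc.client

# Hypothetical, heavily simplified learning object descriptor.
descriptor = {
    "title": "Using the stethoscope",
    "keywords": ["stethoscope", "acoustics", "diagnostics"],
}

# Sender side: encode the descriptor as the XML body of an XML-RPC request.
payload = xmlrpc.client.dumps((descriptor,), methodname="lom.submitDescriptor")

# Receiver side: decode the body back into native values.
params, method = xmlrpc.client.loads(payload)
received = params[0]

print(method)
print(received["keywords"])
```

Because both sides agree on the meaning of the fields, the receiver can read the keyword list without knowing anything about how the sender stores or presents the document.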

IMS standards define technologies for building not only e-documents but also the roles and functions that will be used in e-learning platforms. Given the subject of this article, we are particularly interested in the IMS learning object model defined in the IMS Learning Resource Meta-Data Best Practices and Implementation Guide (available at www.imsglobal.org). The XML learning object descriptor has a tree structure whose root is the Learning Object Metadata (LOM) element (the object); on the first level there are these subelements: general, lifecycle, metametadata, technical, educational, rights, relationship, annotation, and classification. Each of these tags contains a subhierarchy of elements.

In the classification of LOM contents, very useful fields are description, keyword, and coverage (in the general subtree) and the whole classification subtree. Both general and classification contain a subelement called keyword, and we think this is a very good candidate for the important role of semantic fingerprint of the document. Other meta-information is less suited to this role: for example, general.description and classification.description give an informal, human-level semantic specification (based on natural language) and are not suitable for direct elaboration by software procedures. Classification.taxonpath.taxon permits classification through taxonomic paths, which are ordered lists of keywords in which the first entry is the most general class and the last entry is the most specific one. Examples of taxonomic paths are Physics/Acoustic/Instrument/Stethoscope and Medicine/Diagnostic/Instrument/Stethoscope, which define two distinct taxonomies for the same object, the stethoscope. This kind of classification is very useful, but it has one big limitation: it must be specified within one large taxonomic classification of all human knowledge, and all systems must agree on it. If we consider the enormous number of classes and subclasses needed to classify every object that constitutes a lesson, we can easily understand that this approach is not suitable for a general classification, which should always be understandable and accepted by every system on the Web.
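As an illustration of the fields discussed above, the following sketch extracts the keywords and the taxonomic paths from a small descriptor. The XML is a simplified stand-in written for this example: it has no namespaces and flattens the element nesting, so it should not be read as the exact IMS XML binding. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative descriptor (not the exact IMS binding).
LOM_XML = """
<lom>
  <general>
    <title>Using the stethoscope</title>
    <keyword>stethoscope</keyword>
    <keyword>acoustics</keyword>
  </general>
  <classification>
    <taxonpath>
      <taxon>Medicine</taxon>
      <taxon>Diagnostic</taxon>
      <taxon>Instrument</taxon>
      <taxon>Stethoscope</taxon>
    </taxonpath>
    <keyword>diagnostics</keyword>
  </classification>
</lom>
"""


def semantic_fingerprint(xml_text):
    """Collect the keywords of the general and classification subtrees."""
    root = ET.fromstring(xml_text)
    keywords = set()
    for path in ("general/keyword", "classification/keyword"):
        for node in root.findall(path):
            keywords.add(node.text.strip().lower())
    return keywords


def taxon_paths(xml_text):
    """Return each taxonomic path as a string, from general to specific."""
    root = ET.fromstring(xml_text)
    return [
        "/".join(taxon.text.strip() for taxon in path.findall("taxon"))
        for path in root.findall("classification/taxonpath")
    ]


print(sorted(semantic_fingerprint(LOM_XML)))
print(taxon_paths(LOM_XML))
```

The keyword set is flat and order-free, while each taxonomic path only makes sense to a system that shares the same taxonomy.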

For the reasons explained above, we think the most suitable field for classification purposes is keyword, because it expresses the semantic context of a document in a compact way. A group of keywords carries no hierarchical information, unlike a taxonomic path: the keywords are all at the same level, and the classification is made by measuring the semantic distance between keyword sets. This analysis is very simple and fast and, at the same time, very general, so we can apply it easily to intersystem communication; for this reason, in the remainder of this article we develop algorithms and technologies for semantic analysis based on keyword sets. It is important to stress that a fundamental step is building the keyword set for every document. There are two possibilities: using a parser that assigns the keywords automatically, or asking the data provider to insert them. We think the second solution is more suitable than the first for e-learning contents. Many digital documents cannot be parsed because they are not in textual format. Moreover, since the data producers and providers are teachers, it is very important to give them the faculty of assigning semantic labels. They are competent in the subjects they teach, and the keyword sets they choose will surely be better than labels chosen, for example, by an automatic process that extracts the most frequently used words in the body, title, or abstract of a document. Teachers of a particular subject agree on the language used for describing contents and data in their teaching field, so each keyword chosen is given the right semantic value.

CLASSIFICATION AND PATTERN RECOGNITION

We will use keyword labels for classifying documents, but first we discuss the problem of classifying an object through a pattern that labels it. In this case the pattern is directly the keyword set, so we do not have the problem of extracting from the analyzed object a group of parametric properties that constitutes an abstract mathematical model to submit to the classification algorithm. This aspect is very important because the biggest problem of pattern recognition is building a pattern that correctly represents the object we want to classify; in fact, some important details can be deleted or distorted when the pattern is extracted from a set of properties.

Pattern recognition is a research field within Artificial Intelligence (Khanna, 1991). Very interesting results have been obtained by implementing recognition algorithms with neural networks. For this reason we think it is very useful to analyze how neural algorithms produce a classification as output when they receive a pattern of properties as input. Among the most famous and widely used neural networks, we focus on the Hopfield and Maxnet networks because, in our opinion, they are very interesting examples that can lead us to develop a keyword-based classification procedure.

HOPFIELD NEURAL NETWORKS

A neural network can be represented by an oriented graph in which every arc connects the output of one node (also called a neuron) to the input of another; a weight is associated with every arc, so the connection between two nodes is weighted by it. A neural network needs a learning procedure that sets the weights of the arcs to specific values: in other words, the weights are the memory of the network, and they are set using patterns that correspond to known classes, which constitute the taxonomy of the classification. The output of every node is given by a function, linear or nonlinear, that receives as input the weighted outputs of the nodes connected to it.

In the Hopfield network all the neurons are connected to one another. If we label every node as $x_i$ and call $t_{ij}$ the weight that connects the output of node $x_j$ to node $x_i$, the output value $o_i$ of node $x_i$ is given by (Pao, 1989):

$$o_i = \begin{cases} V_i^{0} & \text{if } \sum_{j \neq i} t_{ij}\, o_j < U_i \\ V_i^{1} & \text{if } \sum_{j \neq i} t_{ij}\, o_j \geq U_i \end{cases} \tag{1}$$

In (1), $U_i$ is the threshold value of node $i$; $V_i^{0}$ and $V_i^{1}$ may be 0 and 1, or -1 and +1, or any other desired pair of values. The set $o_1 \dots o_N$ (where $N$ is the total number of neurons) constitutes a pattern that can also be considered the state vector of the system. The weights of the arcs are fixed using $m$ known patterns:

$$t_{ij} = \sum_{s=1}^{m} \left(2 o_i^{s} - 1\right)\left(2 o_j^{s} - 1\right) \quad (i \neq j), \qquad t_{ii} = 0 \tag{2}$$

In (2) we supposed that $V_i^{0}$ and $V_i^{1}$ are 0 and 1 respectively; $o_j^{s}$ is the output of node $j$ when the system is in the stable state $o^{s}$. Each $o^{s}$ will be a stable point of the system and will correspond to a specific class in the taxonomy we are using. When we want to classify an N-dimensional pattern, we assign its values to the outputs $o_1 \dots o_N$ and apply the rule in (1) repeatedly until we reach a stable state in which $o_1 \dots o_N$ no longer vary. It is very important to stress that nodes are interrogated and updated through (1) in a stochastic and asynchronous manner. It can be shown (Pao, 1989) that, when a Hopfield network starts from a generic state vector in N-dimensional space and we run a cycle applying (1), the network evolves to the nearest stable state. A Hopfield network is therefore similar to a nonrising surface with holes in it: if we place a ball on the surface, the ball falls into the nearest hole. The biggest problem we observe in Hopfield networks is the presence of "spontaneous" stable points that we have not inserted using the procedure described in (2). This generally happens when the number of neurons is too small compared to the number of desired stable points: for the network to behave well, given n neurons and using (2), we can store no more than n/(4 log(n)) stable states (Pao, 1989). Moreover, studies have shown that when there is a large number of stable points, classification can be wrong, especially when stable points are very near one another. To obtain better control of the classification procedure, we can use another kind of neural network, called Maxnet.
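Rules (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration and not the implementation given in Pao (1989): it assumes 0/1 outputs and thresholds equal to zero, the function names `train_weights` and `recall` are hypothetical, and a seeded random visiting order stands in for the stochastic, asynchronous updating.

```python
import random


def train_weights(patterns):
    """Build the weight matrix from known 0/1 patterns, as in rule (2)."""
    n = len(patterns[0])
    t = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # the diagonal stays at zero
                    t[i][j] += (2 * p[i] - 1) * (2 * p[j] - 1)
    return t


def recall(t, probe, thresholds=None, seed=0, max_sweeps=100):
    """Apply rule (1) node by node, in random order, until nothing changes."""
    n = len(probe)
    u = thresholds if thresholds is not None else [0] * n  # assumed thresholds
    o = list(probe)
    rng = random.Random(seed)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            s = sum(t[i][j] * o[j] for j in range(n) if j != i)
            new_value = 1 if s >= u[i] else 0
            if new_value != o[i]:
                o[i] = new_value
                changed = True
        if not changed:
            break
    return o


stored = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
]
weights = train_weights(stored)
probe = [1, 1, 1, 0, 0, 0, 0, 0]  # first stored pattern with one bit changed
print(recall(weights, probe))
```

In this example the probe differs from the first stored pattern in a single position, and the network settles back into that pattern.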

MAXNET NEURAL NETWORKS

A Maxnet network classifies a pattern on the basis of the Hamming distance between the class exemplar vector and the input pattern. Given two N-dimensional binary (0-1) vectors x and u, the Hamming distance is defined in this way (Pao, 1989):

$$\mathrm{Hamming\_distance}(x, u) = N - \sum_{i=1}^{N} u_i\, x_i \tag{3}$$

This quantity is precisely the number of positions in which u and x do not agree. Maxnet has a layered architecture. In the first layer, for every class exemplar pattern $u_j$ the quantity $\sum_{i=1}^{N} u_{ji}\, x_i$ is calculated (Pao, 1989) and stored in a neuron; in the second layer, a cycle is run to discover which class exemplar pattern best matches the input pattern x (for this reason this kind of network is called Maxnet). In the first layer there are as many nodes as classes, and every node is associated with a class of the taxonomy; given the input vector x, the output of neuron j will be $\sum_{i=1}^{N} u_{ji}\, x_i$.
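The Hamming distance described in words above, the number of positions in which two vectors disagree, can be computed directly. The sketch below is illustrative and its function names are hypothetical. Whether a dot-product expression such as (3) coincides with this count depends on how the vectors are encoded; for reference, under a bipolar (-1/+1) encoding the count equals (N - x·u)/2, and the sketch checks that identity on one example.

```python
def hamming_distance(x, u):
    """Number of positions in which the two vectors disagree."""
    if len(x) != len(u):
        raise ValueError("vectors must have the same length")
    return sum(1 for x_i, u_i in zip(x, u) if x_i != u_i)


def to_bipolar(v):
    """Map a 0/1 vector to the -1/+1 encoding."""
    return [2 * bit - 1 for bit in v]


x = [1, 0, 1, 1]
u = [1, 1, 0, 1]

d = hamming_distance(x, u)

# Same count obtained from the dot product of the bipolar encodings.
dot = sum(a * b for a, b in zip(to_bipolar(x), to_bipolar(u)))
d_from_dot = (len(x) - dot) // 2

print(d, d_from_dot)
```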

In the second subnet there is the same number of nodes as in the first one, and every node is associated with a class. Calling $t_{jk}$ the connection weight from node j to node k, we take (Pao, 1989):

$$t_{jk} = \begin{cases} 1 & \text{if } j = k \\ -\varepsilon & \text{if } j \neq k \end{cases} \tag{4}$$

with j and k varying between 1 and M (the number of feasible classes, which is also the number of nodes in each layer) and $\varepsilon < 1/M$. In this subnet, processing proceeds iteratively: the output of a generic node j at run t+1 of the cycle is given, in terms of the output values at run t, by the relationship (Pao, 1989):

$$\mu_j(t+1) = f_1\!\left(\mu_j(t) - \varepsilon \sum_{k \neq j} \mu_k(t)\right) \tag{5}$$

where

$$f_1(a) = \begin{cases} a & \text{if } a > 0 \\ 0 & \text{if } a \leq 0 \end{cases} \tag{6}$$

The initial output value of a generic node j in the second subnet is assumed to be equal to the output value of node j in the first layer (recall that both are associated with the same class j of the taxonomy). It can be shown that, applying (5) and (6) iteratively, after a finite number of steps the only output greater than zero is that of the node representing the class that best matches the input pattern.
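The two layers can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the default value of epsilon is simply one choice below 1/M, all nodes are updated together at each run, and the first-layer scores are assumed to have a single maximum (a tie between the best classes is not resolved by this iteration).

```python
def first_layer(exemplars, x):
    """Score of each class: dot product of its exemplar with the input."""
    return [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in exemplars]


def maxnet(scores, epsilon=None, max_iter=1000):
    """Iterate rule (5) with f_1 from (6) until one output stays positive."""
    m = len(scores)
    if epsilon is None:
        epsilon = 1.0 / (2 * m)  # assumed value, chosen below 1/M
    mu = [float(s) for s in scores]
    for _ in range(max_iter):
        if sum(1 for value in mu if value > 0) <= 1:
            break
        total = sum(mu)
        # Each node keeps its own output and is inhibited by all the others.
        mu = [max(0.0, value - epsilon * (total - value)) for value in mu]
    return mu


exemplars = [
    [1, 1, 0, 0],  # class 0
    [0, 1, 1, 0],  # class 1
    [0, 0, 1, 1],  # class 2
]
x = [1, 1, 0, 0]

scores = first_layer(exemplars, x)
outputs = maxnet(scores)
winner = outputs.index(max(outputs))
print(scores, winner)
```

Only the node of the best-matching class keeps a positive output, so the index of that node identifies the class.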

SEMANTIC CLASSIFICATION

Maxnet networks teach us that it is very important to fix a way of measuring the "distance" between patterns; in other words, it is fundamental to develop a mathematical model for expressing "how far" apart two objects are. In our case the input patterns that we want to classify are keyword sets and not binary (0-1) vectors, so the Hamming distance cannot be used. For this reason we have developed a particular kind of metric that we call "semantic metrics." We can use it both in intrasystem elaboration and in intersystem communication; in the latter case we can transmit the learning objects using the XML-RPC protocol, and the receiving system can analyze and classify the documents by processing their semantic keyword labels through the algorithm we are going to explain.

The Semantic Metrics and Semantic Homogeneity Index

When we define a new metric in a generic space we must take care of its correctness; moreover, it is very important to have a simple and fast procedure for calculating it. By introducing a metric in a semantic space, we try to give a quantitative and compact value to something that is generally very far from mathematical and numerical models. When we measure the distance between an exemplar pattern, which represents a precise class, and a generic pattern, we express the membership degree of the latter in the former class using a numeric value. The idea that the membership function is not Boolean, but can take a value from a discrete or continuous set of real numbers, is not new: it can be found in fuzzy set theory. Formally, given a class X of objects x, a fuzzy set A in X is a set of ordered pairs (Klir, Clair, & Yuan, 1997):

(7) $A = \{(x, \mu_A(x)) \mid x \in X\}$

The entity $\mu_A(x)$ is called the membership function, the value of which is the grade of membership of x in A. $\mu_A(x)$ maps X to the membership space M. If M contains only two points, 0 and 1, then A is not fuzzy. If the range of the values of the membership function is a subset of the nonnegative real numbers with a finite upper bound, and this upper bound is unity, the fuzzy set is called normal.

In our case, one simple way of implementing a normal membership function is to look at the number of keywords that match when we compare the input pattern with the exemplar pattern of a class. If we have an input pattern I, made up of the keywords $I_1 \dots I_N$, and we want to know its membership degree to class C, which is represented by the keyword set E, made up of the elements $E_1 \dots E_P$, the matching function $f_M()$ is defined on the ordered pairs $(I_i, E_j)$ in this way:

(8) $f_M(I_i, E_j) = \begin{cases} 1 & \text{if } I_i = E_j \\ 0 & \text{if } I_i \neq E_j \end{cases} \qquad i = 1 \dots N,\ j = 1 \dots P$

Through $f_M()$ we can also define the membership function $\mu_C(I)$ of the class C:

(9) $\mu_C(I) = \frac{1}{N \cdot P} \left[ \sum_{i=1}^{N} \sum_{j=1}^{P} f_M(I_i, E_j) \right]^2$

$\mu_C(I)$ is our semantic metrics: it lets us express, through a real number, the grade of membership of I to a generic class C; full membership is indicated by the unity value, while zero indicates nonmembership. An important detail is that this kind of metrics is based both on the keyword pairs $(I_i, E_j)$ that match and on the pairs that do not match, so, for example, if $\{I_1 \dots I_N\}$ is a proper subset of $\{E_1 \dots E_P\}$, then the grade of membership is lower than unity. This characteristic is very useful: if we have two classes $C_1 = \{E_1 \dots E_P\}$ and $C_2 = \{F_1 \dots F_R\}$ and an input pattern $I = \{I_1 \dots I_N\}$, with $I \subset C_1 \subset C_2$, we can surely say that I belongs to $C_1$, because $C_1$ is more specific than $C_2$, which is a more general class.
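The behaviour described above can be tried out with a few lines of Python. This is only a minimal sketch: it assumes that the number of matching pairs is squared before being divided by N * P, which is the reading under which identical sets reach unity, and the keyword labels used are hypothetical.

```python
from fractions import Fraction


def f_match(input_keyword, exemplar_keyword):
    """Matching function f_M: 1 if the two keywords are equal, otherwise 0."""
    return 1 if input_keyword == exemplar_keyword else 0


def membership(pattern, exemplar):
    """Grade of membership of `pattern` in the class represented by `exemplar`.

    Assumption: the match count is squared before being divided by N * P.
    """
    pattern, exemplar = set(pattern), set(exemplar)  # keywords are distinct
    if not pattern or not exemplar:
        return Fraction(0)
    matches = sum(f_match(i, e) for i in pattern for e in exemplar)
    return Fraction(matches * matches, len(pattern) * len(exemplar))


# Hypothetical keyword labels: the pattern is contained in both classes.
pattern = {"xml", "metadata"}
specific_class = {"xml", "metadata", "rdf"}
general_class = {"xml", "metadata", "rdf", "database", "network"}

print(membership(pattern, pattern))         # 1
print(membership(pattern, specific_class))  # 2/3
print(membership(pattern, general_class))   # 2/5
```

The pattern scores higher against the more specific class than against the more general one, which is the property used to prefer $C_1$ over $C_2$.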

Semantic metrics can also be used for calculating the semantic homogeneity $d_M(A, B)$ between two keyword sets A and B:

(10) $d_M(A, B) = \mu_A(B) = \mu_B(A)$

Either A or B can be considered the exemplar keyword set of the class, because both choices produce the same result.
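The symmetry stated in (10) can be checked directly from the definitions. The short derivation below is a sketch that writes $A = \{A_1 \dots A_P\}$ and $B = \{B_1 \dots B_N\}$ (index names chosen here only for illustration) and uses two facts: $f_M$ simply tests two keywords for equality, and a finite double sum can be reordered.

```latex
\sum_{i=1}^{N} \sum_{j=1}^{P} f_M(B_i, A_j)
  = \sum_{j=1}^{P} \sum_{i=1}^{N} f_M(A_j, B_i),
\qquad
\frac{1}{N \cdot P} = \frac{1}{P \cdot N}
\quad\Longrightarrow\quad
\mu_A(B) = \mu_B(A)
```

Both the count of matching pairs and the normalising factor are unchanged when the roles of A and B are swapped, so either set can play the exemplar.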

SEARCHING ALGORITHMS

Semantic metrics allow us to analyze a document semantically through its keyword label; this characteristic can be used when systems exchange documents with one another and therefore need procedures for classifying data coming from outside systems according to their own taxonomy. Another important use of the metrics is in the development of an advanced search engine: e-learning systems can receive data requests in the form of keyword sets, so they need a way of analyzing requests and building data aggregations that match the queries as closely as possible. Obviously, this kind of schema can also be used in the usual client/server applications, for providing advanced searching tools to learners or for knowledge management processes that run inside the system. We want to stress that while in the classification of a document the semantic metrics is used for choosing the class that is semantically nearest to it, in the searching procedure the membership degree to a class is used for organizing the group of documents that best matches an input query.

Best Choice Searching

The simplest use we can make of the semantic metrics and the semantic homogeneity index is to determine the document that best matches the query pattern; in this particular case all the documents of the system constitute the taxonomy, and the keyword set in the query is the pattern that must be classified. This procedure is very similar to the classification procedure in the Maxnet network; the only difference is the use of semantic metrics instead of Hamming distance. This searching algorithm is very easy to develop, but it is not the best solution because it sometimes matches the keyword set in the query only partially, even if the requested data are entirely present in the system; for example, it can happen that a group of documents, when their keyword labels are united, is closer to the target expressed in the query than the single document chosen with the best matching rule.
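Best choice searching and its limitation can be sketched as follows. The code is illustrative only: it assumes the semantic homogeneity is the squared number of shared keywords divided by the product of the two set sizes, and the document names and keywords are hypothetical.

```python
from fractions import Fraction


def homogeneity(a, b):
    """Semantic homogeneity d_M(A, B); assumed form: matches^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return Fraction(0)
    matches = len(a & b)
    return Fraction(matches * matches, len(a) * len(b))


def best_choice(query, documents):
    """Return the name of the single document whose label is nearest to the query."""
    return max(documents, key=lambda name: homogeneity(query, documents[name]))


# Hypothetical keyword labels stored in the system.
documents = {
    "D1": {"xml", "rdf", "olap"},
    "D2": {"xml"},
    "D3": {"rdf"},
    "D4": {"metadata"},
}
query = {"xml", "rdf", "metadata"}

best = best_choice(query, documents)
single_score = homogeneity(query, documents[best])
merged_score = homogeneity(query, documents["D2"] | documents["D3"] | documents["D4"])

print(best, single_score)  # D1 4/9
print(merged_score)        # 1
```

The single best document matches the query only partially, while the united labels of three other documents match it completely; this is the situation that motivates the algorithms in the following sections.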

Greedy Searching

The response to a keyword target can be built using, for example, an iterative algorithm that, at each step, chooses the document that best matches the request and adds it to the final solution; it then calculates a new keyword target and repeats these steps on it. This kind of procedure is greedy (Moret & Shapiro, 1991), because at each step it takes the best feasible choice, so we can call this algorithm "greedy searching." A simple greedy algorithm is shown in Figure 1.

In the worst case, the maximum number of iterations that this algorithm runs is equal to the number of keywords contained in the query Q. Moreover, from a computational point of view, the hardest part is building the set D in the second step: at that point we have to calculate $d_M()$ a number of times equal to the number of documents stored in the system (which can be very high). The procedure we have explained outputs a solution S that has the following important property when we compare it to the target query Q: no other solution matches more keywords of Q than S does; in fact, the algorithm stops only when there is no longer any match between the target and the labels of the documents stored in the system (the "Check D" conditional box implements this behaviour). However, we cannot guarantee that there is no solution that matches Q as well as S does but is less redundant than S. If we want to take redundancy into account, we can modify how the document ($L_j$) added at each step is chosen.
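Since Figure 1 is not reproduced here, the following Python sketch shows one possible reading of greedy searching. Two points are assumptions rather than statements from the figure: the new target is taken to be the set of query keywords not yet matched, and the semantic homogeneity is taken to be the squared number of shared keywords divided by the product of the two set sizes. Document names and keywords are hypothetical.

```python
from fractions import Fraction


def homogeneity(a, b):
    """Semantic homogeneity d_M(A, B); assumed form: matches^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return Fraction(0)
    matches = len(a & b)
    return Fraction(matches * matches, len(a) * len(b))


def greedy_search(query, documents):
    """Greedy searching sketch: returns the names of the chosen documents in order."""
    target = set(query)
    solution = []
    while target:
        # Set D: homogeneity between the current target and every stored label.
        scores = {name: homogeneity(target, label) for name, label in documents.items()}
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] == 0:
            break  # "Check D": no label matches the target any more
        solution.append(best)
        # Assumed update rule: the new target is what is still unmatched.
        target -= set(documents[best])
    return solution


documents = {
    "D1": {"K1", "K2", "K4"},
    "D2": {"K1"},
    "D3": {"K2"},
    "D4": {"K3"},
}
query = {"K1", "K2", "K3"}
print(greedy_search(query, documents))  # ['D1', 'D4']
```

Every iteration that does not stop removes at least one keyword from the target, so the loop runs at most as many times as there are keywords in the query, in line with the worst case stated above.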

[FIGURE 1 OMITTED]

Selective Greedy Searching

To reduce redundancy, in the second step of the algorithm we can choose, among the labels $L_j$ such that $d_M(T, L_j) \geq d \ \forall d \in D$, one that does not introduce data that are redundant with respect to the partial solution already calculated, so we have to develop a procedure for measuring the level of redundancy. Let $V = \{\bigcup L_R \mid L_R \in S^{K-1}\}$, where $S^{K-1}$ is the partial solution calculated after iteration (K-1); supposing that V is made up of P keywords, $V = \{K_1 \dots K_P\}$, and that L' is the keyword label of the document we would add to the solution, an index $i_R$ for calculating the redundancy level is the following:

(11) $i_R(V, L') = \sum_{j=1}^{P} v(K_j, L')$

where $v(K_j, L')$ is equal to 1 if $K_j \in L'$, otherwise it is 0. Having defined $i_R$, we can modify the greedy algorithm as shown in Figure 2.

In this way we do not choose at random one of the $L_j$ that best match the target $T_K$, but we choose the one among them that is least redundant with respect to $S^{K-1}$; the elaboration cost is higher than in the preceding version of the algorithm, but the final solution is surely better, because redundancy is reduced. This algorithm, however, does not give the best solution to redundancy; to show this we can look at the following example: suppose there is a target set $T = \{K_1, K_2, \dots, K_6\}$ and the system stores documents with the following keyword sets:

$D_1 = \{K_1, K_2, K_3, K_4\}$, $D_2 = \{K_4, K_5, K_6\}$, $D_3 = \{K_1, K_2, K_3\}$

The selective greedy algorithm will produce the solution $S_{greedy} = \{D_1, D_2\}$, but the solution $S = \{D_2, D_3\}$ has the same level of matching and is less redundant, because it does not repeat the keyword $K_4$ in both of the sets that make it up.
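Figure 2 is not reproduced here, so the sketch below is only one possible reading of selective greedy searching. It assumes that the new target is the set of keywords not yet matched, that the semantic homogeneity is the squared number of shared keywords divided by the product of the two set sizes, and that the redundancy index is used only to break ties among the best-matching labels.

```python
from fractions import Fraction


def homogeneity(a, b):
    """Semantic homogeneity d_M(A, B); assumed form: matches^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return Fraction(0)
    matches = len(a & b)
    return Fraction(matches * matches, len(a) * len(b))


def redundancy(covered, label):
    """Index i_R(V, L'): how many keywords of V already appear in the label L'."""
    return len(set(covered) & set(label))


def selective_greedy_search(query, documents):
    """Selective greedy sketch: returns the names of the chosen documents in order."""
    target, covered, solution = set(query), set(), []
    while target:
        scores = {name: homogeneity(target, label) for name, label in documents.items()}
        best = max(scores.values(), default=Fraction(0))
        if best == 0:
            break  # no label matches the remaining target
        # Among the best-matching labels, prefer the least redundant one.
        ties = [name for name, score in scores.items() if score == best]
        chosen = min(ties, key=lambda name: redundancy(covered, documents[name]))
        solution.append(chosen)
        covered |= set(documents[chosen])  # V: union of the labels chosen so far
        target -= set(documents[chosen])   # assumed update of the target
    return solution


documents = {
    "D1": {"K1", "K2", "K3", "K4"},
    "D2": {"K4", "K5", "K6"},
    "D3": {"K1", "K2", "K3"},
}
target = {"K1", "K2", "K3", "K4", "K5", "K6"}
print(selective_greedy_search(target, documents))  # ['D1', 'D2']
```

Under these assumptions the first step takes $D_1$, the label nearest to the full target, and the second step can only take $D_2$, the one label that still matches $K_5$ and $K_6$; the keyword $K_4$ therefore appears twice, whereas $D_2$ together with $D_3$ would cover the same target without repetition.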

Exponential Searching

If we want to obtain the best matching and least redundant solution, we have to develop a procedure that builds all the feasible subsets of the set of all the documents stored in the system (an operation that has an exponential computational cost), choosing a solution S according to the following rules:

1. there must be no solution that matches the target better than S;

2. S must be the least redundant among the best matching solutions; and

3. there must be no solution with the same level of matching and redundancy as S that is made up of fewer documents than S.

In order to enforce the second rule of this list, we have to implement a specific index for measuring the redundancy of a set of keyword sets. Let $I = \{L_1, \dots, L_P\}$ be a set of keyword sets $L_j$ and $Q = \{\bigcup L_j \mid L_j \in I\} = \{K_1 \dots K_M\}$; then

$\bar{i}_R(I) = \left( \sum_{i=1}^{M} \sum_{j=1}^{P} \delta(K_i, L_j) \right) - \|Q\|$

where $\delta(K_i, L_j)$ is equal to 1 if $K_i \in L_j$, otherwise it is equal to 0; $\|Q\|$ is the number of distinct keywords contained in Q.

Given the set $L = \{L_1, \dots, L_N\}$ of the keyword labels of all the documents stored in the system, and the variable V that stores the temporary solution, the algorithm that implements this procedure starts a cycle for building all the feasible subsets of L; every run of the cycle produces a subset S and executes the decisional step described in Figure 3. At each step the algorithm verifies whether the subset just created is better than the solution calculated during the preceding iterations, checking the three rules listed.
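Figure 3 is not reproduced here, so the following Python sketch is only one way the three rules could be applied. It assumes that "matching the target" is measured by the semantic homogeneity between the target and the union of the labels in a subset, that the homogeneity is the squared number of shared keywords divided by the product of the two set sizes, and that the rules are applied in order as a lexicographic comparison. Document names and keywords are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def homogeneity(a, b):
    """Semantic homogeneity d_M(A, B); assumed form: matches^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return Fraction(0)
    matches = len(a & b)
    return Fraction(matches * matches, len(a) * len(b))


def set_redundancy(labels):
    """Redundancy of a set of labels: total keyword occurrences minus distinct keywords."""
    union = set().union(*labels)
    return sum(len(set(label)) for label in labels) - len(union)


def exponential_search(query, documents):
    """Examine every non-empty subset of documents and keep the best one."""
    target = set(query)
    names = list(documents)
    best_key, best_subset = None, ()
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            labels = [set(documents[name]) for name in subset]
            union = set().union(*labels)
            # Rule 1: best matching; rule 2: least redundancy; rule 3: fewest documents.
            key = (-homogeneity(union, target), set_redundancy(labels), size)
            if best_key is None or key < best_key:
                best_key, best_subset = key, subset
    if best_key is None or best_key[0] == 0:
        return []  # nothing in the system matches the query
    return list(best_subset)


documents = {
    "D1": {"K1", "K2", "K4"},
    "D2": {"K1"},
    "D3": {"K2"},
    "D4": {"K3"},
}
query = {"K1", "K2", "K3"}
print(exponential_search(query, documents))  # ['D2', 'D3', 'D4']
```

The nested loops visit all $2^N - 1$ non-empty subsets, which is where the exponential cost comes from.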

Differences Between Selective Greedy and Exponential Searching

We can observe that the exponential algorithm gives the best feasible result with respect to matching the keyword target and reducing redundancy, but it has a very high computational cost. To show this, suppose we have the target set $T = \{K_1, K_2, K_3\}$ and these keyword labels in the system database: $D_1 = \{K_1, K_2, K_4\}$; $D_2 = \{K_1\}$; $D_3 = \{K_2\}$; $D_4 = \{K_3\}$. We can easily calculate that $d_M(T, D_1) = 4/9$ and $d_M(T, D_i) = 1/3$ with $2 \leq i \leq 4$, so the greedy algorithm produces the solution $S_G = \{D_1, D_4\}$ and the exponential searching outputs $S_E = \{D_2, D_3, D_4\}$. In both solutions the target is fully matched, but if we consider the final semantic homogeneity index and set $W_G = \{\bigcup D_i \mid D_i \in S_G\}$ and $W_E = \{\bigcup D_i \mid D_i \in S_E\}$, we see that $d_M(W_G, T) = 0.75$, whereas $d_M(W_E, T) = 1$.
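The four values quoted in this example can be reproduced by hand. The worked arithmetic below is a sketch that assumes the semantic homogeneity is the squared number of matching keywords divided by the product of the two set sizes, with $W_G = \{K_1, K_2, K_3, K_4\}$ and $W_E = \{K_1, K_2, K_3\}$ as the unions of the two solutions:

```latex
\begin{aligned}
d_M(T, D_1) &= \frac{2^2}{3 \cdot 3} = \frac{4}{9}, &
d_M(T, D_i) &= \frac{1^2}{3 \cdot 1} = \frac{1}{3} \quad (i = 2, 3, 4), \\
d_M(W_G, T) &= \frac{3^2}{4 \cdot 3} = 0.75, &
d_M(W_E, T) &= \frac{3^2}{3 \cdot 3} = 1
\end{aligned}
```

The greedy solution loses a quarter of the index only because of the extra keyword $K_4$ carried by $D_1$.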

Even if the exponential searching produces a better result, it has a higher computational cost. Supposing that N is the number of documents stored in the system, we can observe that selective greedy searching has a computational complexity equal to O(N) (Aho, Hopcroft, & Ullman, 1973): each iteration analyzes all the documents stored in the system and, if the maximum number of keywords in a label is bounded above, the computational cost of analyzing each label can be considered constant and equal to $C_1$, so the cost of each iteration is $C_1 \times N$; moreover, in the worst case, the number of iterations is equal to $C_2$, the number of keywords in the target query, which can be considered constant like $C_1$. The final computational cost is then $C_1 \times C_2 \times N$, so, if $C_1$ and $C_2$ are constant, the complexity is O(N). Exponential searching iteratively produces all the feasible subsets of the N documents stored in the database, so its computational complexity is $O(2^N)$ (Aho, Hopcroft, & Ullman, 1973): this makes the algorithm unsuitable for contexts where online, real-time responses are needed. From this discussion we can say that selective greedy searching, which gives a solution that matches a number of target keywords never lower than that of any other solution, is the best compromise between computational cost and quality of solution, even if it does not provide the best feasible solution.
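To give a feeling for the gap between the two growth rates, the following sketch simply counts work units. The function names and the choice of five query keywords are assumptions made for illustration; the counts are upper bounds, not measurements of any implementation.

```python
def greedy_evaluations(n_documents, n_query_keywords):
    """Upper bound on label comparisons: one pass over all labels per iteration."""
    return n_documents * n_query_keywords


def exponential_evaluations(n_documents):
    """Number of non-empty document subsets that must be examined."""
    return 2 ** n_documents - 1


for n in (10, 20, 40):
    print(n, greedy_evaluations(n, 5), exponential_evaluations(n))
# 10 50 1023
# 20 100 1048575
# 40 200 1099511627775
```

Doubling the number of documents doubles the greedy bound but squares (approximately) the number of subsets the exponential procedure has to build.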

[FIGURE 2 OMITTED]

CONCLUSIONS

In this article we focused our attention on data sharing and exchange among e-learning systems. Starting from an IMS metadata model and from neural networks used in pattern classification, we have developed a model of semantic analysis based on keyword labels. We showed that this model can be used both for classifying documents and for aggregating them; the latter function can be very useful when a system receives a query made up of a keyword set and has to reply by building a data aggregation that matches it. In the last part of this article we have described some searching algorithms that can be used both in advanced search engines and in inter-systems communication, when e-learning systems send data requests to one another and reply by building data aggregations that match the queries as closely as possible. In particular, we presented the selective greedy searching algorithm and the exponential algorithm, showing that the former does not always produce the best solution (as the latter does), but has a very low computational cost, so it is feasible for online and real-time querying.

[FIGURE 3 OMITTED]

REFERENCES

Aho, A., Hopcroft, J., & Ullman, J. (1973). The design and the analysis of computer algorithms. Boston: Addison Wesley.

Beckett, D. (2003). RDF/XML syntax specification. [Online]. Available: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/

De Pietro, O. (2002). W-Didattica: Un sistema di didattica a distanza Internet-based. Didamatica 2002 Proceedings. Napoli, Italy: Liguori.

Fraternali, P. (1999). Tools and approaches for developing data-intensive web applications: A survey. ACM Computing Surveys, 31(3), 34-45.

Huber, J.F., Weiler, D., & Brand, H. (2000). UMTS, the mobile multimedia vision for IMT-2000: A focus on standardization. IEEE Communication Magazine, 12(3), 12-18.

Khanna, T. (1991). Neural networks foundations. Boston: Addison Wesley.

Klir, G. J., Clair, U. S., & Yuan, B. (1997). Fuzzy set theory--foundations and applications. Amsterdam, The Netherlands: Prentice Hall.

Moret, B., & Shapiro, H. (1991). Algorithms from P to NP: Vol. 1: Design and efficiency. Oxford, UK: Benjamin Cummings.

Pao, Y. (1989). Adaptive pattern recognition and neural networks. Boston: Addison Wesley.

Powell, A. (2003). Guidelines for implementing Dublin Core in XML. [Online]. Available: http://dublincore.org/documents/dcxml-guidelines/

ORLANDO DE PIETRO AND FRANCESCO APPRATTO, UNIVERSITY OF CALABRIA, ITALY

E-MAIL: depietro@economia.unical.it

E-MAIL: fappratto@economia.unical.it
COPYRIGHT 2004 Association for the Advancement of Computing in Education (AACE)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Appratto, Francesco
Publication:International Journal on E-Learning
Geographic Code:1USA
Date:Jul 1, 2004
Words:6071