A fully automatic question-answering system for intelligent search in e-learning documents

E-learning is a novel method for presenting information to students for the purpose of education. Currently a sea of information is available in the form of PowerPoint slides, FAQs and e-books. However, the potential of this large body of information remains unrealized due to the lack of an effective information retrieval system. Current search engines are used only for the web and return ranked lists of documents. Such engines would not be effective search tools for e-learning documents, and it would be difficult for a user to find the intended answer. This article introduces a fully automatic Question-Answering (QA) System that allows students to ask a question in common language and receive an answer quickly and succinctly, with sufficient context to validate the answer. The system uses Natural Language Processing (NLP) techniques to identify the semantic and syntactic structure of the question.
It configures itself to a particular domain by automatically recognizing the entities from the course material. The information retrieval engine is used to extract answer passages using contextual information. A closed-loop dialogue with the user leads to effective answer extraction through extensive passage analysis. Experimental results of the system are shown over the course material of Computer Networks.

**********

E-learning has underlined the importance of quick access to relevant study material for effective education, with the major advantage of enabling people to access learning facilities regardless of their location and at the time that is most convenient to them. Business enterprises are widely using online learning for employee training and education because of its cost-saving advantages, especially with respect to time and travel (Dorai, Kermani, & Stewart, 2001). Currently a sea of information is available in the form of PowerPoint slides, digital text and FAQs. However, users of e-learning spend a lot of time searching for a desired concept or answer in this huge repository of information. To fill this gap, an effective Question-Answering (QA) system is required that can retrieve answers to students' questions from the course material, suggest alternatives in case of any ambiguity in the question, and thus help them find the intended answer. Examples of such interactive closed-loop QA systems are HITIQA (Small, Liu, Shimizu, & Strzalkowski, 2003) and SPIQA (Hori, Hori, Isozaki, Maeda, Katagiri, & Furui, 2003).

The rapid success of distance education has led to extensive development of course material and its placement on the web. A learner often does not know where to find the related terms and concepts mentioned in the lecture. Searching for topics through table-of-contents or index pages can be tedious and impractical due to the large volume of information present in these domains.
For instance, the user wants to know which algorithms sort an array in a particular time complexity (e.g., O(n log n)). Since such algorithms are distributed throughout the book (like BinarySearch, Mergesort, Binsort, Radixsort, MinHeap sort, etc.), table-of-contents or index pages cannot provide the user much information and he has to search through the entire book.

Modern search engines (such as Google) are able to cope with the amount of text available. They are most useful when a query returns only a few documents, which the user can then search manually to find the relevant information. Such engines would not be effective search tools for e-learning documents, and it would be difficult for a learner to find the intended answer from the list of retrieved documents. Searching for a particular concept by keyword or phrase matching is insufficient because in many cases (e.g., for the question, "What is the difference between RIP and BGP protocol?") words like "difference" may not be present; instead, words like "compare" or "contrast" can be there. In other cases like, "Give the time complexity of Mergesort," some semantically related terms like "asymptotic" or "Big O notation" have to be identified.

The approach taken here is to implement a QA system based on searching in context and entities of a domain for effective extraction of answers to even domain-specific questions. The system recognizes the entities by searching the course material.
It is fully automatic as it does not require any manual intervention for configuring it to any particular domain. The focus is on context-based retrieval of information. For this purpose a retrieval engine that works on locality-based similarity heuristics is used to retrieve relevant passages from the collection (i.e., passages that can potentially answer the question). During query formulation and expansion, the system tries to make a judicious interpretation in order to tap the semantics of the question. The system utilizes natural-language parsers and heuristics in order to return high-quality answers.

This system can serve as a first step towards automatic FAQs. It has good utility for a novice in a subject who does not know where to find related terms and concepts. It can also be quite helpful to students just before their exams for getting answers to review questions.

Contribution of the Article

The following are the contributions of the article:

* Automatic Entity Recognition: The system is not restricted to only one domain. It is fully automatic as it learns about the domain by recognizing the entities from the course material.
Manual development of structured data or annotations (as commonly used in other systems) is not required.
* Integration of Alternative Resources: Different e-learning documents like scanned books and PowerPoint slides have different information and presentation methods. Books are illustrative and give detailed analysis of concepts. Slides are condensed, highlighting the key points. Moreover, the collection of material may comprise books and slides by different authors and teachers (who present the subject in different styles and concepts). The system tries to integrate information from different types of documents and present the summarised answer to the user.
* The system's ability to recognize the context of the problem by using locality-based similarity heuristics and query expansion (with the help of WordNet).
* Closed loop Q & A: The user is provided with feedback in the form of related keywords, which can help the user to reframe a relevant question (within the limits of the e-learning materials) and extract the answer from the system.

Organization of the Article

The rest of the article is organized as follows. The Literature Review and Background section gives an account of related work in e-learning and provides background on Question-Answering. The QA System section describes the different components of this QA system in detail.
The Results section provides the results of the experiments and the method adopted to test the system's utility. Conclusions and future work follow these sections.

LITERATURE REVIEW AND BACKGROUND

Related work in E-learning

Efforts have been made to make it easier for students to extract information from e-learning documents, with respect to effective retrieval and presentation of knowledge. A similar system, COVA (on content-based retrieval), enables remote users to access specific parts of interest from a large lecture database by contents (Cha, 2002). However, manual development of XML schemas and annotation of the vast amount of information can be laborious and impractical. Another approach introduces Genetic Algorithms into a traditional QA system which uses the concept of Case-Based Reasoning (CBR) (Fu & Shen, 2004). The huge number of cases that would be generated with a large, continually growing repository, and failure in the case of complex queries, limit its practical use. A different approach taken in knowledge-based content navigation in e-learning applications presents a prototype implementation of the framework for semantic browsing of a test collection of RFC documents (Mendes, Martinez & Sacks, 2002).
They propose the use of fuzzy clustering algorithms to discover knowledge domains and represent those knowledge domains using TopicMaps. However, success largely depends on how accurately the clusters are identified, and the representation still suffers from the drawbacks attributed to table-of-contents pages. E-learning Media Navigator (ELM-N) from IBM Research is a system with which a user can access and interact with online heterogeneous course materials (Dorai, Kermani, & Stewart, 2001). Their efforts are aimed at reducing human effort and manual annotation work in order to make the system viable for voluminous information. Furthermore, challenges remain in the area of easy-to-use content delivery, access and augmented interaction.

Background on Question Answering

Question answering (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web or a local collection) the system should be able to retrieve answers to questions posed in natural language. A QA system provides direct answers to user questions by consulting its knowledge base.
It attempts to allow the user to ask questions in natural language and receive an answer quickly and succinctly, with sufficient context to validate the answer (Hirschman & Gaizauskas, 2001).

Some QA systems that cater to a specific domain were developed at a very early stage. LUNAR (Woods, 1973) was one such closed-domain QA system; it answered only questions related to moon rocks and soil gathered by the Apollo 11 mission. However, it relied on the data being available in a highly structured form and not as completely unstructured text.

The availability of huge document collections (for example, the web itself), combined with improvements in information retrieval (IR) and Natural Language Processing (NLP) techniques, has attracted the development of a special class of QA systems that answer natural language questions by consulting a repository of documents (Cody, Oren, & Daniel, 2001). Most of the QA systems that have been developed treat the web as a collection of documents and thus cater to a huge variety of questions. One of the commercial search engines, known as AskJeeves, responds to natural language questions, but its recall is very limited because the search engine answers questions from its knowledge base (which is at least partially hand-constructed), and the knowledge base has to be updated when it is asked a question that it has not encountered before. Another QA system, MULDER (Kwok et al., 2001), is claimed to be the first general-purpose, fully automated question-answering system available on the web. MULDER's architecture relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers (with a recall of the same level as that of Google). However, the difficulty of NLP has limited the ability of such systems to give accurate answers to questions that are quite specific to a domain.
In addition to the traditional difficulties associated with syntactic analysis, there remain many other problems to be solved (e.g., semantic interpretation, ambiguity resolution, discourse modelling, inference, common sense, etc.).

QA systems on the web try to answer questions that require a fact or a one-word answer. This is difficult for questions that are specific to a domain because the targeted domain is unrestricted and no assumption can be judiciously made. E-learning questions are more complex than TREC-type questions as they require domain knowledge, and long answers need to be extracted from multiple documents. Moreover, these questions have inherent ambiguity. The objective is to allow the user to submit exploratory, analytical, non-factual questions such as, "How does Mergesort sort an array?" The distinguishing property of such questions is that one cannot generally anticipate what might constitute the answer. While certain types of things may be expected, the answer is heavily conditioned by what information is available on the topic. Users generally prefer answers embedded in context, regardless of the perceived reliability of the source documents (Lin, Quan, Sinha, Bakshi, Huynh, Katz, & Karger, 2003). When users search for a topic, an increased amount of text returned significantly decreases the number of queries that they pose to the system.

The QA System

Figure 1 shows the architecture of our QA system.
The user begins by configuring the system to the particular course domain by triggering the Automatic Entity Generator module, which recognizes domain-specific entities from that particular course's documents. The question submitted by the user is classified in Question Classification to identify its case. The question is parsed using the Link Parser, which constructs the linkage structure of the question. This information is used for extracting relevant information (like part of speech) during Question Parsing. Subsequently, Query Formulation translates the question into a set of queries that are given as keyword input to the Retrieval Engine. Query Expansion is needed to tap the semantics of the question and improve the answer extraction. The engine returns the top passages after weighting and ranking them on the basis of locality. Finally, Answer Selection is done by further extensive passage analysis, and the answer is then presented to the user. To improve answers (if the user is not satisfied) the system takes user feedback, which is again followed by answer extraction and selection. Each part is described in detail in the next section.

[FIGURE 1 OMITTED]

Question Classification

The Question Classifier used pattern matching based on wh-words and simple information to determine question types. The questions were broadly classified into the following categories:

* Questions containing the keywords such as 'various,' 'ways,' 'difference,' 'types,' and 'compare.'
These keywords require answers to be extracted from more than one passage. For example, "What are the various algorithms for sorting an array in O(n log n) time complexity?" or, "What is the difference between RIP and BGP?" Normally, answers to such questions need to be extracted from several passages.
* Questions that ask for numerical data or a date. Such questions were identified by a wh-phrase ("How many?", "How tall?", "When?"). The answer passages must focus on numerical data.
* Questions that can be answered from one passage. The Question Focus (object of the verb) is used to find the relevant answer.

Question Parsing

Usually search engines use keywords from the question to construct queries, neglecting unimportant words like 'of,' 'for,' 'at,' etc. No importance is given to the syntactic structure of the question while picking up keywords. In such cases the meaning of the question is lost. For example, no difference exists among the questions 'how,' 'why,' or 'what.' This QA system uses the Link Grammar Parser to parse the question; noun phrases are identified to tap the semantic structure of the question. This information is used to select plausible answers from the e-learning materials.

The Link Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words (Temperley, Sleator, & Lafferty, 1993). The parser has a dictionary of about 60,000 word forms. It has coverage of a wide variety of syntactic constructions, including many rare and idiomatic ones. The parser is robust; it is able to skip over portions of the sentence that it cannot understand, and assign some structure to the rest of the sentence. It is able to handle unknown vocabulary, and make intelligent guesses from context and spelling about the syntactic categories of unknown words.
It has knowledge of capitalization, numerical expressions, and a variety of punctuation symbols.

The Link Parser works as follows. The dictionary of nouns, verbs, adverbs, prepositions and adjectives is used to parse a sentence. The parser starts at the right end and searches linkages throughout the sentence. It considers each entry for the word as a different word and generates all linkages found for all entries. This parser considers relationships between pairs of words. For example, in the sentence shown in Figure 2 there is an S (subject) relation between "Internet" and "is," and a D (determiner) relation between "a" and "network."

Required information such as parts of speech, syntactic functions and constituents can be recovered from the link structure rather easily. For example, whatever word is on the left end of an S-link is the subject of a clause (or the head word of the subject phrase); whatever is on the right end is the finite verb; whatever is on the left end of a D-link is a determiner, etc. The system finds the question focus by using the S or O linkage to get the object of the verb. Importance is given to the question focus by assigning it more weightage during retrieval of answers. Moreover, all nouns, verbs, and adjectives in the dictionary are subscripted (as ".n," ".v," or ".a"), so in these cases the syntactic category of the word is made explicit.

The constituent structure of sentences, while not absolutely explicit, is also quite close to the surface in linkage structures. Constituents can be defined as sets of words which can be reached from certain links, tracing in a certain direction. For example, a verb phrase is everything reachable from an S-link, tracing to the right (that is, not tracing through the left end of the S-link itself). For noun phrases there are several possibilities. Anything that can be reached from an O-link by tracing right is an NP (noun phrase). The system tries to find all possible NPs in the question. For example, the following NPs were found in the question, "Why are buffers needed at the output port of a switch?": [buffers], [the output port of a switch], [the output port], [a switch].

Automatic Entity Recognition

This module tries to recognize the entities in a particular course (domain-specific entities) about which the user wants to pose questions. This configures the system automatically to any type of course domain. The system administrator on the server providing distance learning (or the user who wants to search for answers in documents present on his local system) gives an index or table-of-contents file as input. The module runs the Link Parser on every line, giving its syntactic structure. It takes nouns, adjectives and verbs ending in -ing as entities (as they carry the focus of the sentence). In the absence of table-of-contents or index pages, the system searches through the main headings and subheadings of slides or digital text to recognize the entities. If no linkage is formed, it tokenizes the string and word filtering is done to remove any elementary words (as shown in Table 1).
If no elementary words are found in the string then the whole string is also taken as an entity (for example, Binary Search Tree).
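The fallback path described above (tokenize a heading, filter out elementary words, and keep the whole string as an entity when nothing was filtered) can be sketched as follows. This is only an illustration under stated assumptions: the `ELEMENTARY_WORDS` set is a hypothetical stand-in for Table 1, the function names are invented for this sketch, and the Link Parser step that selects nouns, adjectives and -ing verbs is not reproduced.

```python
import re

# Hypothetical stand-in for the elementary words of Table 1 (assumption).
ELEMENTARY_WORDS = {"a", "an", "the", "of", "for", "at", "in", "on",
                    "to", "and", "or", "with"}


def fallback_entities(heading):
    """Return entities for one heading line for which no linkage was formed."""
    tokens = re.findall(r"[a-z0-9]+", heading.lower())
    kept = [t for t in tokens if t not in ELEMENTARY_WORDS]
    entities = set(kept)
    # Nothing was filtered out, so the whole string is taken as an entity too.
    if len(tokens) > 1 and len(kept) == len(tokens):
        entities.add(" ".join(tokens))
    return entities


def build_entity_table(headings):
    """Collect entities from every heading or index line of the course material."""
    table = set()
    for line in headings:
        table |= fallback_entities(line)
    return table
```

Storing the entities in a set mirrors the hash-table lookup that the query formulation step relies on; lower-casing the tokens is a simplification chosen here, not something the article specifies.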
[FIGURE 2 OMITTED]

Query Formulation

The query formulation module converts the user's question into a set of keywords (query) which is then sent to the retrieval engine for answer extraction. The system uses the entity file to recognize the domain-specific entities in the question. During initialization, the system reads from a default file (which can be set to a particular course by the user) and constructs a hash table of these entities. Individual words in the question are compared against this table to identify the entities. These keywords are considered most important and are given the maximum weightage of 2. The question focus (object of the verb) identified during question parsing is also given the same weightage of 2. Elementary words (as shown in Table 1) are given the weightage 0. The rest of the words in the question are given the weightage 1.

Query Expansion: Extending the query through query expansion enhances the search process by including semantically related terms and thus retrieves texts in which the query terms do not specifically appear (Gonzalo, Verdejo, Chugur, & Cigarran, 1998). For example, in questions like, "Compare and contrast link state and distance vector routing algorithm," the answers may occur in sentences such as "The difference between ..." The system uses a popular thesaurus called WordNet to identify semantically related concepts.
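The weighting rules of the query formulation step (2 for domain entities and for the question focus, 0 for elementary words, 1 for everything else) can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name, the argument names and the tokenization are assumptions, and the order in which the rules are tested (entity first, then focus, then elementary word) is a choice made here because the article does not say which rule wins when several apply.

```python
import re

ENTITY_WEIGHT = 2      # domain entities found in the entity hash table
FOCUS_WEIGHT = 2       # question focus (object of the verb)
DEFAULT_WEIGHT = 1     # every other word in the question
ELEMENTARY_WEIGHT = 0  # elementary words (Table 1)


def formulate_query(question, entities, focus_words, elementary_words):
    """Map each word of the question to its query weight."""
    weights = {}
    for word in re.findall(r"[a-z0-9]+", question.lower()):
        if word in entities:            # assumed precedence: entity first
            weight = ENTITY_WEIGHT
        elif word in focus_words:
            weight = FOCUS_WEIGHT
        elif word in elementary_words:
            weight = ELEMENTARY_WEIGHT
        else:
            weight = DEFAULT_WEIGHT
        weights[word] = weight
    return weights
```

The three sets passed in would come from the entity file, the question parsing step and Table 1 respectively; here they are plain Python sets supplied by the caller.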
WordNet is a semantic network containing words grouped into sets called synsets. Synsets are linked to each other by different relations such as synonyms, hypernyms and meronyms. For nouns, the most common and useful relation is the is-a relation. This exists between two concepts when one concept is-a-kind-of another concept. The more general concept is known as a hypernym. For example, machine is a hypernym of computer. This creates a network where related concepts can be identified (to some extent) by their relative distance from each other. Only those query terms were expanded which do not occur as domain entities. Gaining from this knowledge, query evaluation is no longer restricted to the query terms submitted by users but may also embody synonymous or semantically related terms. However, caution is taken as these newly found terms are not as reliable as the initial terms obtained from users. Only closely related terms are taken: those that have a direct relation with either the query term itself or with the words that are directly related to the query term. An appropriate weighting scheme (0.5) allows a smooth integration of these related terms by reducing their influence over the query.

Answer Extraction

To extract passages from the collection of documents, an Information Retrieval engine is needed to analyse the keywords and passages in detail. The answers to a query are locations in the text where there is local similarity to the query, and similarity is assessed by a mechanism that employs as one of its parameters the distance between words (Kretser & Moffat, 1999).
For this purpose it was found that the locality-based similarity heuristic (in which every word location in each document is scored) provides retrieval effectiveness as good as the document-based technique, and has the additional advantage of presenting focussed answer passages (instead of the whole document) with sufficient context to validate the answer. Therefore, the engine used is based on this concept and has been customized for this application. The important features of Locality-Based Retrieval (with Similarity) in this context are:

* The focus is on local context, by considering the top n ranked passages instead of the top n documents.
* Each term has a certain scope, within which its importance decreases with the distance from that term.
* Similarity is computed as the sum of weighted overlaps between terms. It is based on the intuitive notion that the distance between terms is indicative of some semantics of the sentence.

The entire retrieval process is carried out using a word-level inverted index and all of the terms in the automatically generated query. An example of the construction of a word-level inverted page list is shown in Figure 3. The drawback of the seamless approach is that more index information must be manipulated and that querying requires more resources, but with the use of appropriate techniques these costs are manageable. Using this fully automatic mechanism, results as good as or better than those of comparable document-based retrieval techniques are obtained within relatively modest resource requirements.
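A word-level inverted index of the kind mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the layout of the system's inverted page list (Figure 3 is not reproduced here): the function name `build_word_index` is hypothetical, the collection is treated as one running sequence of words, and only case folding is applied (no stemming).

```python
from collections import defaultdict
import re


def build_word_index(documents):
    """Build a word-level inverted index over a list of document strings.

    Each term maps to the list of global word positions at which it occurs,
    counting words continuously across document boundaries.
    """
    index = defaultdict(list)
    position = 0
    for text in documents:
        for word in re.findall(r"[a-z0-9]+", text.lower()):  # case folding only
            index[word].append(position)
            position += 1
    return dict(index), position  # position now equals the total word count


docs = ["The bus topology uses a single cable.",
        "A star topology uses a central hub."]
index, total_words = build_word_index(docs)
print(index["topology"], total_words)  # prints: [2, 9] 14
```

Storing positions rather than document identifiers is what lets the engine score individual locations and return passages instead of whole documents, at the cost of a larger index.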
[FIGURE 3 OMITTED]

Rather than considering the text collection to be a sequence of documents, it is considered to be a sequence of words, and query term occurrences within the collection are presumed to exert an influence over a neighbourhood of nearby words. Then, supposing that the influence from separate query terms is additive, the contribution of each occurrence of each query term is summed to arrive at a similarity score for any particular location in any document in the collection. This concept is illustrated in Figure 4. The contribution function $c_t$ is then defined in terms of $l$, the location of the query term (as an integral word number); $x$, the word location at which we seek to calculate a contribution; $h_t$, the peak height assigned to the term, assumed to occur at the word position occupied by the term in question; and $s_t$, the one-sided spread of the term.

The parameters that are used for scoring the passages are:

* $N$: Total number of terms in the collection
* Term frequency ($f_t$): How often the term $t$ appears
* $F_{q,t}$: Within-query frequency of the term
* Inverse document frequency (idf): $\log(N / f_t)$
* Height ($h_t$): The height assigned to a term $t$ is a monotonic function of the term's scarcity in the collection:

$$h_t = F_{q,t} \cdot \log(N / f_t)$$

* $d = |x - l|$ is the distance in words between the term in question and the location at which its influence is being evaluated. In each case the value of $c_t(x, l)$ is defined to be zero when $|x - l| > s_t$:

$$c_t(x, l) = h_t \sqrt{1 - (d / s_t)^2}$$

[FIGURE 4 OMITTED]

The top n (value set by the user) ranked passages (windows surrounding the scored locations) are returned after scoring all the locations of the query terms according to the weights assigned to them. The implementation also handles case folding and stemming (to match a keyword with any of its other grammatical forms) while searching the words and indexing them into the inverted page list.

For repeated use, the system can be configured to reduce the retrieval time manifold. This is done by searching the documents for all the domain-specific entities (as already identified) and indexing them into the table beforehand.
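The height and contribution formulas above can be turned into a small scoring routine. The sketch below is only an illustration: the function names are hypothetical, a single fixed spread is assumed for every term because the text does not say how the spread is chosen, and the query-formulation weights are left out because the text does not specify how they combine with the height.

```python
import math


def height(f_qt, n_total, f_t):
    """Peak height h_t = F_{q,t} * log(N / f_t)."""
    return f_qt * math.log(n_total / f_t)


def contribution(x, l, h_t, s_t):
    """Contribution c_t(x, l) of a term occurrence at l to word location x."""
    d = abs(x - l)
    if d > s_t:
        return 0.0  # outside the term's one-sided spread
    return h_t * math.sqrt(1.0 - (d / s_t) ** 2)


def score_locations(index, n_total, query_freqs, spread):
    """Sum the contributions of every query-term occurrence at every location.

    index: term -> list of global word positions; spread: assumed common s_t.
    """
    scores = [0.0] * n_total
    for term, f_qt in query_freqs.items():
        positions = index.get(term, [])
        if not positions:
            continue  # term does not occur in the collection
        h_t = height(f_qt, n_total, len(positions))
        for l in positions:
            lo, hi = max(0, l - spread), min(n_total - 1, l + spread)
            for x in range(lo, hi + 1):
                scores[x] += contribution(x, l, h_t, spread)
    return scores


# Toy index (hypothetical data): term -> global word positions.
toy_index = {"bus": [1], "star": [8], "topology": [2, 9]}
scores = score_locations(toy_index, n_total=14,
                         query_freqs={"bus": 1, "star": 1, "topology": 1},
                         spread=4)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 3))
```

A passage would then be the window of words around each of the highest-scoring locations; choosing the window size and suppressing overlapping windows are left out of the sketch.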
Pre-indexing increases the speed of the system, since each time a question is asked the locations of most query terms are already available and the system does not need to search again (except when additional documents have been added).

Answer Selection and Presentation

The top-ranked passages that are returned (after weighting and ranking on the basis of locality and context) are answer candidates. These are further processed to select the answer passages that will be presented to the user. Some passages may be ranked higher merely because one of the principal terms in the query occurs frequently, without actually illustrating the relation the user has asked about. For example, suppose the user asks: "What is the difference between Bus and Star network topologies?" It is probable that a passage from the introduction to network topology (where "network" and "topology" occur frequently, with only a passing reference to Bus and Star topology) is ranked highly. To avoid such situations, the system searches the passages for occurrences of the noun phrases identified in the Question Parsing section. Passages in which matches are found are ranked higher among the top ones. After phrase matching, the system processes the passages according to the category assigned during question classification. If the question was classified in the second category, requiring a date or numerical expression, the system searches the passages for such expressions to match the answer type.
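The phrase-matching and answer-type steps can be sketched as below. This is a simplified illustration: the function names are hypothetical, phrase matching is reduced to a case-insensitive substring test, and "date or numerical expression" is approximated by a regular expression that accepts any number, which is cruder than whatever the actual system uses.

```python
import re


def rerank_by_phrases(passages, noun_phrases):
    """Move passages that contain any of the question's noun phrases to the front.

    passages: list of (text, score) pairs already sorted by retrieval score.
    Python's sort is stable, so passages within each group keep that order.
    """
    def has_match(text):
        lowered = text.lower()
        return any(phrase.lower() in lowered for phrase in noun_phrases)

    return sorted(passages, key=lambda p: has_match(p[0]), reverse=True)


# Assumption: any integer or decimal counts as a date or numerical expression.
NUMERIC = re.compile(r"\b\d+(?:\.\d+)?\b")


def has_numeric_expression(text):
    """Return True if the passage contains at least one number."""
    return bool(NUMERIC.search(text))


candidates = [
    ("Network topology describes how nodes are arranged; bus and star are examples.", 0.9),
    ("A bus topology shares one cable, whereas a star topology connects every node to a hub.", 0.7),
]
reranked = rerank_by_phrases(candidates, ["bus topology", "star topology"])
print(reranked[0][0])
```

In the example the second passage overtakes the first despite its lower retrieval score, because only it contains the noun phrases from the question.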
For questions in the first category, the system extracts information from more than one passage (those scored higher than a threshold value) and presents all of them to the user along with links to their respective locations in the documents (as shown in Figure 5). This helps the learner to quickly find the relevant information from many documents and to understand the concept. Furthermore, if the top passages come from different resources (slides or books by different authors), they are ranked separately (among resources of the same type) and the best answer passage from each is presented to the user.

Feedback

Feedback is one of the important parts of this QA system and distinguishes it from other QA systems in use today. It provides interactivity between the user and the system. When the question is ambiguous, proper feedback can guide the user to improve the query or reformulate the question and get the intended answer. This mechanism prevents the system from failing on questions where the focus was not clear or proper context was not given. It provides feedback to the user by suggesting extra keywords to be included in the query (as shown in Figure 6). This is done through a closed-loop dialogue.

Closed Loop Dialogue

The user enters a question in natural language at the specified place. After entering the question, the user sees a sequence of passages as probable answers. Hyperlinks are provided with the passages so that the user can access the documents concerned. If the user is not satisfied with the answers provided, he can opt to improve the query. This is done through domain-specific query expansion.
In such cases the system performs extensive passage analysis, in which domain-specific entities are searched for in lower-ranked passages. These entities are then suggested as extra keywords to be included in the query. This guides the user on how to improve the query or reformulate the question so that relevant answers can be extracted from the system. The user can choose any number of the suggested entities that he thinks can improve his question. He may also opt to reformulate the whole question.

[FIGURE 5 OMITTED]

RESULTS

The main goal of our experiments was to determine how well our system locates the exact answers, or gives an indication that the exact answer lies near the retrieved passages. For the experiments, a course on Computer Networks was selected. Scanned textbooks, "Computer Networking: A Top-down Approach Featuring the Internet" by James F. Kurose and Keith W. Ross and "Computer Networks, 4th Edition" by Andrew S. Tanenbaum, were used along with their PowerPoint slides. The questions used for testing were picked from the review questions at the back of the chapters and from FAQs available on the Internet. A separate collection of questions was also gathered through a survey among students whose knowledge of computer networks varied from beginners (not familiar with the subject) to students who performed well in the subject.

[FIGURE 6 OMITTED]

The questions covered a wide range of topics on computer networks. They were of varying type, complexity and difficulty. Questions were nonfactual, explanatory and required extracting passages from different places. Three results per query were extracted. The results are shown in Table 2 and Table 3. The time for information retrieval was quite negligible and we aim to make it faster in the near future. The percentages of confidence the system had that the answer was present in the first, second and third passage were on average 100%, 85%, and 65%. In Table 2, the questions that were answered in these passages are given in the second, third and fourth columns. The column Directs counts questions that were not answered directly but for which the system indicated that the exact answer is contained in the same document (near the retrieved passages). Questions that were answered only after taking feedback from the user are counted in the next column. Questions that could not be answered by the system are counted under the column Failed.
In nearly 11% of the questions, our system failed to get the right answer. Among these, nearly half of the questions were not within the purview of the material. The remaining failures occurred because the keyword frequency factor gave undue importance to certain keywords. In 7.5% of the questions, the answer improved from failure to exact (because of our query expansion technique). The system successfully answered questions like "What is the difference between source-based tree and centre-based trees in ..." by extracting passages from two different documents. The results are encouraging, and the importance of feedback is apparent: it allowed the system to recover from failure and give the right answer.

CONCLUSIONS

In this article, a QA system is proposed which can solve a learner's problems to a great extent with minimal human-computer dialogue. Using the concept of entities, the system is fully automated to work in any subject domain with some input from human expertise. The system is based on searching in context and utilizes syntactic and partial semantic information. This achieves good accuracy: the system failed on only about 11% of the test questions. While additional work is required to enhance the speed and prediction accuracy of the system and to enable it to withstand a very high workload, our initial experiments are promising. The system can handle multiple resources, as are frequently available in the e-learning domain. The current implementation utilizes only partial semantic information during answer extraction and selection. It is believed that recall would be much higher if fuller semantic information were taken into consideration.
The search facility can be improved by storing previous queries together with links to the answers that users accepted with full confidence. The fundamental approach used by Kutay and Ho (2003) for the analysis of students' interaction and learning could be helpful in such a design. Such a facility could help future users and facilitate group learning, although it would place a burden on the memory of the system. In addition, a learner model similar to the user model built by Davis, Kay, Kummerfeld, Poon, Quigley, Saunders, and Yacef (2003) could be used to enhance accuracy for repeated use by a learner.
Table 1. Examples of removed words

Words removed
By     Is          So     As
To     Otherwise   The    Will
An     In          For    Of
Does   At          Are    Did
Be     Over        We     Our
Table 2. Our questions (mostly review questions and FAQs)

Questions      Answer 1   Answer 2   Answer 3   Directs   Feedback   Failed
150            72         15         3          32        12         16

Table 3. Questions collected from survey

Questions      Answer 1   Answer 2   Answer 3   Directs   Feedback   Failed
25 (experts)   10         5          2          3         1          4
25 (naives)    14         4          1          2         2          2
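The percentages quoted in the Results section can be checked against the tables. The short calculation below pools Table 2 and Table 3; that pooling is inferred from the numbers rather than stated in the text, and the variable names are only illustrative.

```python
# Rows copied from Tables 2 and 3:
# (questions, answer 1, answer 2, answer 3, directs, feedback, failed)
ROWS = {
    "review questions and FAQs": (150, 72, 15, 3, 32, 12, 16),
    "survey (experts)": (25, 10, 5, 2, 3, 1, 4),
    "survey (naives)": (25, 14, 4, 1, 2, 2, 2),
}

# Sanity check: the outcome counts in each row add up to its question count.
assert all(sum(row[1:]) == row[0] for row in ROWS.values())

total = sum(row[0] for row in ROWS.values())
feedback = sum(row[5] for row in ROWS.values())
failed = sum(row[6] for row in ROWS.values())

failed_pct = 100 * failed / total
feedback_pct = 100 * feedback / total
print(f"failed: {failed_pct:.1f}%, answered after feedback: {feedback_pct:.1f}%")
# prints: failed: 11.0%, answered after feedback: 7.5%
```

Over the 200 pooled questions, 22 failures give 11% and 15 feedback-assisted answers give 7.5%, which matches the figures reported in the text.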
Note

Source code for the implementation work can be requested at the following email address: ankumfec@iitr.ernet.in

References

Cody, C. T. K., Oren, E., & Daniel, S. W. (2001). Scaling question answering to the Web. Proceedings of the Tenth International Conference on World Wide Web, 150-161.

Cha, G. (2002). COVA: A System for content-based distance learning. Proceedings International WWW Conference (11), Honolulu, Hawaii, USA.

Davis, J., Kay, J., Kummerfeld, B., Poon, J., Quigley, A., Saunders, G., & Yacef, K. (2003). Using workflow, user modeling and tutoring strategies for just-in-time document delivery. AIEd2003. Workshop on Technologies for Electronic Documents for Supporting Learning (Volume X).

Dorai, C., Kermani, P., & Stewart, A. (2001). ELM-N: E-learning media navigator. International Multimedia Conference Proceedings of the Ninth ACM International Conference on Multimedia, 634-635.

Fu, Y., & Shen, R. (2004). GA based CBR approach in Q & A system. Expert Systems with Applications, 26(2), 167-170(4).

Gonzalo, J., Verdejo, F., Chugur, I., & Cigarran, J. (1998). Indexing with WordNet synsets can improve text retrieval. Proceedings of the COLING/ACL '98 Workshop on Usage of WordNet for NLP. Montreal, Canada, 38-44.

Hirschman, L., & Gaizauskas, R. (2001). Natural language question answering: The view from here. Natural Language Engineering.

Hori, C., Hori, T., Isozaki, H., Maeda, E., Katagiri, S., & Furui, S. (2003). Study on spoken interactive open domain question answering. SSPR 2003, 111-113.

Kretser, O.D., & Moffat, A. (1999). Effective document presentation with a locality-based similarity heuristic. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, San Francisco, 113-120.

Kutay, C., & Ho, P. (2003). Using intelligent agents for the analysis of students interaction and learning. AIEd2003. Workshop on Technologies for Electronic Documents for Supporting Learning (Volume X).

Lin, J., Quan, D., Sinha, V., Bakshi, K., Huynh, D., Katz, B., & Karger, D.R. (2003). The role of context in question answering systems. Proceedings of the 2003 Conference on Human Factors in Computing Systems.

Mendes, M. E. S., Martinez, E., & Sacks, L. (2002). Knowledge-based content navigation in e-learning applications. The London Communication Symposium.

Small, S., Liu, T., Shimizu, N., & Strzalkowski, T. (2003). HITIQA: An interactive question answering system, a preliminary report. Proceedings of ACL '03, Workshop on QA. Sapporo, Japan.

Temperley, D., Sleator, D., & Lafferty, J. (1993). Parsing English with a link grammar. Third Annual Workshop on Parsing Technologies.

Woods, W. (1973). Progress in natural language understanding--An application to lunar geology. AFIPS Conference Proceedings, 42, 441-450.

PRAVEEN KUMAR, SHRIKANT KASHYAP, ANKUSH MITTAL, AND SUMIT GUPTA
Indian Institute of Technology, India
pkmaxuec@iitr.ernet.in
shrikuec@iitr.ernet.in
ankumfec@iitr.ernet.in
sumitfec@iitr.ernet.in