Workshop on Human Language Technology and Knowledge Management. (Workshop Reports).

The Workshop on Human Language Technology and Knowledge Management was held on July 6 and 7, 2001, in Toulouse, France, in conjunction with the joint meeting of the Association for Computational Linguistics and the European Chapter of the Association for Computational Linguistics (ACL/EACL '01). Human language technologies promise solutions to challenges in human-computer interaction, information access, and knowledge management. Advances in technology areas such as indexing, retrieval, transcription, extraction, translation, and summarization offer new capabilities for learning, playing, and conducting business. These advances promise to support enhanced awareness, creation, and dissemination of enterprise expertise and know-how.

Organized by the European Network of Excellence in Human Language Technologies (ELSNET; Steven Krauwer, U. Utrecht) and The MITRE Corporation (Mark Maybury), the workshop brought together a group of 50 computational linguists, AI researchers, and computer scientists from North America, Europe, Asia, Australia, and South Africa. Participants worked in a range of areas (for example, speech and language processing, translation, summarization, multimedia presentation, content extraction, dialog tracking). They met both to report advances in human language technology and their application to knowledge management and to work toward a road map for human language technologies for the next decade. In part, the workshop focused on human language technologies that could enable knowledge management functions such as the following:

Expert discovery: Modeling, cataloging, and tracking of distributed organizations and communities of experts

Knowledge discovery: Identification and classification of knowledge from unstructured multimedia data

Knowledge sharing: Awareness of, and access to, enterprise expertise and know-how

Table 1 (from Mark Maybury's introduction to the workshop) illustrates how these knowledge management functions are supported by a broad range of human language technologies, including query analysis-retrieval, information extraction, question answering, machine translation, agent-user modeling, summarization, presentation generation, and awareness-collaboration.

During the second day, John Domingue, deputy director of The Knowledge Media Institute at The Open University in England, gave the keynote entitled "Supporting Organizational Learning through the Enrichment of Documents." According to Domingue, only a small percentage of corporate training is ever applied within the workplace because organizations tend to use school-based methods of learning in contrast to organizational learning based on theories of learning in the workplace. Domingue described knowledge sharing by enriching web documents with informal and formal representations, a process that captures the context in which a document is created and applied. Domingue demonstrated how this enrichment facilitates retrieval and comprehension.

In addition, the group heard an invited talk from Hans Uszkoreit, scientific director at the German Research Center for Artificial Intelligence (DFKI) in Saarbrücken, head of the DFKI Language Technology Lab, and professor of computational linguistics at the Department of Computational Linguistics and Phonetics of Saarland University. Uszkoreit's talk was entitled "Crosslingual Language Technologies for Knowledge Creation and Knowledge Sharing." He described how "language technology can provide means for associating shared knowledge with the relevant decision situations by automatically linking it to the critical elements within decision triggers, that is, electronic documents in the work flow that demand and record a decision." Uszkoreit described the role of information extraction, automatic hyperlinking, and (human) inferencing in this process. He illustrated this "automatic relational hyperlinking" with a hypercode system developed for a large German bank to facilitate work with legacy code by densely interlinking source code and documentation. Uszkoreit concluded by addressing cross-lingual knowledge management, describing his efforts to augment general-purpose translation systems with specialized terminology and transfer rules for multilingual expert groups in a project for a large multinational automobile manufacturer.
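The idea of densely interlinking documentation and source code can be pictured with a toy example. The following is only a minimal sketch, not the hypercode system described in the talk: the function name `link_identifiers`, the identifier names, and the location strings are all hypothetical, and real relational hyperlinking would rely on information extraction rather than exact string matching.

```python
import re

def link_identifiers(doc_text, definitions):
    """Mark every known identifier in doc_text with its definition site.

    `definitions` maps an identifier to a location string; both the
    mapping and the "[name](location)" output form are assumptions
    made for this illustration.
    """
    if not definitions:
        return doc_text
    # Try longer names first so a long identifier is not shadowed by a
    # shorter one that happens to be its prefix.
    names = sorted(definitions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def add_link(match):
        name = match.group(1)
        return f"[{name}]({definitions[name]})"

    return pattern.sub(add_link, doc_text)

# Hypothetical legacy-code symbols and where they are defined.
definitions = {
    "calc_interest": "ledger.cob#L120",
    "open_account": "accounts.cob#L45",
}
text = "Call open_account before calc_interest is used."
linked = link_identifiers(text, definitions)
print(linked)
```

Exact matching keeps the sketch short; it would miss inflected or paraphrased mentions, which is where the extraction techniques mentioned in the talk would come in.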

A poster session included system demonstrations and offered participants an opportunity for rich dialogue and interaction. Major paper sessions were held on ontology construction, question answering, summarization, multilingual processing, multimedia processing, and dialogue. Group brainstorming sessions followed each major technology theme, focusing on construction of a road map. During these sessions, the group focused on an analysis of the present situation, a vision of where the field should be in the future, and a number of intermediate milestones that would help in setting goals and measuring progress toward them.

The group outlined key challenges and promising solutions in the areas of ontology, summarization, multilingual processing, and multimedia processing. With respect to ontologies, the group emphasized the need for tools and tasks that were reusable across domains to create and populate ontologies; the importance of a user-centered process view; the need to integrate shallow and deep methods; the need to collaborate with domain ontology creators; and the need to address ontology quality, ambiguity, and usability (for example, using tools for structuring, integrating, visualizing, and accessing massive or heterogeneous ontologies). The group highlighted the promise of the semantic web, the importance of information extraction "plug-ins," the possibility of organizing massive documents using domain-specific ontologies, the opportunity to use a "top" or core ontology to bootstrap new domains, the value of multidisciplinary collaborative teams (for example, domain experts, linguists, knowledge engineers), the value of controlled language management, and the promise of component-based methods to facilitate ontology decomposition, reuse, and life-cycle management.

With respect to summarization, the group outlined challenges as including the appropriate level and depth of analysis and representation (for example, semantic relations, speech acts, rhetorical structure), summary presentation and visualization, speech for presentation of short summaries, the appropriate use of indicative versus informative summaries, and the need for action-oriented summaries (for example, executive or management summaries). The group discussed a range of solutions encompassing the analysis of information, its transformation (including operations such as selection, aggregation, abstraction), and its presentation.
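The three stages named above (analysis, transformation, presentation) can be made concrete with a small extractive summarizer. This is a hedged sketch under simplifying assumptions, not a method presented at the workshop: analysis is reduced to word counting, transformation covers selection only (no aggregation or abstraction), and the stopword list, the scoring rule, and the function name `summarize` are all choices made for this illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "it"}

def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    # Analysis: split into sentences and count content words over the text.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    # Transformation (selection only): score each sentence by the average
    # document-wide frequency of its content words and keep the top ones.
    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])

    # Presentation: emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in chosen)

example = ("Ontologies organize knowledge. Summaries shorten documents. "
           "Good summaries of documents help readers find knowledge in documents.")
print(summarize(example, max_sentences=1))
```

A summary built this way is closer to indicative than informative: it points to the most central sentences rather than restating the content, which is one reason the group treated the choice of summary type as an open challenge.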

The group also identified a number of fundamental multilingual challenges, including relations between cultures, languages, lexical resources, and ontologies; the importance of domain knowledge and the adaptation and integration of semantic resources; the complexity of dealing with direct translation between even the 200 most widely spoken and written languages (requiring 200 x 199 = 39,800 directed language pairs); the need for large-scale, robust natural language processing and, at the same time, the importance of fine-grained linguistic knowledge; and the challenge of new application domains such as content-driven hypertextual authoring and crosslingual news linking.
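The language-pair figure follows from simple counting, and the short calculation below makes the arithmetic explicit. It is an illustration only; the function name `language_pairs` is an assumption. Direct translation between n languages needs one system per ordered (source, target) pair, so n(n - 1) systems, or half that many if a single system is assumed to serve both directions.

```python
from math import comb

def language_pairs(n):
    """Return (directed, undirected) pair counts for n languages."""
    directed = n * (n - 1)    # one system per (source, target) direction
    undirected = comb(n, 2)   # unordered pairs, n * (n - 1) / 2
    return directed, undirected

directed, undirected = language_pairs(200)
print(directed, undirected)  # 39800 19900
```

The quadratic growth is what makes pivot-based designs attractive: routing through one intermediate representation needs only on the order of 2n components (400 for 200 languages) instead of tens of thousands.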

The group identified resources (for example, WORDNET, EURONET, application databases, text resources) as key to advancement and the INTERLINGUA approach as promising. It also noted the importance of deeply annotated data combined with machine learning, the promise of translation memories and machine learning, and the possibility of tailoring multiple ontologies to users and their tasks.

Finally, the group turned its attention to multimedia challenges and opportunities. Challenges include the integration of multiple media; the nature of processing (whether it is centralized or mobile); the challenges of privacy, security, and scalability; the importance of both remembering and forgetting information; the need for multilingual and multisource information extraction; and the challenge of cross-document coreference resolution. Location-based services were highlighted as a promising future area.

Two cross-cutting enabling capabilities were identified for all the addressed areas. First is the need for (intelligent) text annotation. Second is the need for large-scale annotated corpora to enable automated training and system evaluation.

ELSNET has captured the workshop input and will continue to revise a technology road map. A web site to share the materials and results of the workshop has been set up. (1)
Table 1. Human Language Technology for Knowledge Management.

Human Language  Grand Challenges                Benefits to Knowledge
Technology                                      Discovery, Access,
                                                Exploitation

Input or Query  Interpretation of imprecise,    Natural (written, spoken,
Analysis        ambiguous, or partial           gestural) access to
                multimodal input. Facilities    information and knowledge.
                include spoken query            Decrease in access complexity
                processing, visual query        or user training. Broaden
                analysis (e.g., sketching),     availability of knowledge to
                and mixed-media query (e.g.,    users.
                text and graphics).

Retrieval       Natural language processing     Enhancements to
                of queries and documents.       document-retrieval precision
                Content-based retrieval of      and recall. Direct access to
                text, imagery, audio, video.    media, easing navigational
                                                burden of user. Reduction of
                                                search time.

Extraction      Segmentation, object and        Direct access to information
                event identification, and       or knowledge elements,
                extraction from multimedia      including specific types that
                sources (text, audio, video).   might be user preferred.
                                                Reuse of media elements
                                                enabling user-tailored
                                                selection or presentations.

Question        Question analysis, response     Overcome time, memory, or
Answering       discovery and generation from   attention limitations required
                heterogeneous sources (e.g.,    to sift through many returned
                multilingual; multimedia;       web pages from a traditional
                unstructured, structured,       search by providing direct
                semistructured).                answers to questions.

Translation     Rapid creation of translingual  Cross media-mode information
                corpora. Effective              and knowledge access enabling
                translingual retrieval,         broader access to global
                summarization, and              information sources using
                translation. Access             methods such as translingual
                verbalization of graphics,      information retrieval.
                visualization of text.

Dialogue        Mixed-initiative natural        Ability to tailor flow and
Management      interaction that deals          control of interactions and
                robustly with context shift,    facilitate interactions.
                interruptions, feedback, and    Includes error detection and
                shift of locus of control.      correction tailored to
                                                individual physical,
                                                perceptual, and cognitive
                                                differences. Motivational and
                                                engaging lifelike agents.

Agent or User   Unobtrusive learning,           Enables tracking of user
Modeling        representation, and use of      characteristics, skills, and
                characteristics, beliefs,       goals to enhance interaction
                goals, and plans of agents      as well as discovery of
                (including the user).           experts by other users or
                                                agents.

Summarization   Scalability,                    Increasing speed of reviewing
                cross-linguality, multimedia    materials. Multimedia
                summarization.                  summarization, cross-lingual
                                                summarization, large
                                                multidocument summarization.

Presentation    Automated generation of         Mixed media (e.g., text,
Generation      coordinated speech, natural     graphics, video, speech, and
                language, gesture, animation,   nonspeech audio) and mode
                nonspeech audio, possibly       (e.g., linguistic, visual,
                delivered via interactive,      auditory) displays tailored to
                animated lifelike agents        the user and context. Agents
                (includes challenges of media   engaging and motivating to
                selection, allocation,          younger or less experienced
                coordination, and               users.
                realization).

Awareness and   Topic detection and tracking,   Enhance awareness of new
Collaboration   place-based asynchronous and    knowledge, as well as other
                synchronous collaboration       users' interests and
                environments.                   expertise, and the ability of
                                                experts to exchange or
                                                integrate knowledge.


Notes

(1.) www.elsnet.org/acl2001-hlt+km.html.

Mark Maybury is executive director of MITRE's Information Technology Division in Bedford, Massachusetts. He is a member of the board of directors of the Object Management Group, secretary-treasurer of the Association for Computing Machinery SIGART, and a member of the Intelligent User Interfaces Steering Council. Maybury has published over 50 technical and tutorial articles and is editor of a number of books. Maybury received a doctorate in AI from Cambridge University in England in 1991. Maybury is an international adviser to the German Ministry for Education and Research.
COPYRIGHT 2002 American Association for Artificial Intelligence
Author:Maybury, Mark T.
Publication:AI Magazine
Date:Jun 22, 2002