A survey of metadata research for organizing the web.

ABSTRACT

This article attempts to provide an overview of the key metadata research issues and the current projects and initiatives that are investigating methods and developing technologies aimed at improving our ability to discover, access, retrieve, and assimilate information on the Internet through the use of metadata.

1. INTRODUCTION

The rapid expansion of the Internet has led to a demand for systems and tools that can satisfy the more sophisticated requirements for storing, managing, searching, accessing, retrieving, sharing, and tracking complex resources of many different formats and media types. Metadata is the value-added information that documents the administrative, descriptive, preservation, technical, and usage history and characteristics associated with resources. It provides the underlying foundation upon which digital asset management systems rely to provide fast, precise access to relevant resources across networks and between organizations.

The metadata required to describe the highly heterogeneous, mixed-media objects on the Internet is far more complex than simple metadata for resource discovery of textual documents through a library database. The problems and costs associated with generating and exploiting such metadata are correspondingly magnified. Metadata standards, such as Dublin Core, provide a limited level of interoperability between systems and organizations to enable simple resource discovery. But many problems and issues remain to be solved.
Cory Doctorow (2001) believes that the vision of an Internet in which everyone describes their goods, services, or information using concise, accurate, and common or standardized metadata that is universally understood by both machines and humans is a "pipe-dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities." Other people cite the popularity and efficiency of Google as an example of an extremely successful search engine that does not depend on expensive and unreliable metadata. Google combines PageRanking (in which the relative importance of a document is measured by the number of links to it) with sophisticated text-matching techniques to retrieve precise, relevant, and comprehensive search results (Brin & Page, 1998).

Some of the major disadvantages of metadata are cost, unreliability, subjectivity, lack of authentication, and lack of interoperability with respect to syntax, semantics, vocabularies, languages, and underlying models. However, many researchers are currently investigating strategies to overcome different aspects of these limitations in an effort to provide more efficient means of organizing content on the Internet. Other researchers are investigating metadata to describe the new types of real-time streaming content being generated by emerging broadband and wireless applications to enable both push and pull of this content based on users' needs. The goal of this article is to provide an overview of some of the key metadata research underway that is expected to improve our ability to search, discover, retrieve, and assimilate relevant information on the Internet regardless of the domain or format.

2. THE KEY RESEARCH AREAS

In this section I have identified what I consider to be some of the key metadata research areas, both now and over the next few years.
The following subsections provide a brief description of the work being undertaken and some key citations for each of the research areas summarized in the list below:
* Extensible Markup Language (XML)--XML and its associated technologies--XML Namespaces, XML Query languages, and XML Databases--are enabling implementers to develop metadata application profiles (XML Schemas), large repositories of XML metadata, and search interfaces using XML Query Language.
* Semantic Web technologies--"The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" (Berners-Lee, Hendler, & Lassila, 2001). There are two main building blocks for the Semantic Web:
** Formal languages--RDF (Resource Description Framework), DAML+OIL, and OWL (Web Ontology Language), which is being developed by the Web Ontology Working Group of the W3C.
** Ontologies--communities will use the formal languages to define both domain-specific ontologies and top-level ontologies to enable relationships between ontologies to be determined for cross-domain searching, exchange, and information integration.
* Web Services--using open standards such as WSDL, UDDI, and SOAP, Web services will enable the building of software applications without having to know who the users are, where they are, or anything else about them.
* Metadata Harvesting--the Open Archives Initiative (OAI) provides a protocol for data providers to make their metadata and content accessible--enabling value-added search and retrieval services to be built on top of harvested metadata.
* Multimedia metadata--there will be a further move away from textual resources to new multimedia formats that support better quality and higher compression ratios, e.g., images (JPEG-2000), video (MPEG-4), audio (MP3), 3D (VRML, Web3D), multimedia (SMIL, Shockwave Flash), and interactive digital objects. All of these new media types will require complex fine-grained metadata, extracted automatically where possible.
* Rights metadata--emerging standards such as MPEG-21 and XrML are designed to enable automated copyright management and services.
* Automatic metadata extraction--technologies to enable the automatic classification and segmentation of digital resources. In particular, automatic image processing, speech recognition, and video-segmentation tools will enable content-based querying and retrieval of audiovisual content.
* Search engines:
** Smarter agent-based search engines;
** Federated search engines;
** Peer-to-peer search engines;
** Multimedia search engines;
** Multilingual search engines;
** New search interfaces--search interfaces that present results graphically;
** Automatic/dynamic aggregation and generation of search results into hypermedia and multimedia presentations.
* Personalization/customization--autonomous agents that push relevant information to the user based on user preferences that may be personally configured or learned by the system.
* Broadband networks--multigigabit-capable networks for high-quality video-conferencing and visualization applications:
** Grid computing--distributed computing and communications infrastructures for data-intensive computing applications;
** The Semantic Grid--the combination of Semantic Web technologies with grid computing to provide large-scale data access and integration to the e-Science community.
* Mobile and wireless technologies--delivery of information to mobile devices or appliances based on users' current context or location.
* Authentication--technologies to ensure trust and record the provenance of metadata.
* Annotation systems--enable users to attach their own subjective notes, opinions, and views to resources for others to access and read.
* Preservation metadata--metadata to support long-term preservation strategies for all types of digital resources.

2.1 XML Technologies and Metadata

XML and its associated technologies--XML Namespaces, XML Query languages, and XML Databases--are enabling implementers to develop metadata schemas, application profiles, large repositories of XML metadata, and search interfaces using XML Query Language. These technologies are key to enabling the automated computer-processing, integration, and exchange of information over the Internet.
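As a rough illustration of how XML Namespaces allow a single metadata record to mix elements from different vocabularies while remaining machine-processable, the following Python sketch parses a small hand-written record. The Dublin Core namespace URI is the real one; the `ex` namespace, the record contents, and the `describe` helper are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# The Dublin Core element namespace is real; the "ex" namespace is a
# hypothetical community vocabulary invented for this illustration.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/learning-objects/",
}

# Hand-written sample record (not taken from any real repository).
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:ex="http://example.org/learning-objects/">
  <dc:title>Introduction to Volcanoes</dc:title>
  <dc:creator>A. Author</dc:creator>
  <dc:format>video/mpeg</dc:format>
  <ex:audience>secondary school</ex:audience>
</record>"""


def describe(xml_text):
    """Return a dict mapping 'prefix:name' to the text of each child element."""
    root = ET.fromstring(xml_text)
    uri_to_prefix = {uri: prefix for prefix, uri in NS.items()}
    fields = {}
    for child in root:
        # ElementTree expands qualified names to '{namespace-uri}localname'.
        uri, _, local = child.tag[1:].partition("}")
        prefix = uri_to_prefix.get(uri, "unknown")
        fields[f"{prefix}:{local}"] = (child.text or "").strip()
    return fields


metadata = describe(RECORD)
print(metadata["dc:title"])
print(metadata["ex:audience"])
```

Because each element is identified by its namespace URI rather than by its prefix, a consumer that understands only Dublin Core could still pick out the `dc` elements and ignore the rest.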
2.1.1 Extensible Markup Language (XML). XML (W3C XML, 2003) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. Because XML makes it possible to exchange data in a standard format, independent of storage, it has become the de facto standard for representing metadata descriptions of resources on the Internet.

2.1.2 XML Schema Language. XML Schema Language (W3C XML Schema, 2003) provides a means for defining the structure, content, and semantics of XML documents.
It provides an inventory of XML markup constructs, which can constrain and document the meaning, usage, and relationships of the constituents of a class of XML documents: datatypes, elements and their content, attributes and their values, entities and their contents, and notations. Thus, the XML Schema Language can be used to define, describe, and catalog XML vocabularies for classes of XML documents, such as metadata descriptions of Web resources or digital objects. XML Schemas have been used to define metadata schemas for a number of specific domains or applications--such as METS (Library of Congress, 2003), MPEG-7 (Martinez, 2002), MPEG-21 (Bormans & Hill, 2002), and NewsML (IPTC, 2001).

An additional major metadata development has been the employment of W3C's XML Schemas and XML Namespaces to combine metadata elements from different domains/namespaces into "application profiles" or metadata schemas that have been optimized for a particular application. For example, a particular community may want to combine elements of Dublin Core (DCMI, 2003), MPEG-7 (Martinez, 2002), and IMS
(IMS, 2003) to enable the resource discovery of audiovisual learning objects.

2.1.3 XML Query. The mission of the XML Query Working Group (W3C XML Query, 2003) is to provide flexible query facilities to extract data from real and virtual documents on the Web, thereby providing the needed interaction between the Web world and the database world. Ultimately, collections of XML files will be accessed like databases. The new query language, XQuery, is still evolving, but it will provide a functional language composed of several kinds of expressions that can be nested or composed with full generality. A working draft version of XQuery and a list of current XQuery implementations are available at http://www.w3.org/XML/Query.html.

2.1.4 XML Databases. There is a large amount of research and development going on in the area of XML databases. Ronald Bourret provides an excellent overview of the current state of this work and a comparison of current XML database technologies (Bourret, 2003a; Bourret, 2003b). Bourret divides XML Database solutions into the following categories:
* Middleware--software you call from your application to transfer data between XML documents and databases;
* XML-enabled databases--databases with extensions for transferring data between XML documents and themselves;
* Native XML databases--databases that store XML in "native" form, generally as some variant of the DOM mapped to an underlying data store. This includes the category formerly known as persistent DOM (PDOM) implementations;
* XML servers--XML-aware J2EE servers, Web application servers, integration engines, and custom servers.
Some of these are used to build distributed applications while others are used simply to publish XML documents to the Web. Includes the category formerly known as XML application servers;
* Content Management Systems (CMS)--applications built on top of native XML databases and/or the file system for content/document management and which include features such as check-in/check-out, versioning, and editors;
* XML query engines--standalone engines that can query XML documents;
* XML data binding--products that can bind XML documents to objects. Some of these can also store/retrieve objects from the database.

2.1.5 Metadata Schema Registries. A number of groups have been tackling the issue of establishing registries of metadata schemas to enable the reuse and sharing of metadata vocabularies and to facilitate semantic interoperability. In particular the CORES project (CORES, 2003), which builds on the work of SCHEMAS (SCHEMAS, 2002), is exploring the use of metadata schema registries in order to enable the reuse of existing schemas, vocabularies, and application profiles that have been "registered."

2.2 The Semantic Web and Interoperability

According to Tim Berners-Lee, director of the World Wide Web Consortium (W3C), the Semantic Web is "an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.... The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users" (Berners-Lee, Hendler, & Lassila, 2001). But the Semantic Web has a long way to go before this dream is realized. The real power of the Semantic Web will be realized when programs and applications are created that collect Web content from diverse sources, process the information, and exchange the results with other programs. Two of the key technological building blocks for the Semantic Web are:
* Formal languages for expressing semantics, such as the Resource Description Framework (RDF), DAML+OIL, and OWL (Web Ontology Language), which have been/are being developed within the W3C's Semantic Web Activity (W3C Semantic Web Activity, 2002); and
* The ontologies that are being constructed from such languages.

2.2.1 Formal Languages: RDF, DAML+OIL, OWL. The general consensus appears to be that while XML documents and schemas are ideal for defining the structural, formatting, and encoding constraints for a particular domain's metadata scheme, a different type of language is required for defining meaning or semantics.
The Resource Description Framework (RDF) (W3C, RDF Syntax, & Model Recommendation, 1999; W3C RDF Vocabulary Description Language, 2003) uses triples to make assertions that particular things (people, Web pages, or whatever) have properties (such as "is a sister of," "is the author of") with certain values (another person, another Web page). The triples of RDF form webs of information about related things. Because RDF uses URIs to encode this information in a document, the URIs ensure that concepts are not just words in a document but are tied to a unique definition that everyone can find on the Web. This work is being undertaken by the RDF Core Working Group of the W3C.

The W3C Web Ontology Working Group (W3C Web Ontology, 2003) is building upon the RDF Core work to develop a language for defining structured Web-based ontologies that will provide richer integration and interoperability of data among descriptive communities. This is the Web Ontology Language (OWL) (W3C, OWL, 2003), which in turn is building upon the DAML+OIL (DAML+OIL, 2001) specification developed by DARPA.

2.2.2 Ontologies. An ontology consists of a set of concepts, axioms, and relationships that describes a domain of interest. An ontology is similar to a dictionary or glossary but with greater detail and structure and expressed in a formal language (e.g., OWL) that enables computers to process its content. Ontologies can enhance the functioning of the Web to improve the accuracy of Web searches and to relate the information in a resource to the associated knowledge structures and inference rules defined in the ontology. Upper ontologies provide a structure and a set of general concepts upon which domain-specific ontologies (e.g., medical, financial, engineering, sports)
could be constructed. An upper ontology is limited to concepts that are abstract and generic enough to address a broad range of domain areas at a high level. Computers utilize upper ontologies for applications such as data interoperability, information search and retrieval, automated inferencing, and natural language processing.

A number of research and standards groups are working on the development of common conceptual models (or upper ontologies) to facilitate interoperability between metadata vocabularies and the integration of information from different domains. The Harmony project developed the ABC Ontology/Model (Lagoze & Hunter, 2001)--a top-level ontology to facilitate interoperability between metadata schemas within the digital library domain. The CIDOC CRM (CIDOC CRM, 2003) has been developed to facilitate information exchange in the cultural heritage and museum community. The Standard Upper Ontology (SUO, 2002) is being developed by the IEEE SUO Working Group.

Many communities are developing domain-specific or application-specific ontologies. Some examples include biomedical ontologies such as OpenGALEN (OpenGALEN, 2002) and SNOMED CT (SNOMED CT, 2003), financial ontologies, and sporting ontologies such as the soccer, baseball, or running ontologies in the DAML Ontology Library (DAML Ontology Library, 2003). A large number of research efforts are focusing on the development of tools for building and editing ontologies (Denny, 2002)--these are moving towards collaborative tools such as OntoEdit (Sure et al., 2002) and built-in support for RuleML to enable the specification of inferencing rules.

2.2.4 Topic Maps. Topic Maps (Topic Maps, 2000) is a new ISO standard for a system describing knowledge structures and associating them with information resources. They provide powerful ways of navigating large and interconnected corpora. Instead of replicating the features of a book index, the topic map generalizes them, extending them in many directions at once. The difference between Topic Maps and RDF is that Topic Maps are centered on topics while RDF is centered on resources. RDF annotates the resources directly whilst topic maps create a "virtual map" above the resources, leaving them unchanged.

2.2.5 Ontology Storage and Querying. A number of research groups are currently working on the development of inferencing tools and deductive query engines to enable the deduction of new information or knowledge from assertions or metadata and ontologies expressed in formal ontology languages (RDF, DAML+OIL, or OWL). A technical report on "Ontology Storage and Querying," published recently by ICS-FORTH in Crete, provides a very good survey of the current state of ontology storage and querying tools (Magkanaraki et al., 2002).
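To make the ideas of RDF triples and deductive querying more concrete, the following Python sketch stores a handful of triples in a set and forward-chains two RDFS-style rules (subclass transitivity and type inheritance) until nothing new can be deduced. It is a toy under stated assumptions, not how any of the surveyed tools is implemented: all `ex:` identifiers, the abbreviated prefixes, and the `infer` and `query` helpers are hypothetical, and real systems use full URIs and far more efficient storage.

```python
# Triples are (subject, predicate, object). Prefixed names stand in for
# full URIs; every "ex:" identifier below is hypothetical.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("ex:Volcano", SUBCLASS, "ex:Landform"),
    ("ex:Landform", SUBCLASS, "ex:GeographicFeature"),
    ("ex:Etna", TYPE, "ex:Volcano"),
    ("ex:doc42", "dc:subject", "ex:Etna"),
}


def infer(facts):
    """Forward-chain two RDFS-style rules until no new triples appear."""
    closure = set(facts)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # subClassOf is transitive.
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # An instance of a class is an instance of its superclasses.
                    new.add((s, TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


def query(facts, s=None, p=None, o=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


inferred = infer(triples)
for triple in query(inferred, s="ex:Etna", p=TYPE):
    print(triple)
```

Only one type assertion about `ex:Etna` was stated explicitly; the other two returned by the query are deduced from the class hierarchy, which is the kind of new knowledge a deductive query engine is meant to surface.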
2.3 Web Services

Web services (W3C Web Services Activity, 2003) are a relatively new concept, expected to evolve rapidly over the next few years. They could be the first major practical manifestation of Semantic Web-based thinking. Detailed definitions vary, but Web services will enable the building of software applications without having to know who the users are, where they are, or anything else about them. In the next few years, Web services may be developed that can be understood and used automatically by the computing devices of users and of public libraries. External Application Service Providers (ASPs) may also provide such services.

Web services are based on open Internet standards. The core standards and protocols for Web services are being developed and are expected to be finalized by 2003. They include (in addition to XML):
* Web Services Description Language (WSDL) (WSDL, 2003), which enables a common description of Web Services;
* Universal Description, Discovery, and Integration
(UDDI) (OASIS, 2003) registries, which expose information about a business or other entity and its technical interfaces;
* Simple Object Access Protocol (SOAP)/XML Protocol (W3C XML Protocol Working Group, 2003), which enables structured message exchange between computer programs.

The concept of Web services is currently being developed under the banner of e-commerce. However, there do appear to be potential applications for public sector service providers. For example, search interfaces could be accessed or provided as Web services by public libraries or by Application Service Providers (ASPs) on their behalf.

2.4 Metadata Harvesting--The Open Archives Initiative (OAI)

The Open Archives Initiative (OAI) (OAI, 2003) is a community that has defined an interoperability framework, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), to facilitate the sharing of metadata. Using this protocol, data providers are able to make metadata about their collections available for harvesting through an HTTP-based protocol. Service providers then use this metadata to create value-added services. OAI-PMH Version 2.0 was released in February 2003 (OAI-PMH, 2003). To facilitate interoperability, data providers are required to supply metadata that complies with a common schema, the unqualified Dublin Core Metadata Element Set.
Additional schemas are also allowed and are distinguished through the use of a metadata prefix. Although OAI originated in the e-print community, its data providers now include a number of multimedia collections such as the Library of Congress American Memory collection (Library of Congress, 2002), OpenVideo (OpenVideo, 2002), and University of Illinois.
To date, OAI service providers have mostly developed simple search and retrieval services (OAI Registered Service Providers, 2002). These include Arc, citebaseSearch, and my.OAI. Scirus searches and retrieves specifically scientific data--from the Web, proprietary databases, and Open Archives. One of the more interesting services is DP9, a gateway service that allows traditional Web search engines (e.g., Google) to index otherwise hidden information from OAI archives. The DSTC's MAENAD project developed a search, retrieval, and presentation system for OAI that searches for and retrieves mixed-media resources on a particular topic, determines the semantic relationships between the retrieved objects, and combines them into a coherent multimedia presentation, based on their relationships to each other (Little, Guerts, & Hunter, 2002).

2.5 Multimedia Metadata

Audiovisual resources in the form of still pictures, graphics, 3D models, audio, speech, and video will play an increasingly pervasive role in our lives. Because of the complex, information-rich nature of such content, value-added services such as analysis, interpretation, and metadata creation become much more difficult, subjective, time consuming, and expensive. Audiovisual content requires some level of computational interpretation and processing in order to generate metadata of useful granularity efficiently. Standardized multimedia metadata representations that will allow some degree of machine interpretation will be necessary. The MPEG-7 and MPEG-21 standards have been developed to support such requirements.
2.5.1 MPEG-7 Multimedia Content Description Interface. MPEG-7 (Martinez, 2002), the "Multimedia Content Description Interface," is an ISO/IEC standard for describing multimedia content, developed by the Moving Picture Experts Group (MPEG). The goal of this standard is to provide a rich set of standardized tools to enable both humans and machines to generate and understand audiovisual descriptions that can be used to enable fast, efficient retrieval from digital archives (pull applications) as well as filtering of streamed audiovisual broadcasts on the Internet (push applications). MPEG-7 can describe audiovisual information regardless of storage, coding, display, transmission, medium, or technology. It addresses a wide variety of media types including still pictures, graphics, 3D models, audio, speech, video, and combinations of these (e.g., multimedia presentations). The MPEG-7 specification provides:
* A core set of Descriptors (Ds) that can be used to describe the various features of multimedia content;
* Predefined structures of Descriptors and their relationships, called Description Schemes (DSs).
MPEG-7 Multimedia Description Schemes enable descriptions of multimedia content, including:
* Information describing the creation and production processes of the content (director, title, short feature movie);
* Information related to the usage of the content (copyright pointers, usage history, broadcast schedule);
* Media information on the storage features of the content (storage format, encoding);
* Structural information on spatial, temporal, or spatio-temporal components of the content (scene cuts, segmentation in regions, region motion tracking);
* Information about low-level features in the content (colors, textures, sound timbres, melody description);
* Conceptual, semantic information of the reality captured by the content (objects and events, interactions among objects);
* Information about how to browse the content in an efficient way (summaries, views, variations, spatial and frequency sub-bands);
* Organization information about collections of objects and models that allow multimedia content to be characterized on the basis of probabilities, statistics, and examples;
* Information about the interaction of the user with the content (user preferences, usage history).

Until now research in this area has primarily focused on developing efficient, low-level, digital signal processing methods to extract values for image, video, and audio Descriptors such as color, shape, texture, motion, volume, and phonemes. Algorithms have been developed to automatically segment video into scenes and shots for faster browsing and retrieval or to automatically transcribe speech and video content.
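To make the idea of low-level feature extraction and shot segmentation concrete, the following is a minimal sketch, not the normative MPEG-7 color descriptors or any particular system's algorithm. It assumes each video frame is available as a plain list of (r, g, b) pixel tuples, computes a coarse color histogram per frame, and flags a shot boundary wherever consecutive histograms differ by more than an arbitrary threshold.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized RGB histogram for a list of (r, g, b) pixels (0-255)."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms (range 0..1)."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2.0


def detect_shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames that start a new shot.

    The threshold is an assumption chosen for illustration; real systems
    tune it and usually combine several features.
    """
    histograms = [color_histogram(frame) for frame in frames]
    return [
        i
        for i in range(1, len(histograms))
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold
    ]


# Four tiny synthetic "frames": two mostly red, then two mostly blue.
red = [(250, 10, 10)] * 16
blue = [(10, 10, 250)] * 16
print(detect_shot_boundaries([red, red, blue, blue]))  # prints [2]
```

Histogram comparison is only one of many cues; gradual transitions such as fades and dissolves generally need more than a single frame-to-frame threshold.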
Multimedia metadata research is now focusing on how to automatically generate semantic descriptions of multimedia (machine recognition of objects and events) from combinations of low-level descriptors such as color, texture, and shape and audio descriptors to enable natural language querying and higher-level knowledge extraction. Additional research efforts are investigating how to combine ontologies for specific domains, e.g., sports, medical, bio-informatics, and nanotechnology, with MPEG-7 to describe multimedia content in terms relevant to the particular domain or to relate and integrate multimedia information from across domains or disciplines.

2.5.2 MPEG-21--Multimedia Framework. The goal of MPEG's latest initiative, MPEG-21 (ISO/IEC 18034-1) (Bormans & Hill, 2002), the Multimedia Framework, is to define the technology needed to enable Users to exchange, access, consume, trade, and otherwise manipulate multimedia Digital Items in an efficient, transparent, and interoperable way. Users may be content creators, producers, distributors, service providers, or consumers. They include individuals, communities, organizations, corporations, consortia, governments, and other standards bodies and initiatives around the world. The fundamental unit of content is called the Digital Item, and it could be anything from a textual document or a simple Web page to a video collection or a music album. At its most basic level, MPEG-21 provides a framework in which one User interacts with another User and the object of that interaction is a Digital Item commonly called content.
Some such interactions are creating content, providing content, archiving content, rating content, enhancing and delivering content, aggregating content, delivering content, syndicating content, retail selling of content, consuming content, subscribing to content, regulating content, facilitating transactions that occur from any of the above, and regulating transactions that occur from any of the above. The current MPEG-21 Work Plan consists of nine parts:
* Part 1--Vision, Technologies, and Strategies--a technical report that describes MPEG-21's architectural elements together with the functional requirements for their specification;
* Part 2--Digital Item Declaration--a flexible model for precisely defining the scope and components of a Digital Item;
* Part 3--Digital Item Identification--a specification for uniquely identifying Digital Items and their components;
* Part 4--Intellectual Property Management and Protection (IPMP)--to provide interoperability between IPMP tools, such as MPEG-4's IPMP hooks;
* Part 5--Rights Expression Language--a machine-readable language that can declare rights and permissions using the terms as defined in the Rights Data Dictionary
(XrML);
* Part 6--Rights Data Dictionary--definitions of terms to support Part 5;
* Part 7--Digital Item Adaptation--adaptation may be based on user, terminal, network and environmental characteristics, resource adaptability, or session mobility;
* Part 8--Reference Software--used to test conformance with requirements and the standard's specifications;
* Part 9--File Format--this is expected to inherit many MPEG-4 concepts, since it will need to be able to encapsulate digital item information, still and dynamic media, metadata, and layout data in both textual and binary forms.

Future work plans for MPEG-21 include developing functional requirements and solutions for the persistent association of identification and description with Digital Items; scalable, error-resilient content representation; and the accurate recording of all events.

2.6 Rights Metadata

The Internet has been characterized as the largest threat to copyright since its inception. Copyrighted works on the Internet include news stories, software, novels, screenplays, graphics, pictures, usenet messages, and even e-mail. The reality is that almost everything on the Internet is protected by copyright law. This can pose problems for both hapless surfers and copyright owners. A number of XML-based vocabularies have been developed to define the usage and access rights associated with digital resources. XrML (XrML, 2003), developed by ContentGuard, and ODRL (Open Digital Rights Language) (ODRL, 2003), developed by IPR Systems, are the two major contenders.
XrML has been adopted by MPEG-21 as its Rights Expression Language, and ODRL was recently selected by the Open Mobile Alliance as its rights language for mobile content. In addition, a number of researchers are investigating the development of the well-defined, underlying, interoperable data models for rights management that are necessary for facilitating interoperability and the integration of information (indecs Framework, 2000; Delgado et al., 2002). Project RoMEO (Rights MEtadata for Open archiving) (RoMEO, 2003) is investigating the rights issues surrounding the "self-archiving" of research in the U.K. academic community under the Open Archives Initiative's Protocol for Metadata Harvesting. Academic and self-publishing authors who make their works available through Open Archives are more concerned with issues such as plagiarism, corruption, or misuse of the text than with financial returns to the author or publisher. The "Indigenous Collections Management Project" being undertaken by the Distributed Systems Technology Centre (DSTC), University of Queensland, in collaboration with the Smithsonian's National Museum of the American Indian, has also been investigating metadata for the rights management and protection of traditional knowledge belonging to indigenous communities, in accordance with customary laws regarding access (Hunter, 2002; Hunter, Koopman, & Sledge, 2003).

2.7 Automatic Metadata Extraction

Because of the high cost and subjectivity associated with human-generated metadata, a large number of research initiatives are focusing on technologies to enable the automatic classification and segmentation of digital resources--i.e., computer-generated metadata for textual documents, images, audio, and video resources.

2.7.1 Automatic Document Indexing/Classification. Automatic-categorization software (Reamy, 2002) uses a wide variety of techniques to assign documents into subject categories. Techniques include statistical Bayesian analysis of the patterns of words in the document; clustering of sets of documents based on similarities; advanced vector machines that represent every word and its frequency with a vector; neural networks; sophisticated linguistic inferences; the use of preexisting sets of categories; and seeding categories with keywords.
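The statistical Bayesian approach mentioned above can be illustrated with a small multinomial naive Bayes categorizer. This is a sketch under simple assumptions (lower-cased alphabetic tokens, add-one smoothing, a four-document hypothetical training set), not a description of how any commercial product works.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-cased alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training documents
        self.vocabulary = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the category with the highest log posterior score."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category in self.doc_counts:
            score = math.log(self.doc_counts[category] / total_docs)
            denominator = sum(self.word_counts[category].values()) + len(self.vocabulary)
            for word in words:
                score += math.log((self.word_counts[category][word] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training documents, for illustration only.
categorizer = NaiveBayesCategorizer()
categorizer.train("the library catalog lists books and journals", "libraries")
categorizer.train("librarians classify books by subject", "libraries")
categorizer.train("the video codec compresses audio and image streams", "multimedia")
categorizer.train("image and video retrieval uses color and texture", "multimedia")

print(categorizer.classify("color features for video retrieval"))  # prints multimedia
```

With so little training data the result is only indicative; the point is that category assignment reduces to comparing word-frequency evidence across categories.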
The most common method used by autocategorization software is to scan every word in a document, analyze the frequencies of patterns of words, and, based on a comparison with an existing taxonomy, assign the document to a particular category in the taxonomy. Other approaches use "clustering" or "taxonomy building," in which the software is pointed at a collection of documents (e.g., 10,000-100,000) and searches through all the combinations of words to find clumps or clusters of documents that appear to belong together. Some systems are capable of automatically generating a summary of a document by scanning through the document and finding important sentences using rules such as "the first sentence of the first paragraph is often important." Another common feature of autocategorization is noun phrase extraction--the extracted list of noun phrases can be used to generate a catalog of entities covered by the collection. Autocategorization cannot completely replace a librarian or information architect, although it can make them more productive, save them time, and produce a better end-product. The software itself, without some human rules-based categorization, cannot currently achieve more than about 90 percent accuracy. While it is much faster than a human categorizer, it is still not as good as a human.

2.7.2 Image Indexing. Image retrieval research has moved on from the IBM QBIC (query by image content) system (QBIC, 2001), which uses colors, textures, and shapes to search for images. New research is focusing on semantics-sensitive matching (DCSE, 2003; Barnard, 2003) and automatic linguistic indexing (Wang & Li, 2003), in which the system is capable of recognizing real-world objects or concepts.

2.7.3 Speech Indexing and Retrieval. Speech recognition is increasingly being applied to the indexing and retrieval of digitized speech archives. Dragon Systems (Dragon Systems, 2003) has developed a system that creates a keyword index of spoken words from within volumes of recorded audio, eliminating the need to listen for hours to pinpoint information. Speech recognition systems can generate searchable text that is indexed to time code on the recorded media, so users can both call up text and jump right to the audio clip containing the keyword. Normally, running a speech recognizer on audio recordings doesn't produce a highly accurate transcript because speech-recognition systems have difficulty if they haven't been trained for a particular speaker or if the speech is continuous. However, the latest speech recognition systems will work even in noisy environments, are speaker-independent, work on continuous speech, and are able to separate two speakers talking at once.
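The idea of searchable text indexed to time code can be sketched as a simple inverted index. The example assumes, hypothetically, that a recognizer has already produced a list of (start time in seconds, word) pairs; it does not reflect any vendor's actual data format.

```python
import re
from collections import defaultdict


def build_time_index(transcript):
    """Map each lower-cased word to the sorted times (seconds) at which it was spoken."""
    index = defaultdict(list)
    for start_seconds, word in transcript:
        # Strip punctuation but keep apostrophes inside words.
        key = re.sub(r"[^\w']", "", word.lower())
        if key:
            index[key].append(start_seconds)
    for times in index.values():
        times.sort()
    return dict(index)


def find_clips(index, keyword, lead_in=2.0):
    """Return playback start times, backing up `lead_in` seconds for context."""
    return [max(0.0, t - lead_in) for t in index.get(keyword.lower(), [])]


# Hypothetical recognizer output: (start time in seconds, recognized word).
transcript = [
    (0.0, "The"),
    (0.4, "metadata"),
    (1.1, "standard"),
    (12.5, "Metadata,"),
    (13.0, "again"),
]
index = build_time_index(transcript)
print(find_clips(index, "metadata"))  # prints [0.0, 10.5]
```

A player could use the returned offsets to jump directly to each clip rather than requiring the listener to scan the whole recording.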
Dragon is also working on its own database for storing and retrieving audio indexes.

2.7.4 Natural Language and Spoken Language Querying. Dragon has also developed systems that allow users to retrieve information from databases using natural language queries. Such systems are expected to become more commonplace in the future (Oard, 2003).

2.7.5 Video Indexing and Retrieval. Commercial systems such as Virage (Virage, 2003), Convera (Convera Screening Room, 2003), and Artesia (Artesia, 2003) are capable of parsing hours of video, segmenting it, and turning it into an easily searchable and browsable database. The latest video-indexing systems combine a number of indexing methods--embedded textual data (SMPTE timecode, lineup files, and closed captions), scene change detection, visual clues, and continuous-speech recognition to convert spoken words into text. For example, CMU's Informedia project (Informedia, 2003) combines text, speech, image, and video recognition techniques to segment and index video archives and enable intelligent search and retrieval. The system can automatically analyze videos and extract named entities from transcripts, which can be used to produce time and location metadata. This metadata can then be used to explore archives dynamically using temporal and spatial graphical user interfaces, e.g., mapping interfaces or date sliders. A user might ask, for instance, "give me all video content on air crashes in South America in early 2000" (Ng et al., 2003). Current research in this field is concentrating on the difficult problem of extracting metadata in real-time from streaming video content, rather than during a postprocessing step.

2.8 Search Engine Research and Development

2.8.1 Smarter Agent-based Search Engines. One of the major advances in search engines in the future will be in the use of "intelligent agents" and expert systems that apply artificial intelligence (AI), ontologies, and knowledge bases to enable all relevant information on a particular subject to be retrieved and integrated. Improved user interfaces will become available through the incorporation of expert systems into online catalog searching, i.e., "intelligent" sophisticated online systems that incorporate AI, knowledge bases, and ontologies. In the future librarians will use "intelligent agent kits" that will crawl over the Web retrieving relevant information and will analyze and interpret it to create a body of knowledge for a specific purpose. Periodic resampling will automatically keep it up-to-date. However, human intervention will still be needed to customize, supervise, and check the computer-generated results (Virginia Tech, 1997; Nardi & O'Day, 1998).

2.8.2 Federated Search Engines.
Quite a large number of metadata research projects are focusing on the problems of federated searching across distributed, heterogeneous, networked digital libraries and the interoperability problems that need to be overcome (Goncalves et al., 2001; Liu et al., 2002). For example, the MetaLib project, at the University of East Anglia, implements a single integrated environment and cross-searching portal for managing and searching electronic resources, whether these be abstracting and indexing databases, full-text e-journal services, CD-ROMs, library catalogs, information gateways, or local collections (Lewis, 2002).

2.8.3 Peer-to-Peer JXTA-based Search Engines. Peer-to-peer (P2P) search engines are based on the idea of decentralized metadata provided by networked peers rather than clients accessing centralized metadata repositories sitting on a server. Sam Joseph at the University of Tokyo has written an excellent overview of Internet search engines based on decentralized metadata (Joseph, 2003). JXTA (short for Juxtapose) is a peer-to-peer interoperability framework created by Sun.
It incorporates a number of protocols, but the most relevant to the idea of decentralized metadata is the Peer Discovery Protocol (PDP). PDP allows a peer to advertise its own resources and discover the resources of other peers. Every peer resource is described and published using an advertisement, which is an XML document that describes a network resource. JXTASearch operates over the lower-level JXTA protocols (JXTA, 2003). Edutella (Edutella, 2002) is an RDF-based Metadata Infrastructure for P2P Applications based on JXTA. The first application developed by Edutella focuses on a P2P network for the exchange of educational resources between German universities (including Hannover, Braunschweig, and Karlsruhe), Swedish universities (including Stockholm and Uppsala), Stanford University, and others.

2.8.4 Multimedia Search Engines. More and more search engines are becoming multimedia-capable--even allowing users to specify media types (images, video, or audio) and formats (e.g., JPEG, MP3, SMIL). Examples include the FAST Multimedia Search Engine (FAST, 2000), AltaVista (AltaVista, 2003), Google Image Search (Google, 2003), Singingfish Multimedia Search (SingingFish, 2002), Friskit Music Streaming Media Search (Friskit, 2002), and the Fossick Online Multimedia and Digital Image Search (Fossick, 2003).

2.8.5 Cross-lingual Search Engines. In the future, universal translators will automatically translate a query in one particular language into any number of other languages and also translate the results into the original query language. There are a number of research projects and search engines focusing on cross-lingual searching, e.g., SPIRIT-W3, a distributed cross-lingual indexing and search engine (Fluhr et al., 1997), and the TITAN Cross-Language Web search engine (TITAN, 2003).

2.9 Graphical/Multimedia Presentation of Results

2.9.1 Graphical Presentation of Search Results. More search engines are going to present search results in graphical ways that are more innovative than simple lists of URLs. Interfaces like Kartoo (Kartoo, 2000) and WebBrain (WebBrain, 2001) illustrate the relationships between retrieved digital resources graphically. Kartoo uses Flash to provide a graphical representation of the results. The results are displayed in a 2-3D map representing sites that match your query as nodes on the map, and relationships between nodes are represented as labeled arcs. WebBrain presents search results in a graphical browse interface that allows users to navigate through related topics. TouchGraph GoogleBrowser (TouchGraph, 2001) is a tool for visually browsing the Google database by exploring links between related sites.
It uses Google's database to determine and display the linkages between a URL that you enter and other pages on the Web. Results are displayed as a graph, showing both inbound and outbound relationships between URLs. "Friend of a Friend" or foaf (foaf, 2000) is an RDF vocabulary for describing the relationships between people, invented by Dan Brickley and Libby Miller of RDF Web. foafCORP (foafCORP, 2002) is an interesting semantic Web visualization of the interconnectedness of corporate America based on the foaf RDF vocabulary. It provides a simple graphical user interface to trace relationships between board members of major companies in the United States.

2.9.2 Automatic Aggregation/Compilation Tools. The rapid growth in multimedia content on the Internet, the standardization of machine-processable, semantically rich (RDF-based) content descriptions, and the ability to perform semantic inferencing have together led to the development of systems that can automatically retrieve and aggregate semantically related multimedia objects and generate intelligent multimedia presentations on a particular topic, i.e., knowledge-based authoring tools (Little et al., 2002; CWI, 2000; Conlan et al., 2000; Andre, 2000).
Automatic information aggregation tools that can dynamically generate hypermedia and multimedia learning objects will be extremely relevant to libraries in the future. Such tools will expedite the cost-effective creation of value-added learning objects and will also ensure that any relevant content only recently made available by content providers will be automatically incorporated in the dynamically generated learning objects.

2.10 Metadata for Personalization/Customization

The individualization of information, based on users' needs, abilities, prior learning, interests, context, etc., is a major metadata-related research issue (Lynch, 2001a). The ability to push relevant, dynamically generated information to the user, based on user preferences, may be implemented in one of two ways:
* by explicit user input of their preferences; or
* by the system learning preferences from tracked usage patterns and adapting the system and interfaces accordingly.

The idea is that users can get what they want without having to ask. The technologies involved in recommender systems are information filtering, collaborative filtering, user profiling, machine learning, case-based retrieval, data mining, and similarity-based retrieval. User preferences typically include information such as the user's name, age, prior learning, learning style, topics of interest, language, subscriptions, device capabilities, media choice, rights broker, payment information, etc. Manually entering this information will produce better results than system-generated preferences, but it is time consuming and expensive. More advanced systems in the future will use automatic machine-learning techniques to determine users' interests and preferences dynamically rather than depending on user input.
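As an illustration of one technique from the list above, the following sketch shows user-based collaborative filtering over hypothetical preference profiles. Users whose ratings resemble the target user's are used to suggest items the target has not yet seen. The profile data, names, and scoring rule are assumptions made for the example, not a description of any deployed recommender.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, profiles, top_n=3):
    """Rank items unseen by `target`, weighted by the similarity of other users."""
    scores = {}
    for user, ratings in profiles.items():
        if user == target:
            continue
        similarity = cosine_similarity(profiles[target], ratings)
        if similarity <= 0:
            continue  # ignore users with no overlapping interests
        for item, rating in ratings.items():
            if item not in profiles[target]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical interest profiles: topic -> rating on a 1-5 scale.
profiles = {
    "ana": {"mpeg7": 5, "oai": 4},
    "ben": {"mpeg7": 5, "oai": 5, "rdf": 4},
    "cho": {"grid": 5, "p2p": 4},
}
print(recommend("ana", profiles))  # prints ['rdf']
```

Here the ratings stand in for either explicitly entered preferences or values inferred from usage tracking; the filtering step is the same in both cases.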
Some examples of "personalized current awareness news services" are Net2one (Net2one, 2003), MSNBC News Filters (MSNBC, 2003), and the eLib Newsagent project (eLib Newsagent, 2000). These services allow users to define their interests and then receive daily updated relevant reports. Filtering of Web radio and TV broadcasts will also be possible in the future, based on users' specifications of their interests and the embedding of standardized content descriptions, such as MPEG-7, within the video streams (Rogers et al., 2002).

2.11 Metadata for Broadband/Grid Applications

The delivery and integration of information is shifting to wireless mobile devices and high-performance broadband networks. To support research and development in advanced grid and networking services and applications, a number of broadband multigigabit advanced networks have been established throughout the world and made accessible to the research and higher education communities of these regions:
* Internet2--U.S. broadband research network (Internet2, 2003);
* GrangeNet--Australian broadband network (GrangeNet, 2003);
* Canarie--Canadian broadband network (Canarie, 2002);
* DANTE--European broadband research network (DANTE, 2003);
* APAN--Asia Pacific Advanced Network (APAN, 2003).
Related research projects are focusing on real-time, collaborative, distributed applications that require very high-quality video or high-speed access to large data sets for remote collaboration and visualization. Examples of applications include remote telemicroscopy, remote surgery, 3D visualization of large datasets (e.g., bio-informatics, astronomy data), collaborative editing of HDTV-quality digital video, and distributed real-time music and dance performances.

2.11.1 Grid Computing. Computational Grids enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources (such as supercomputers, computer clusters, storage systems, data sources, instruments, people) and present them as a single, unified resource for solving large-scale compute- and data-intensive applications (e.g., molecular modeling for drug design, brain activity analysis, climate modeling, and high-energy physics) (Grid Computing, 2000). Wide-area distributed computing, or "grid" technologies, provide the foundation for a number of large-scale efforts utilizing the global Internet to build distributed computing and communications infrastructures. A list of current grid initiatives and projects can be found at http://www.gridforum.org/L_Involved_Mktg/init.htm (GGF, 2003).

2.11.2 The Semantic Grid.
This term refers to the underlying computer infrastructure needed to support scientists who want to generate, analyze, share, and discuss their results/data over broadband Grid networks--basically it is the combination of Semantic Web technologies with Grid computing for the scientific community (Semantic Grid, 2003). In particular, the combination of Semantic Web technologies with live information flows is highly relevant to grid computing and is an emerging research area. For example, the multiplexing (embedding) of live metadata with multicast video streams raises the issue of Quality of Service (QoS) demands on the network. Archival and indexing tools for collaborative video conferences held through Access Grid Nodes are going to be in demand. In typical Access Grid installations, there are three displays with multiple views, and there is a live exchange of information. Events such as remote camera control and slide transitions could be used to segment and index the meetings for later search and browsing. Notes and annotations taken during the meeting provide additional sets of metadata that can be stored and shared. Metadata schemes to support collaborative meetings and collaboratories will be required. Scientists collaborating on grid networks are going to require methods and tools to build large-scale ontologies, annotation services, inference engines, integration tools, and knowledge discovery services for Grid and e-Science applications (De Roure et al., 2001).

2.12 Metadata for Wireless Applications

Infrared detection and transmission can be used in libraries to beam context-sensitive data or applications to users' PDAs, depending on where they are physically located (Kaine-Krolak & Novak, 1995). Similarly, GPS information can be used to download location-relevant data to users' PDAs or laptops when they are traveling, e.g., scientists on field trips.
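The GPS case above can be sketched as a radius query over resources that carry latitude/longitude metadata. This is only a minimal illustration: the record layout, the `lat`, `lon`, and `title` field names, the sample coordinates, and the 10 km radius are assumptions made for the example, and a deployed service would more plausibly use a spatial index than a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(resources, lat, lon, radius_km):
    """Return titles of resources whose location metadata lies within
    radius_km of the user's position, nearest first."""
    hits = []
    for res in resources:
        dist = haversine_km(lat, lon, res["lat"], res["lon"])
        if dist <= radius_km:
            hits.append((dist, res["title"]))
    return [title for _, title in sorted(hits)]


# Hypothetical resources with location metadata attached.
resources = [
    {"title": "River survey notes", "lat": -27.50, "lon": 153.01},
    {"title": "Harbour tide tables", "lat": -33.87, "lon": 151.21},
]

# Position reported by the user's GPS receiver (assumed values).
local = nearby(resources, lat=-27.47, lon=153.03, radius_km=10)
print(local)
```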
Such context-sensitive applications require location metadata to be attached to information resources in databases connected to wireless networks. The ROADNet (ROADNet, 2002) project on HPWREN (HPWREN, 2001), a high-performance wireless network, is a demonstration of the collection and streaming of real-time seismic, oceanographic, hydrological, ecological, geodetic, and physical data and metadata via a wireless network. Real-time numeric, audio, and video data are collected via field sensors and researchers connected to HPWREN and posted to discipline-specific servers connected over a network. This data is accessible to interdisciplinary scientists in near-real time. Extraction of metadata from real-time data flows and high-speed metadata fusion across multiple data sensors are high-priority research goals within applications such as ROADNet.

2.13 Metadata Authentication

Manually generated metadata for Web resources cannot be assumed to be accurate or precise descriptions of those resources. The metadata and/or the Web page may have been deliberately constructed or edited so as to misrepresent the content of the resource and to manipulate the behavior of the retrieval systems that use the metadata. Basically, anyone can create any metadata they want about any object on the Internet with any motivation. There is an urgent need for technologies that can vouch for or authenticate metadata so that Web indexing systems that crawl across the Internet developing Web index databases know when the associated metadata can be trusted (Lynch, 2001b).
Hence there are a number of research projects investigating methods for explicitly identifying and validating the source of metadata assertions, using technologies such as XML Signature. Search engines could then give higher confidence weightings to metadata signed by trusted providers, and this would be reflected in the retrieved search results. The XML Signature Working Group, a joint working group of the IETF and W3C (W3C XML Signature, 2003), has developed an XML-compliant syntax for representing signatures of Web resources (or anything referenceable by a URI) and procedures for computing and verifying such signatures. Such signatures can easily be applied to metadata and used by Web servers and search engines to ensure metadata's authenticity and integrity. The XML Signature specification is based on Public Key Cryptography, in which signed and protected data is transformed according to an algorithm parameterized by a pair of numbers--the so-called public and private keys. Public Key Infrastructure (PKI) systems provide management services for key registries--they bind users' identities to digital certificates and public/private key pairs that have been assigned and warranted by trusted third parties (Certificate Authorities).
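To make the sign-and-verify idea concrete, the following sketch signs a small metadata record with textbook RSA, in which the key pair really is just a pair of numbers. It is a toy under stated assumptions: the primes are tiny and offer no security, the `canonical` serialization is a simplified stand-in for XML canonicalization, the record fields are invented, and none of the function names come from the XML Signature specification.

```python
import hashlib

# Toy RSA key pair built from textbook-sized primes (illustrative only, NOT secure).
P, Q = 61, 53
N = P * Q    # 3233, the modulus shared by both keys
E = 17       # public exponent  -> public key is (E, N)
D = 2753     # private exponent -> private key is (D, N); (E * D) % ((P-1)*(Q-1)) == 1


def canonical(metadata):
    """Serialize the record deterministically so that signer and verifier
    hash exactly the same bytes (assumed stand-in for canonicalization)."""
    return "\n".join(f"{key}={metadata[key]}" for key in sorted(metadata)).encode("utf-8")


def digest(metadata):
    """SHA-256 of the canonical form, reduced to fit under the toy modulus."""
    raw = hashlib.sha256(canonical(metadata)).digest()
    return int.from_bytes(raw, "big") % N


def sign(metadata, d=D, n=N):
    """The metadata provider signs the digest with its private key."""
    return pow(digest(metadata), d, n)


def verify(metadata, signature, e=E, n=N):
    """Anyone holding the public key can check the signature against the record."""
    return pow(signature, e, n) == digest(metadata)


# Hypothetical Dublin Core style record.
record = {"title": "Annual report", "creator": "Example Library", "date": "2003-08-11"}
signature = sign(record)
print(verify(record, signature))
```

An indexing service that trusts the provider's public key could run `verify` before giving the record extra weight. With real key sizes, any change to the record or the signature would make verification fail; a certificate from a Certificate Authority is what would tie the public key to the provider's identity.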
Another approach is the Pretty Good Privacy (PGP) system (PGP, 2002) in which a "Web of Trust" is built up from an established list of known and trusted identity/key bindings. Trust is established in new unfamiliar identity/key bindings because they are cryptographically signed by one or more parties that are already trusted.

2.14 Annotation Systems

The motivation behind annotation systems is related to the issue of metadata trust and authentication--users can attach their own metadata, views, opinions, comments, ratings, and recommendations to particular resources or documents on the Web, which can be read and shared with others. The basic philosophy is that we are more likely to value and trust the opinions of people we respect than metadata of unknown origin. The W3C's Annotea system (W3C Annotea, 2001) and DARPA's Web Annotation Service (DARPA, 1998) are two Web-based annotation systems that have been developed. Current research is focusing on annotation systems within real-time collaborative environments (Benz & Lijding, 1998), annotation tools for film/video and multimedia content (IBM VideoAnnEx, 2001; Ricoh MovieTool, 2002; ZGDV VIDETO, 2002; DSTC FilmEd, 2003), and tools to enable the attachment of spoken annotations to digital resources such as images or photographs (PAXit, 2003).

2.15 Weblogging Metadata

Weblogging or Blogging (Sullivan, 2002; Reynolds et al., 2002) is a very successful paradigm for lightweight publishing, which has grown sharply in popularity over the past few years and is being used increasingly to facilitate communication and discussion within online communities.
The idea of semantic blogging is to add semantic structure to items shared over blog channels or RSS feeds to enable semantic search, navigation, and filtering of blogs or streaming data. Blizg (Blizg, 2003) and BlogChalking (BlogChalking, 2002) are two examples of Weblog search engines that use metadata to enable searching across Weblog archives and the detection of useful connections between and among blogs.

2.16 Metadata for Preservation

A number of initiatives have been focusing on the use of metadata to support the digital preservation of resources. Such initiatives include the Reference Model for an Open Archival Information System (OAIS, 2002), the CURL Exemplars in Digital Archives project (CEDARS, 2002), the National Library of Australia (NLA) PANDORA project (PANDORA, 2002), the Networked European Deposit Library (NEDLIB, 2001), and the Online Computer Library Center/Research Libraries Group (OCLC/RLG) Working Group on Preservation Metadata (OCLC/RLG, 2003). These initiatives rely on the preservation of both the original bytestream/digital object and detailed metadata that will enable the preserved data to be interpreted in the future. The preservation metadata provides sufficient technical information about the resources to support either migration or emulation. Metadata can facilitate long-term access to digital resources by providing a complete description of the technical environment needed to view the work, the applications and version numbers needed, and decompression schemes, as well as any other files that need to be linked to it. However, associating appropriate metadata with digital objects will require new workflows and metadata input tools at the points of creation, acquisition, reuse, migration, etc. This will demand an initial effort the first time a particular class of digital resource is received into a collection. However, assuming many resources of the same class are received, economies of scale can be achieved by reusing the same metadata model and input tools. The Library of Congress's Metadata Encoding and Transmission Standard (METS) (Library of Congress, 2003) schema provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object and for expressing the complex links between these various forms of metadata. Other research initiatives are investigating extensions to METS to enable the preservation of audiovisual content or complex multimedia objects such as multimedia artworks (Avant Garde, 2003; DSTC NewMedia, 2003).
These approaches involve the association of ancillary and contextual information such as interviews with artists and the use of the Bitstream Syntax Description Language (BSDL) (Amielh & Devillers, 2002) to convert objects preserved as bit streams into formats that can be displayed on the current platforms.

3. CONCLUSIONS

In this paper, I have attempted to provide an overview of some of the key metadata research efforts currently underway that are expected to improve our ability to search, discover, retrieve, and assimilate information on the Internet. The number and extent of the research projects and initiatives described in this paper demonstrate three things:
1. The resource requirements and intellectual and technical issues associated with metadata development, management, and exploitation are far from trivial, and we are still a long way from MetaUtopia;
2. Metadata means many different things to many different people, and its effectiveness depends on implementers resolving key issues, including:
* Identifying the best metadata models, schemas, and vocabularies to satisfy their requirements;
* Deciding on the granularity of metadata necessary for their needs--this will involve a trade-off between the costs of developing and managing metadata, the desired search capabilities, potential future uses, and preservation needs;
* Balancing the costs and subjectivity of user-generated metadata with the anticipated error rate of automatic metadata extraction tools;
* Ensuring the currency, authenticity, and integrity of the metadata;
* Choosing between decentralized, distributed metadata architectures and centralized repositories for the storage and management of metadata.
3. Despite its problems, metadata is still considered a very useful and valuable component in organizing content on the Internet and in enabling us to find relevant information and services effectively.

REFERENCES

Alta Vista. (2003). Retrieved July 28, 2003, from http://www.altavista.com/.
Amielh, M., & Devillers, S. (2002). Bitstream syntax description language: Application of XML-schema to multimedia content adaptation. WWW 2002 Conference. Honolulu. Retrieved August 11, 2003, from http://www2002.org/CDROM/alternate/334/.
Andre, E. (2000). The generation of multimedia documents. In R. Dale, H. Moisl, and H. Somers (Eds.), A handbook of natural language processing: Techniques and applications for the processing of language as text. (pp. 305-327). Tampa: Marcel Dekker, Inc. Retrieved August 11, 2003, from http://www.dfki.de/imedia/papers/handbook.ps.
Artesia. (2003). Retrieved August 11, 2003, from http://www.artesiatech.com/.
Asia Pacific Advanced Network (APAN). (2003). Retrieved August 11, 2003, from http://apan.net/.
Avant Garde. (2003). Archiving the avant garde: Documenting and preserving variable MediaArt. Retrieved August 11, 2003, from http://www.bampfa.berkeley.edu/ciao/avant_garde.html.
Barnard, K. (2003). Computer vision meets digital libraries. Retrieved August 11, 2003, from http://elib.cs.berkeley.edu/vision.html.
Benz, H., & Lijding, M. E. (1998). Asynchronously replicated shared workspaces for a multimedia annotation service over Internet. Lecture notes in computer science. Retrieved August 11, 2003, from http://elib.uni-stuttgart.de/opus/volltexte/1999/533/.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The semantic Web.
Scientific American. Retrieved August 11, 2003, from http://www.sciam.com/article.cfm?colID=1&articleID=00048144-10D2-1C70-84A9809EC588EF21.
Blizg. (2003). Retrieved August 11, 2003, from http://blizg.com/.
BlogChalking. (2002). Retrieved August 11, 2003, from http://www.blogchalking.tk/.
Bormans, J., & Hill, K. (2002). MPEG-21 overview V.5. Retrieved August 11, 2003, from http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm.
Bourret, R. (2003a). XML and databases. Retrieved August 11, 2003, from http://www.rpbourret.com/xml/XMLAndDatabases.htm.
Bourret, R. (2003b). XML Database products. Retrieved August 11, 2003, from http://www.rpbourret.com/xml/XMLDatabaseProds.htm.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International World Wide Web Conference (WWW7) (pp. 107-117). Brisbane, Australia. Retrieved August 11, 2003, from http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm.
Canarie. (2002). Retrieved August 11, 2003, from http://www.canarie.ca/.
CEDARS, CURL. (2002). Exemplars in digital archives. Retrieved August 11, 2003, from http://www.leeds.ac.uk/cedars/.
CIDOC CRM. (2003). CIDOC conceptual reference model. Retrieved August 11, 2003, from http://cidoc.ics.forth.gr/.
Conlan, O., Wade, V., Bruen, C., & Gargan, M. (2002). Multi-model, metadata driven approach to adaptive hypermedia, services for personalized e-learning. Second International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, Malaga, Spain, May 2002.
Convera Screening Room. (2003). Retrieved August 11, 2003, from http://www.convera.com/Products/products_sr.asp.
CORES. (2003). CORES--A forum on shared metadata vocabularies.
Retrieved August 11, 2003, from http://www.cores-eu.net/.
CWI's Semi-automatic Hypermedia Presentation Generation (Dynamo) Project. (2000). Retrieved August 11, 2003, from http://db.cwi.nl/projecten/project.php4?prjnr=74.
DAML+OIL. (2001, December 18). Reference description. W3C Note 18 December 2001. Retrieved August 11, 2003, from http://www.w3.org/TR/daml+oil-reference.
DAML Ontology Library. (2003). Retrieved August 11, 2003, from http://www.daml.org/ontologies/.
DANTE. (2003). Retrieved August 11, 2003, from http://www.dante.net/.
DARPA Object Service Architecture Web Annotation Service. (1998). Project Summary. Retrieved August 11, 2003, from http://www.objs.com/OSA/Annotations-service.html.
Delgado, J., Gallego, I., Garcia, R., & Gil, R. (2002). An ontology for intellectual property rights: IPROnto. Poster at 1st International Semantic Web Conference (ISWC 2002). Retrieved August 11, 2003, from http://dmag.upf.es/flas_eng/publicaciones.htm.
Denny, M. (2002, November). Ontology building: A survey of editing tools. Retrieved August 11, 2003, from http://www.xml.com/pub/a/2002/11/06/ontologies.html.
Department of Computer Science and Engineering, University of Washington (DCSE). (2003). Object and concept recognition for content-based image retrieval. Retrieved August 11, 2003, from http://www.cs.washington.edu/research/imagedatabase/.
De Roure, D., Jennings, N., & Shadbolt, N. (2001). Research agenda for the semantic grid: A future e-science infrastructure. [Technical report].
UKeS-2002-02, UK e-Science Technical Report Series. National e-Science Centre, Edinburgh, UK. Retrieved August 11, 2003, from http://www.semanticgrid.org/html/semgrid.html.
Doctorow, C. (2001). Metacrap: Putting the torch to seven straw-men of the meta-utopia. Retrieved August 11, 2003, from http://www.well.com/~doctorow/metacrap.htm.
Dragon Systems. (2003). Retrieved August 11, 2003, from http://www.dragonsys.com/.
DSpace. (2002). DSpace durable digital depository. Retrieved August 11, 2003, from http://www.dspace.org.
DSTC FilmEd. (2003). The FilmEd project. Retrieved August 11, 2003, from http://metadata.net/filmed/.
DSTC New Media. (2003). The New Media art preservation project. Retrieved August 11, 2003, from http://metadata.net/newmedia/.
Dublin Core Metadata Initiative (DCMI). (2003). Retrieved August 11, 2003, from http://www.dublincore.org/.
Edutella. (2002). Retrieved August 11, 2003, from http://edutella.jxta.org/.
eLib Newsagent project. (1996). Retrieved August 11, 2003, from http://www.ukoln.ac.uk/services/elib/projects/newsagent/.
FAST. (2000). Multimedia Search Engine. Retrieved August 11, 2003, from http://www.multimedia.alltheWeb.com/.
Fluhr, C., Schmit, D., Ortet, P., Elkateb, E., & Gurmer, K. (1997). SPIRIT-W3: A distributed cross-lingual indexing and search engine. Retrieved August 11, 2003, from http://www.isoc.org/isoc/whatis/conferences/inet/97/proceedings/A8/A8_1.HTM.
Foaf. (2000). The "Friend of a Friend" Project. Retrieved August 11, 2003, from http://www.foaf-project.org/.
FoafCORP. (2002). Retrieved August 11, 2003, from http://www.grorg.org/2002/10/foafcorp/.
Fossick. (2003). Online multimedia and digital image search. Retrieved August 11, 2003, from http://fossick.com/Multimedia.htm.
Friskit. (2002). Music streaming media search.
Retrieved August 11, 2003, from http://www.friskit.com/.
GGF. (2003). Grid initiatives and projects. Retrieved August 11, 2003, from http://www.gridforum.org/L_Involved_Mktg/init.htm.
Goncalves, M. A., France, R. K., & Fox, E. A. (2001). MARIAN: Flexible interoperability for federated digital libraries. In Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2001). Darmstadt, Germany. Retrieved August 11, 2003, from http://link.springer.de/link/service/series/0558/papers/2163/21630173.pdf.
Google. (2003). Image search. Retrieved August 11, 2003, from http://images.google.com/.
GrangeNet. (2003). Retrieved August 11, 2003, from http://www.grangenet.net/.
Grid Computing. (2000). Retrieved August 11, 2003, from http://www.gridcomputing.com/.
High Performance Wireless Research and Education Network (HPWREN). (2001). Retrieved August 11, 2003, from http://hpwren.ucsd.edu/news/011109.html.
Hunter, J. (2002). Rights markup extensions for the protection of indigenous knowledge. WWW2002 Conference. Honolulu, HI. Retrieved August 11, 2003, from http://archive.dstc.edu.au/IRM_project/paper.pdf.
Hunter, J., Koopman, B., & Sledge, J. (2003). Software tools for indigenous knowledge management. In Museums on the Web. Charlotte.
Retrieved August 11, 2003, from http://archive.dstc.edu.au/IRM_project/software_paper/IKM_software.pdf.
IBM VideoAnnEx. (2001). Retrieved August 11, 2003, from http://www.research.ibm.com/VideoAnnEx/.
IMS. (2003). IMS Learning Resource Meta-data Specification. Retrieved August 11, 2003, from http://www.imsglobal.org/metadata/index.cfm.
indecs Framework Ltd. (2000). Retrieved August 11, 2003, from http://www.indecs.org/.
Informedia. (2003). Digital video understanding. Retrieved August 11, 2003, from http://www.informedia.cs.cmu.edu/.
Internet2. (2003). Retrieved August 11, 2003, from http://www.internet2.edu/.
IPTC. (2001). NewsML--Markup for the third millenium. Retrieved August 11, 2003, from http://www.iptc.org/site/NewsML/.
Joseph, S. (2003). Decentralized meta-data strategies. University of Tokyo. Retrieved August 11, 2003, from http://www.neurogrid.net/Decentralized_Meta-Data_Strategies-neat.html.
JXTA. (2003). Retrieved August 11, 2003, from http://www.jxta.org.
Kaine-Krolak, M., & Novak, M. (1995). An introduction to infrared technology: Applications in the home, classroom, workplace, and beyond.... Retrieved August 11, 2003, from http://trace.wisc.edu/docs/ir_intro/ir_intro.htm.
Kartoo. (2000). Retrieved August 11, 2003, from http://www.kartoo.com/.
Lagoze, C., & Hunter, J. (2001). The ABC ontology and model. Journal of Digital Information, 2(2). Retrieved August 11, 2003, from http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Lagoze/.
Lewis, N. (2002). Talking about a revolution? First impressions of Ex Libris's MetaLib. Ariadne, 32. Retrieved August 11, 2003, from http://www.ariadne.ac.uk/issue32/metalib/.
Library of Congress. (2002). Library of Congress: American Memory Historical Collections for the National Digital Library. Retrieved August 11, 2003, from http://memory.loc.gov/.
Library of Congress. (2003). METS (Metadata Encoding and Transmission Standard). Retrieved August 11, 2003, from http://www.loc.gov/standards/mets/.
Little, S., Geurts, J., & Hunter, J. (2002). The dynamic generation of intelligent multimedia presentations through semantic inferencing. ECDL 2002. Rome, Italy. Retrieved August 11, 2003, from http://archive.dstc.edu.au/maenad/ecdl2002/ecdl2002.html.
Liu, X., et al. (2002). Federated searching interface techniques for heterogeneous OAI repositories. Journal of Digital Information, 2(4). Retrieved August 11, 2003, from http://jodi.ecs.soton.ac.uk/Articles/v02/i04/Liu/.
Lynch, C. (2001a). Personalization and recommender systems in the larger context: New directions and research questions. Second DELOS Network of Excellence Workshop on Personalisation and Recommender Systems in Digital Libraries, Dublin, Ireland. Retrieved August 11, 2003, from http://www.ercim.org/publication/ws-proceedings/DelNoe02/CliffordLynchAbstract.pdf.
Lynch, C. (2001b). When documents deceive: Trust and provenance as new factors for information retrieval in a tangled Web. Journal of the American Society for Information Science, 52(1), 12-17. Retrieved August 11, 2003, from http://www.cs.ucsd.edu/~rik/others/lynch-trust-jasis00.pdf.
Magkanaraki, A., Karvounarakis, G., Anh, T. T., Christophides, V., & Plexousakis, D. (2002). Ontology storage and querying. Technical Report No. 308, ICS FORTH, Crete. Retrieved August 11, 2003, from http://139.91.183.30:9090/RDF/publications/tr308.pdf.
Martinez, J. (2002). MPEG-7 overview (Version 8). Retrieved August 11, 2003, from http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm.
MSNBC News Tools Home. (2003). Retrieved August 11, 2003, from http://www.msnbc.com/toolkit.asp.
Nardi, B. A., & O'Day, V. L. (1998). Application and implications of agent technology for libraries. The Electronic Library, 16(5), 325-337.
Net2one Personalized News Informer. (2003).
Retrieved August 11, 2003, from http://www.net2one.com/index2.asp.
Networked European Deposit Library (NEDLIB). (2001). Retrieved August 11, 2003, from http://www.kb.nl/coop/nedlib/.
Ng, D., Wactlar, H., Hauptmann, A., & Christel, M. (2003). Collages as dynamic summaries of mined video content for intelligent multimedia knowledge management. AAAI Spring Symposium Series on Intelligent Multimedia Knowledge Management. Palo Alto, CA. Retrieved August 11, 2003, from http://www-2.cs.cmu.edu/~hdw/aaai03_ng.pdf.
Oard, D. (2003). Speech retrieval papers and project descriptions. Retrieved August 11, 2003, from http://raven.umd.edu/dlrg/speech/papers.html.
OASIS. (2003). Universal Description, Discovery & Integration (UDDI) of Web Services. Retrieved August 11, 2003, from http://www.uddi.org/.
OCLC/RLG. (2003). Preservation Metadata Working Group. Retrieved August 11, 2003, from http://www.oclc.org/research/pmwg/.
ODRL. (2003). Retrieved August 11, 2003, from http://www.odrl.net/.
Open Archival Information System (OAIS) Resources. (2002). Retrieved August 11, 2003, from http://www.rlg.org/longterm/oais.html.
Open Archives Initiative (OAI). (2003). Retrieved August 11, 2003, from http://www.openarchives.org/.
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). (2003). Version 2.0, June 14, 2002. Retrieved August 11, 2003, from http://www.openarchives.org/OAI/openarchivesprotocol.html.
Open Archives Initiative (OAI) Registered Service Providers. (2002). Retrieved August 11, 2003, from http://www.openarchives.org/service/listproviders.html.
OpenGALEN. (2002). Retrieved August 11, 2003, from http://www.opengalen.org/.
OpenVideo. (2002). The OpenVideo project. Retrieved August 11, 2003, from http://www.open-video.org/.
PANDORA. (2002).
National Library of Australia, PANDORA project. Retrieved August 11, 2003, from http://pandora.nla.gov.au/. PAXit. (2003). PAXit image database software. Retrieved August 11, 2003, from http://www.paxit.com/paxit/communications.asp. Pretty Good Privacy (PGP). (2002). Retrieved August 11, 2003, from http://www.rubin.ch/pgp/pgp.en.html. QBIC. (2001). IBM's query by image content. Retrieved August 11, 2003, from http://wwwqbic.almaden.ibm.com/. Reamy, T. (2002). Auto-categorization: Coming to a library or intranet near you! EContent Magazine. Retrieved August 11, 2003, from http://www.econtentmag.com/r5/2002/reamy11_02.html. Reynolds, D., Cayzer, S., Dickinson, I., & Shabajee, F. (2002). Blogging and semantic blogging. SWAD-Europe Deliverable 12.1.1: Semantic Web applications--analysis and selection. Retrieved August 11, 2003, from http://www.w3.org/2001/sw/Europe/reports/chosen_demos_rationale_report/hp-applications-selection.html#sec-appendix-blogging. Ricoh Movie Tool. (2002). Retrieved August 11, 2003, from http://www.ricoh.co.jp/src/multimedia/MovieTool/. ROADNet. (2002). Real-time observatories, applications, and data management network. Retrieved August 11, 2003, from http://roadnet.ucsd.edu/. Rogers, D., Hunter, J., & Kosovic, D. (2002). The TV-trawler project. International Journal of Imaging Systems and Technology, Special Issue on Multimedia Content Description and Video Compression. RoMEO. (2003). Project ROMEO (Rights MEtadata for Open archiving). Retrieved August 11, 2003, from http://www.lboro.ac.uk/departments/ls/disresearch/romeo. SCHEMAS. (2002). SCHEMAS--Forum for metadata schema implementers. Retrieved August 11, 2003, from http://www.schemas-forum.org/registry/. Semantic Grid. (2003). Retrieved August 11, 2003, from http://www.semanticgrid.org/. Singingfish. (2002). Multimedia search. Retrieved August 11, 2003, from http://www.singingfish.com/. SNOMED CT. The Systematized Nomenclature of Medicine. (2003).
Retrieved August 11, 2003, from http://www.snomed.org/. Sullivan, A. (2002). The blogging revolution. Wired, 10.05. Retrieved August 11, 2003, from http://www.wired.com/wired/archive/10.05/mustread.html?pg=2. SUO. (2002). IEEE P1600.1 Standard Upper Ontology SUO Working Group. Retrieved August 11, 2003, from http://suo.ieee.org/. Sure, Y., Erdmann, M., Angele, J., Staab, S., Studer, R., & Wenke, D. (2002). OntoEdit: Collaborative ontology engineering for the semantic Web. In Proceedings of the First International Semantic Web Conference 2002 (ISWC 2002). Sardinia, Italy. Retrieved August 11, 2003, from http://link.springer.de/link/service/series/0558/papers/2342/23420221.pdf. TITAN. (2003). A Cross-Language WWW Search Engine. Retrieved August 11, 2003, from http://titan.mcnet.ne.jp/. Topic Maps. (2000). Retrieved August 11, 2003, from http://www.topicmaps.org/. TouchGraph GoogleBrowser. (2001). Retrieved August 11, 2003, from http://www.touchgraph.com/TGGoogleBrowser.html. University of Illinois Library (UIL). (2002). University of Illinois Open Archives Collection. Retrieved August 11, 2003, from http://bolder.grainger.uiuc.edu/uiLibOAIProvider/2.0/oai.asp. Virage. (2003). Retrieved August 11, 2003, from http://www.virage.com/. Virginia Tech. (1997a). Digital libraries and software agents. Retrieved August 11, 2003, from http://scholar.lib.vt.edu/digilib/reports/agents.pdf. Virginia Tech. (1997b). Ontologies and agents in digital libraries. Retrieved August 11, 2003, from http://ei.cs.vt.edu/~cs6604/f97/agents.htm. W3C Annotea Web Annotation Service. (2001). Retrieved August 11, 2003, from http://annotest.w3.org/. W3C RDF Syntax and Model Recommendation. (1999).
Retrieved August 11, 2003, from http://www.w3.org/TR/REC-rdf-syntax/. W3C RDF Vocabulary Description Language 1.0. (2003). RDF Schema, W3C working draft. Retrieved August 11, 2003, from http://www.w3.org/TR/rdf-schema/. W3C semantic web activity. (2002). Retrieved August 11, 2003, from http://www.w3.org/2001/sw/Activity. W3C Web Ontology Language (OWL). (2003). Guide, version 1.0, W3C working draft. Retrieved August 11, 2003, from http://www.w3.org/TR/owl-guide/. W3C Web Ontology (WebOnt) Working Group. (2003). Retrieved August 11, 2003, from http://www.w3.org/2001/sw/WebOnt/. W3C Web Services Activity. (2003). Retrieved August 11, 2003, from http://www.w3.org/2002/ws/. W3C Extensible Markup Language (XML). (2003). Retrieved August 11, 2003, from http://www.w3.org/XML. W3C XML Protocol Working Group. (2003). Simple object access protocol (SOAP). Retrieved August 11, 2003, from http://www.w3.org/2000/xp/Group/. W3C XML Query. (2003). Retrieved August 11, 2003, from http://www.w3.org/XML/Query. W3C XML Schema Language. (2003). Retrieved August 11, 2003, from http://www.w3.org/XML/Schema. W3C XML Signature Working Group. (2003). Retrieved August 11, 2003, from http://www.w3.org/Signature/. Wang, J. Z., & Li, J. (2003). Evaluation strategies for automatic linguistic indexing of pictures. In Proceedings of the IEEE International Conference on Image Processing (ICIP). Barcelona, Spain. WebBrain. (2001). Retrieved August 11, 2003, from http://www.Webbrain.com/. Web Services Description Language (WSDL). (2003). Version 1.2, W3C working draft. Retrieved August 11, 2003, from http://www.w3.org/TR/wsdl12. XrML. (2003). Retrieved August 11, 2003, from http://www.xrml.org/. ZGDV VIDETO. (2002). ZGDV video description tool.
Retrieved August 11, 2003, from http://www.rostock.igd.fraunhofer.de/ZGDV/Abteilungen/zr2/Produkte/videto/index_html_en.
Jane L. Hunter, Distributed Systems Technology Centre Pty. Ltd., Level 7, GP South, University of Queensland, St. Lucia, Queensland 4072, Australia
JANE L. HUNTER is a Senior Research Fellow at the Distributed Systems Technology Centre at the University of Queensland. Her research interests are multimedia metadata modeling and interoperability between metadata standards across domains and media types. She was chair of the MPEG-7 Semantic Interoperability Ad Hoc Group, editor of the MPEG-7 Description Definition Language (ISO/IEC 15938-2), and is the liaison between MPEG, W3C, and the Dublin Core Metadata Initiative.