XML for Content and E-Commerce.


A striking advantage of computer-based data is its flexible re-use for multiple purposes. Re-purposing digital document content creates new publication avenues and potentially creates new revenue streams for competitive enterprises. The ability to transmit data faster and more economically by electronic means creates incentive for using computer-based methods of document distribution instead of more traditional "print and distribute" models. However, most organizations investing time and resources in creating document content want to get the greatest possible return. If a single investment in content creation yields multiple products and services, operating cost is reduced and new income generated.

The use of extensible markup language (XML) meets these goals admirably. Document data and text content encoded with XML can be posted to Internet Web sites, processed with desktop publishing software for distribution on CD-ROM media, and used in e-commerce applications. In fact, the growing acceptance of XML use for building document content is creating a sense that XML is a new open data standard that can be used for data exchange, document publishing, and eventually document archiving. Organizations have begun to employ XML for many document publication and data exchange purposes.

As with most new technologies, implementation realities result in scaling back expectations as "lessons learned" begin to surface. Some organizations will experience XML shock as they retool information systems departments and computer infrastructures. Others will find that XML documents are actually more limited than those produced in more traditional formats and that not all Web-site-hosted documents really need XML to be informative and functional. XML use for content management has enormous potential, but it must be implemented realistically.

Evolving Content Management

The explosion in Internet use during the 1990s and the subsequent drive to Web-enable many computer software systems resulted in the need for a standard method of presenting information on Web sites. For many years, basic-level hypertext markup language (HTML) was used to present textual content across the Internet by creating tagged text documents. Tags -- special text codes -- were placed around various text and graphics elements in the document to indicate structure and specific formatting when the document displayed or printed. HTML was a convenient, simplified application of standard generalized markup language (SGML), an international standard that had been around for many years.

SGML is maintained by the International Organization for Standardization (ISO) and is not bound to any particular computer operating system or hardware platform. Originally intended to foster the exchange of text-based data between organizations, SGML is a language syntax used to create markup language tags for different types of documents so that their content and formatting can be dealt with separately. Having a standard language for managing document content has proven advantageous for organizations over the years as a basis for content communication and exchange. For these reasons, there are SGML-based text editors, typesetters, and document management products, and many organizations have placed document content in a standard format by using SGML.

Since SGML documents are not tied to any one vendor or software, various industries have created their own document types (document type definitions -- DTDs) for information commonly exchanged in their business activities. However, SGML requires special training to use, and it is not particularly fast or efficient in the way it runs on computer systems. In addition, all documents must use a DTD designed by the document's creator.

As an application of SGML with a fixed set of tags, HTML is useful for publishing documents on Internet Web sites and does not require DTDs to be created. Internet browsers, such as Microsoft's Internet Explorer and Netscape's Navigator, display HTML well and are relatively inexpensive. HTML is easy to learn, and the coded hypertext links among HTML pages form the basis for much of the Internet's fast and seamless document accessibility. HTML's simple, inexpensive nature contributed to the vast proliferation of HTML-based document content on the Internet almost overnight.

As HTML documents have become common, their inadequacies have become more obvious. Pages often print poorly due to the lack of sophisticated page-formatting capabilities. HTML does not work well with text justification, kerning, and hyphenation, or with features such as hanging indents and multiple columns that word processors easily address. HTML has no method of creating custom tags for specific document types, and HTML tags do not have good structural relationships, thus making it difficult to present content in rigorously structured pages. For these reasons, many contemporary Web sites use HTML for visually interesting and informational page displays, but make actual documents downloadable in Adobe's portable document format (PDF). Electronic documents in PDF appear the same way across all computer platforms, software systems, and printers. In addition, the Adobe Acrobat Reader software is freely distributed, basic PDF files are inexpensive to produce, hot links to other documents or Web sites can be embedded, and there is little training required.

PDF also has a few drawbacks. It is possible to create page-oriented hot links internally within PDF files. However, they have minimal structural hierarchy and content indexing ability. Text searching is limited largely to the facilities provided by Adobe software products. Most information retrieval is page oriented, as are most document navigation and display tools, due to the paper-replica orientation of the Adobe environment. A PDF document's precise visual display can vary slightly from the original document, depending on the available character fonts and color palettes of computers or printers. Software tools are unable to precisely access textual or numeric data within PDF files, so PDF files are not especially useful for document data exchange.

XML, in contrast, excels at content management. It is not limited to a fixed number of tags, as is HTML. XML does not require a DTD, and new elements and attributes can be defined for use within documents. In addition, custom tags can be created, for example to improve the recall and specificity of text searching. XML also can be used to define new markup languages to customize data management when building complex documents, thus ensuring that the data can be used by more than one software application.
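As a minimal sketch of how custom tags can sharpen text searching, the Python fragment below compares a lookup by element name with a plain word search. The sample document and its element names (policy, retention-period, and so on) are hypothetical and invented for illustration; only Python's standard xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element name here is invented for illustration.
DOC = """\
<policy id="RM-12">
  <title>Records Retention Policy</title>
  <retention-period unit="years">7</retention-period>
  <section>
    <heading>Scope</heading>
    <para>Invoices are kept for a retention period set by the records manager.</para>
  </section>
</policy>
"""


def find_by_tag(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(tag)]


def find_by_word(xml_text, word):
    """Return the tag of every element whose own text contains `word`."""
    root = ET.fromstring(xml_text)
    needle = word.lower()
    return [el.tag for el in root.iter() if needle in (el.text or "").lower()]


if __name__ == "__main__":
    # The custom tag identifies the one value that is the retention period.
    print(find_by_tag(DOC, "retention-period"))
    # A word search also matches the title and a paragraph that merely mention it.
    print(find_by_word(DOC, "retention"))
```

The tag-based lookup returns only the tagged value, while the word search returns every element that happens to mention the term, which is the kind of difference in search specificity that custom tags make possible.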

Instead of being presentation oriented, as with HTML (in browsers) and PDF (in print and viewed by Acrobat Reader), XML enhances document content management while still enabling a variety of software programs to handle presentation and publication. Information content can be stored in an XML document and then processed with HTML, a word processor, Adobe Acrobat, or database software.
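The separation of content from presentation described above can be sketched in a few lines of Python. The report content and its element names are hypothetical, and the two renderers (to_html and to_plain_text) are illustrative helpers rather than part of any XML standard or product.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content: stored once, with no formatting information.
CONTENT = """\
<report>
  <title>Quarterly Report</title>
  <para>Loan growth slowed in the first quarter.</para>
  <para>Deposits rose &amp; fees were flat.</para>
</report>
"""


def to_html(xml_text):
    """Render the report as an HTML fragment for a Web page."""
    root = ET.fromstring(xml_text)
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in root.findall("para")]
    return "\n".join(parts)


def to_plain_text(xml_text):
    """Render the same report as plain text, e.g. for e-mail or print."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text or "" for p in root.findall("para")]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(CONTENT))
    print()
    print(to_plain_text(CONTENT))
```

Both functions read the same stored content; only the presentation logic differs, so a further output format can be added without touching the document itself.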

XML Rockets into Cyberspace

As with all new technologies or methodologies, there is a great deal of hype with respect to XML. The advantages of XML document content management seem so overwhelming that many organizations are moving rapidly to embrace it.

There are few business environments where the exchange of information internally and externally is more important than in government settings. A U.S. federal government chief information officer council has now created an XML working group at www.xml.gov. The group promotes XML use by government agencies, encourages the learning of XML, and fosters sharing of implementation experiences. The U.S. Defense Department is creating a similar site at www.xml.mil, while the General Services Administration, the U.S. Fish and Wildlife Service, and the U.S. Patent and Trademark Office (PTO) already have internal XML initiatives (O'Hara 2000).

The Federal Deposit Insurance Corp. (FDIC) plans to separate content from format when it publishes its quarterly Regional Outlook to the Web using XML. Being able to multi-publish this description of economic conditions in both paper and Web formats is a big incentive. However, the FDIC also exchanges data with the Federal Reserve Board, the Office of the Comptroller of the Currency, and private sector banking companies, where XML is expected to be especially useful (Caterinicchia 2001).

The National Archives and Records Administration (NARA), concerned with the long-term preservation of documentary information, is also entering the XML fray. An electronic records archives program will use XML to ensure that documents can be read in the future without regard to the software used to produce them. Some research is under way to ensure that records' authenticity is preserved, and this research is expected to show that more informational constructs will be required than just DTDs and document style sheets. In addition, XML topic maps for knowledge representation and communication may prove beneficial in helping NARA connect records to various agencies' business processes (O'Hara 2001).

The PTO has set up an electronic filing system that allows it to accept patent applications across the Internet in XML format using public-key-encryption-based digital signature technology (www.uspto.gov/ebc/index.html). The PTO offers a downloadable patent application specification authoring tool (PASAT) that works with word processors to export documents in XML format. SGML is now used for publishing patents, but the agency will switch to XML in January 2002. One incentive for XML-based electronic publishing is that the agency will move to a new building in 2003 where there is little space for paper files (Daukantas 2000).

In the private sector, e-commerce is driving XML use, especially for data interchange. As Web sites began to take orders for commodities during the 1990s, it soon became clear that some document-based information would be dynamic by its very nature. Lists of available products, current inventory data, regional sales data, or an organization's roster of current members may fluctuate by the minute. Delivering this type of information across the Internet has previously required the integration of database software applications with a Web-based interface. Now, with XML, organizations can have Web-based data processing operating in the background to ensure that updateable documents are accessible. Although HTML simply indicates how data should look, XML can tell an application what the data means. For this reason, the same data can be displayed differently to different users and in disparate computer applications. In some cases, XML implementation replaces electronic data interchange (EDI) applications.
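The difference between markup that says how data looks and markup that says what data means can be illustrated with a short Python sketch. The inventory feed, its element names, and the helper functions below are all hypothetical; a real exchange would follow whatever vocabulary the trading partners agree on.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical inventory feed: the tags name what each value means,
# not how it should be displayed.
INVENTORY = """\
<inventory>
  <item sku="A-100"><name>Binder</name><price>3.50</price><quantity>12</quantity></item>
  <item sku="B-200"><name>Label set</name><price>1.25</price><quantity>0</quantity></item>
  <item sku="C-300"><name>File box</name><price>8.00</price><quantity>4</quantity></item>
</inventory>
"""


def load_items(xml_text):
    """Turn each <item> element into a dictionary of typed values."""
    root = ET.fromstring(xml_text)
    return [
        {
            "sku": el.get("sku"),
            "name": el.findtext("name"),
            "price": Decimal(el.findtext("price")),
            "quantity": int(el.findtext("quantity")),
        }
        for el in root.findall("item")
    ]


def customer_view(items):
    """What a shopper might see: only items that are currently in stock."""
    return ["%s: $%s" % (i["name"], i["price"]) for i in items if i["quantity"] > 0]


def stock_value(items):
    """What an accounting application might compute from the same data."""
    return sum((i["price"] * i["quantity"] for i in items), Decimal("0"))


if __name__ == "__main__":
    items = load_items(INVENTORY)
    print(customer_view(items))
    print(stock_value(items))
```

Because the price and quantity are labeled as such, one application can build a product list for customers while another totals the value of stock on hand, both from the same feed.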

EDI's advantage as a business method has always been the ability to conduct very fast, reliable, and secure transactions over value-added networks (VANs). For organizations conducting thousands of business transactions a day, EDI was the best solution for automating the business process. VANs were not inexpensive, however, as they were primarily dedicated, leased computer communication lines. In addition, EDI software and expertise are required, and most EDI transactions are one-to-one business communications. In contrast, XML data formatting knowledge and expertise are freely available, XML is customizable, and it can be used in a one-to-many business model -- the model of the Internet. For these reasons, many organizations plan over time to replace some EDI transactions with XML-based business processes (DeJesus 2001).

After all, how can they go wrong when XML use is a recommendation of the World Wide Web Consortium (www.w3c.org) with 400 international members dedicated to Web interoperability, and an XML working group that makes speeches, presentations, and technical specifications freely available at www.w3.org/XML? In addition, the Organization for the Advancement of Structured Information Standards (OASIS), a non-profit, international consortium that creates interoperable industry standards based on public standards like SGML and XML, has a number of XML-related committees at work (oasis-open.org/committees/committees.shtml).

Additionally, many organizations now offer consulting and education in XML. For instance, the O'Reilly publishing company produces the Web site XML.COM (www.xml.com) with resources, buyer's guides, answers to frequently asked questions (FAQs), and free newsletters. Publications such as Web Techniques contain regular columns covering XML (e.g., XML@Large), and most major computer industry trade journals feature articles covering XML (see references).

Organizations such as the Graphic Communication Association are educating their members regarding XML's impact (www.gca.org/whats_xml/). The variety of conferences, seminars, and publications is almost overwhelming. A visit to any major bookstore will reveal 10 to 20 titles of XML-related textbooks, technical references, and self-help manuals sitting on the shelves, ranging from The XML Bible with CD to XML for Dummies. A quick search for XML-related topics on the Internet book-seller Amazon.com's Web site indicated 354 related items for sale, and two-thirds of the first 50 items had XML in the title.

Content = Records?

The advantages of XML will undoubtedly make it a leading development platform for both document creation and data exchange; but there are factors to consider from both the records management and business perspectives to temper assumptions that XML will invade all document-based applications. As with all new technologies, some education, infrastructure retooling, and capital investments will be required, leading to questions about return-on-investment and the value added to current business processes.

An unforeseen benefit of interest in XML is a renewed interest in data standardization and integration across enterprise computing architectures. However, as in the past, organizational politics and the technical intricacies of legacy systems can frustrate any attempt at achieving cross-organizational data sharing. Most important is the renewed realization that "an enterprise understand and resolve the different words and meanings it uses to refer to things important to it" (Finkelstein 2000). Records and information management professionals must seize this opportunity to point out that data standardization, commonly accepted names for records series, and integrated records classification and filing systems are all strongly related. How can an XML or SGML document type definition not be related to a records series?

If the goal of XML is re-purposing so that one set of content can be published in multiple document forms, how will that affect the authenticity of documents that are declared to be records? "Form (layout and design) must be separated from content (primarily text). Although the words and pictures of a print layout are related, they don't necessarily work together the same way on the Web or on a CD-ROM as they do in print" (Ward 2000). How does one designate an official record among a set of transitory (but related) documents in multiple formats derived from the same set of data? How does one preserve the context, as well as the content and presentation of the record?

Text and graphics based on the same content can appear very different visually depending on the final publication media. Printed documents can take advantage of many color palettes, image resolutions, and media to enhance text and graphics presentation. However, when the same text and graphics are published using XML on a Web site or on CD-ROM media, a more limited array of presentation tools is available due to software and hardware limitations.

Which media format will provide the best and most authentic official record for long-term preservation of informational content? Will a record's evidentiary value be affected even though its information content is less questionable? How critical are presentation and context to the authenticity of records? It may be up to the courts to decide.

XML has much potential to save us from the continuing Towers of Babel that we create with new renditions of technology. Whether or not it will do so depends on our making well-considered decisions about how we use this new technology.

REFERENCES

Caterinicchia, Dan. "FDIC Simplifies Publishing with XML." Federal Computer Week. 8 January 2001:20.

Daukantas, Patricia. "PTO Starts E-government Shift." Government Computer News. 20 November 2000:1,41.

DeJesus, Edmund X. "EDI? XML? Or Both?" Computerworld. 8 January 2001:54-56.

Finkelstein, Clive. "XML Is Not a Silver Bullet." DM Review. October 2000:42.

Harold, Elliotte Rusty. The XML Bible. Foster City, CA: IDG Books Worldwide. 1999.

O'Hara, Colleen. "Future Electronic Records Archives Bet on XML." Federal Computer Week. 8 January 2001:24.

--. "XML Portal in the Works." Federal Computer Week. 4 December 2000:10.

Tittel, Ed and Frank Boumphrey. XML for Dummies. Foster City, CA: IDG Books Worldwide. 2000.

Ward, Noel. "The Challenges of Re-purposing." Electronic Publishing. December 2000:38.

John Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. He can be reached at jtpid@usit.net.
COPYRIGHT 2001 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:PHILLIPS, JOHN T.
Publication:Information Management Journal
Article Type:Brief Article
Geographic Code:1USA
Date:Apr 1, 2001
Words:2644