First principles ... XML.


What Is a Document?

Outside the IT world, we encounter all sorts of hard-copy documents: letters, forms, books, newspapers, magazines, invoices, maps, birthday cards, leaflets, posters, sticky notes, and many others. The concept of the hard-copy document evolved almost without notice. When we encounter new types of hard-copy documents in our homes, offices, classrooms, libraries, stationery stores, or local newsstands, we seem to accept them unconsciously. Meanwhile, within the IT world, the concept of the electronic document has evolved, too. Let's start with a more basic definition first:

The definition of text. Text is generally considered to consist of words, sentences, lines, paragraphs, and even pages. Typically, the term text also refers to electronic text stored as only simple character codes (for example, American Standard Code for Information Interchange, or ASCII, codes), that is, without any formatting. At one time, the electronic document was considered to be only a text file created with applications called text editors or word processors. You could almost use the terms text and document interchangeably. However, as developments occurred on many IT fronts, the concept of the electronic document expanded to contain tables, graphics, charts, and other objects, in a manner that parallels the evolution of hard-copy documents. Now, in the IT world, documents are considered to be electronic files of any size for any media (for example, text, audio, video, and graphics), created by any application. Thus the definition of text is now a subset of the definition of the document.
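
As a minimal illustration of text stored as simple character codes, the Python sketch below lists the ASCII code for each character of a short string. It is not part of the original article, and the sample string is arbitrary. Nothing in the codes records font, size, or emphasis.

```python
# Plain text is just a sequence of character codes with no formatting.
sample = "XML"  # arbitrary sample string chosen for illustration

# Encode the string as ASCII and list the numeric code of each character.
codes = list(sample.encode("ascii"))
print(codes)  # [88, 77, 76]

# Decoding the same codes recovers the text, and nothing more.
restored = bytes(codes).decode("ascii")
print(restored)  # XML
```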

In their Extensible Markup Language 1.0 Recommendation, which is recognized as the official XML standard, the W3C (World Wide Web Consortium) defines an XML document as a "data object if it is well-formed, as defined in [the Extensible Markup Language Recommendation].... Each XML document has both a logical and a physical structure."
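
To make "well-formed" concrete, here is a small Python sketch that asks the standard library's xml.etree.ElementTree parser whether a string is well-formed. The function name is_well_formed and the sample memo documents are invented for this illustration; the check covers well-formedness only, not validity against any schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<memo><to>Staff</to><body>Meeting at noon.</body></memo>"
bad = "<memo><to>Staff</body></memo>"  # <to> is closed by the wrong tag

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```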

Related to the discussion of documents is the term document processing, which is the discipline that deals with creating applications that allow you to deal with documents of all types. Document processing is split into creating or manipulating those documents destined for human viewing and consumption (people-oriented processing), as well as those that are destined for computer consumption (machine-oriented processing). Documents of the former type tend to be comparatively long-lived (examples: specifications, drawings, procedures, charts, and memos). Documents of the latter type tend to have shorter lives because their data may be manipulated, transformed, or combined on the fly to create or add to different documents. XML descends from a rich document-processing heritage.

What Is Markup?

The concept of markup is important. After all, it's the M in XML. But what does it mean? Basically, it's a way to add information about data to the data itself. You may not have had much experience with other markup languages, but you have probably used markup in one form or another. For example, have you ever:

* Underlined or highlighted words or passages on a hard-copy document to indicate important information?

* Marked up a draft hard copy of a document with symbols indicating "new paragraph here," "bold this," or "remove this"?

* Made marks on a map indicating where you want to turn, or where specific features are located?

* Numbered bits of information, such as steps, in an otherwise unnumbered procedure?

Those and similar activities involve marking up data. All the symbols, notes, numbers, designated actions, or highlights (all of which qualify as some sort of markup) emphasize or convey something about the data: what it means or what you are supposed to do with it.

A significant paper entitled "Markup Systems and the Future of Scholarly Text Processing," by James H. Coombs and Allen H. Renear of Brown University and Steven J. DeRose of Electronic Book Technologies, describes six types of markup:

* Punctuational, which consists of the use of defined marks (examples: spaces, periods, and commas) to provide primarily syntactic information about written utterances. Punctuation has been around so long that we take it for granted.

* Presentational, which we use to group our materials for order and clarity. Examples include horizontal and vertical spacing, page breaks, numbering, chapter and section breaks, justification, and lists.

* Procedural, which is a characteristic of whatever system will be used to create presentations. Often grouped with what we call file formats, it tells someone or something (such as a formatter with a set of installed drivers) about the size and format of a document (examples: letter, legal, and portrait and landscape views), fonts, and other production information.

* Descriptive, which allows authors to identify certain elements of their data as belonging to a specific family of text. The common word-processing tag BT (for basal text) is an example: When a text formatter encounters that code, it consults, and then follows, a predefined set of rules that tell it what to do to display or print the characters associated with that code. If changes become necessary, you only need to change the rules, not each BT tag in the document.

* Referential, which refers to separate physical or electronic entities (that is, located external to the document being processed) that will be imported and placed in the proper sequence during document processing.

* Metamarkup, which provides the ability to control the definition and interpretation of markup tags, and to extend the vocabulary of derivative markup languages. Metadata, the concept of information about information, is related to this concept.

If you would like to read the Coombs, Renear, and DeRose paper from which the preceding definitions were taken, you can find it online at www.oasis-open.org/cover/coombs.html#Figure1.

Markup, in summary, is the insertion of characters or symbols into a document to indicate the document's physical and logical structure, to indicate how the information in a document should appear, or to provide some other form of instruction. The primary goal of markup is to separate the treatment (for example, the appearance or structure) of a document from the actual data in the document.
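
The separation of treatment from data can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the doc, title, and bt element names, the rules table, and the render function. The document says what each piece of text is, while the rules, kept elsewhere, say what to do with it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags say what each piece of text is,
# not how it should look.
document = "<doc><title>Minutes</title><bt>The meeting began at noon.</bt></doc>"

# Hypothetical presentation rules, kept apart from the document.
# Changing a rule here restyles every matching element at once.
rules = {
    "title": str.upper,
    "bt": lambda text: "    " + text,  # indent body text by four spaces
}


def render(xml_text: str, style_rules: dict) -> list:
    """Apply the rule for each element's tag to that element's text."""
    root = ET.fromstring(xml_text)
    return [style_rules[child.tag](child.text or "") for child in root]


print(render(document, rules))
# ['MINUTES', '    The meeting began at noon.']
```

Swapping in a different rules table changes the output without touching the document, which is the point made about the BT tag above.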

XML Is a Markup Language and a Metalanguage


There are over two dozen categories of computer languages; you are probably familiar with some of them already. For example, machine languages consist entirely of numbers and are only understood by computers; assembly languages are symbolic representations of the machine language of a specific computer; programming languages such as COBOL, C++, Java, and Fortran instruct computers to do specific tasks; and fourth-generation languages have a syntax that is closer to human languages.

Some language categories are separate and discrete, dedicated to specific functions; some languages are subsets of others; and some are hybrids of other languages.

For a more comprehensive listing of computer languages and their respective definitions, consult The Language List Web site, maintained by Bill Kinnersley of the Computer Science Department, University of Kansas, at http://cui.unige.ch/OSG/info/Langlist/intro.html.

XML doesn't fall into any of the categories previously listed, but it falls into two other categories: It's a markup language and a metalanguage.

Markup Languages

Extrapolating the definition of markup, markup languages are those that allow us to create documents consisting of plain text data and other entities, plus markup codes that define the logical components and structure, as well as describe the appearance or other aspects of the data. The markup codes, called tags, are located adjacent to their respective data. In addition, the data and tags are usually composed of common text characters, so they can remain independent of platform and operating system.

Why use markup languages? These days, with the proliferation of computer networks across the world, with their myriad of applications, operating systems, and proprietary network devices, the data transmitted over the wire, through the air, and through space must include all the information necessary for automated systems (such as computers, routers, firewalls, and hubs) to transmit, receive, and otherwise deal with the data. The receiver needs the markup tags to interpret the message: the format and content of database data, multimedia graphic files or audio files, debit card transactions, credit card authorizations, or any of various other document types.
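
As a rough sketch of how a receiver relies on tags to interpret a message, the Python fragment below parses a made-up debit card message. The element and attribute names (transaction, account, amount, type, currency) are invented for this example and do not come from any real payment format.

```python
import xml.etree.ElementTree as ET

# Hypothetical debit card message; element names are invented for this sketch.
message = (
    "<transaction type='debit'>"
    "<account>12345</account>"
    "<amount currency='USD'>19.99</amount>"
    "</transaction>"
)

root = ET.fromstring(message)

# The tags tell the receiver which piece of text is which.
summary = {
    "type": root.get("type"),
    "account": root.findtext("account"),
    "amount": float(root.findtext("amount")),
    "currency": root.find("amount").get("currency"),
}
print(summary)
```

Because both tags and data are ordinary text characters, the same message could be read on any platform with an XML parser.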

Metalanguages

In the What Is Markup? section, we provided a listing of markup types. One of the types was called metamarkup, which provides the capability to control the definition and interpretation of markup tags, and to extend the vocabulary of derivative markup languages. That is consistent with the definition found at Mr. Kinnersley's Web site, where he defines a metalanguage as a "language used for formal description of another language." It is also consistent with other definitions of metalanguages, which describe them as languages that provide for conformance-proving mechanisms. XML permits developers to create their own specialized derivative languages, but all of those languages have one thing in common: They meet XML specifications. If languages and documents contravene the XML specifications, the XML processors in their respective applications may or may not process them. Even if they do, they will likely generate error messages.
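
The idea of derivative languages sharing one set of rules can be sketched in Python. The recipe and invoice vocabularies and the describe function below are invented for this illustration. One generic parser (xml.etree.ElementTree from the standard library) handles both vocabularies and reports an error for a document that breaks the rules.

```python
import xml.etree.ElementTree as ET


def describe(document: str) -> str:
    """Name the root element and its children, or report the parser's error."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as err:
        return f"error: {err}"
    children = ", ".join(child.tag for child in root)
    return f"{root.tag}: {children}"


# Two made-up vocabularies that share nothing except XML's rules.
recipe = "<recipe><ingredient/><step/></recipe>"
invoice = "<invoice><customer/><total/></invoice>"
broken = "<invoice><customer></invoice>"  # <customer> is never closed

print(describe(recipe))   # recipe: ingredient, step
print(describe(invoice))  # invoice: customer, total
print(describe(broken))   # begins with "error:"; exact wording depends on the parser
```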

From XML in 60min per day. Wiley, ISBN 0-471-42254-1
COPYRIGHT 2003 A.P. Publications Ltd.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: Teach-In
Author: McKinnon, Al
Publication: Database and Network Journal
Date: Oct 1, 2003
Words: 1479