Quilt--an XML query language.XML XML
in full Extensible Markup Language.
Markup language developed to be a simplified and more structural version of SGML. It incorporates features of HTML (e.g., hypertext linking), but is designed to overcome some of HTML's limitations. is an extremely versatile markup language markup language
Standard text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship among its parts. The most widely used markup languages are SGML, HTML, and XML. , capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases relational database
Database in which all data are represented in tabular form. The description of a particular entity is provided by the set of its attribute values, stored as one row or record of the table, called a tuple. , and object repositories. A query language A generalized language that allows a user to select records from a database. It uses a command language, menu-driven method or a query by example (QBE) format for expressing the matching condition. of similar versatility is needed to realize the potential of XML as a universal medium for data interchange. Most existing proposals for XML query languages are robust for particular types of data sources but weak for other types. In this paper, the authors combine features from several sources to propose a new query language called Quilt, which is designed to be broadly applicable across all types of XML data sources. As increasing amounts of information are stored in XML, exchanged in XML, and presented as XML through various interfaces, the ability to intelligently query XML data sources becomes increasingly important. The data model of XML is quite different from the data models of traditional databases, and traditional database languages are not well suited to querying XML sources. In addition, XML blurs the distinction between data and documents, allowing documents to be treated as data sources and traditional data sources to be treated as documents. Query languages, including XML query languages, still tend to be designed either for documents or for data. Since XML may represent a rich variety of information coming from many sources and structured in many ways, an XML query language must try to provide the same rich variety in the queries that may be formulated.
Quilt is a query language for XML. It originated when the authors attempted to apply XML query languages such as XML-QL XML-QL Extensible Markup Language Query Language , XPath, XQL XQL XML Query Language
XQL Extensible Query Language , YATL, and XSQL XSQL Extended Sql to a variety of use cases. We found that each of these languages had strong advantages for some of the queries we examined, but was unable to express other queries we found equally important. Therefore, we chose to take some of the best ideas from these languages, plus some ideas from SQL SQL
in full Structured Query Language.
Computer programming language used for retrieving records or parts of records in databases and performing various calculations before displaying the results. and OQL (Object Query Language) A query language that supports complex data types (multimedia, spatial, compound documents, etc.) that are stored as objects. Defined by the ODMG, it is a superset of the SQL-92 query language. , integrating them with a fresh syntactic Dealing with language rules (syntax). See syntax. approach. Our goal was to design a small, implementable language that met the requirements specified in the W3C XML W3C XML World Wide Web Consortium Extensible Markup Language Query Working Group's http://www.w3.org/XML/Group/xmlquery-reg XML Query Requirements. During our design work, we have adapted features from various languages, carefully assembling them to form a new design-hence the name "Quilt". The resulting language supports queries that draw information from various sources and patch them together in a new form. This is another reason for the name "Quilt". Although Quilt borrows from many languages, its conceptual integrity comes from a deep reliance on the structure of XML, which is based on hierarchy, sequence, and reference. Quilt is able to express queries based on document structure and to produce query results that either preserve the original document structure or generate a new structure. Quilt can operate on flat structures, such as rows from relational databases, and generate hierarchies based on the information contained in these structures. It can also express queries based on parent/child relationships or document sequence, and can preserve these relationships or generate new ones in the output document. Although Quilt can be used to retrieve data from objects, relational databases, or other non-XML sources, this data must be expressed in an XML view, and Quilt relies solely on the structure of the XML view in its data model.
We begin by describing the Quilt language, and then illustrate how the language may be used for a variety of queries, including queries typically used for relational data, semi-structured data, and structured documents.
A first look at Quilt
Two languages that were particularly influential in the design of Quilt were XML-QL and XQL. We will begin our description of Quilt by discussing some of the concepts it inherited inherited
received by inheritance.
inherited achondroplastic dwarfism
see achondroplastic dwarfism.
inherited combined immunodeficiency
see combined immune deficiency syndrome (disease). from these two languages. In the discussion that follows, we conceptualize con·cep·tu·al·ize
v. con·cep·tu·al·ized, con·cep·tu·al·iz·ing, con·cep·tu·al·iz·es
To form a concept or concepts of, and especially to interpret in a conceptual way: the nested elements of an XML document as nodes in a tree-like data model. Sometimes we will speak in terms of the elements and attributes of the XML document, and sometimes in terms of the nodes of the tree that models the document.
The core of an XML-QL query consists of a WHERE clause that binds one or more variables, and a CONSTRUCT clause that uses the values of the bound variables to construct a new document, possibly having a structure quite different from that of the original document(s). This is a very powerful concept because it allows documents to be joined, grouped, and transformed in arbitrary ways. However, the patterns that are used in XML-QL for binding variables tend to be unnecessarily verbose Wordy; long winded. The term is often used as a switch to display the status of some operation. For example, a /v might mean "verbose mode." . More important, the result of the WHERE clause is a relation composed of scalar scalar, quantity or number possessing only sign and magnitude, e.g., the real numbers (see number), in contrast to vectors and tensors; scalars obey the rules of elementary algebra. Many physical quantities have scalar values, e.g. values, which is not sufficient to preserve all the information about the hierarchic and sequential relationships among the elements of the original document. As a result, XML-QL is a very powerful language for asking certain types of questions, but is weak at expressing queries based on hierarchy and sequence, such as 'Find the second quotation in the fourth chapter." From XML-QL, Quilt borrows the concept of constructing a new document from bound variables, but Quilt uses a different paradigm for binding the variables and a richer data model to represent the result of the binding.
XQL-98 contains a powerful syntax for navigating in a hierarchic document and for selecting a set of nodes that satisfy some complex condition. This syntax was a precursor of the abbreviated syntax of XPath, which has been adopted as a W3C Recommendation A W3C Recommendation is the final stage of a ratification process of the World Wide Web Consortium (W3C) working group concerning the standard. It is the equivalent of a published standard in many other industries. and has been used in various XML-related applications such as XPointer and XSLT (eXtensible Stylesheet Language Transformation) Software that converts an XML document into another format such as HTML, PDF or text. It may also be used to convert one XML document to another XML document with a different set of XML tags (different schema). . For selecting sets of nodes, Quilt uses abbreviated XPath expressions, enhanced by certain operators borrowed from XQL. Since an XPath expression, in general, returns a "forest' of nodes in which hierarchy and sequence are preserved, Quilt is able to express queries based on hierarchy and sequence more easily than languages such as XML-QL.
In this section, we will introduce the Quilt language using some example queries based on the following XML document:
<?xml version="1.0"?> <!DOCTYPE bib SYSTEM "books.bibitemd"> <bib> <book> <title>Harold and the Purple Crayon</title> <author> <lastname>Johnson</last> <firstname>Crockett</first> </author> <pubinfo> <publisher>Harper and Row</publisher> <price>$4.76</price> <year>1955</year> </pubinfo> </book> <book> <title>Harold's Fairy Tale</title> <author> <lastname>Johnson</last> <firstname>Crockett</first> </author> <pubinfo> <publisher>Harper and Row</publisher> <price>$4.76</price> <year>1956</year> </pubinfo> </book> <book> <title>Rise Up Singing</title> <author> <lastname>Blood</last> <firstname>Peter</first> </author> <author> <lastname>Patterson</last> <firstname>Annie</first> </author> <pubinfo> <publisher>Sing Out Corporation</publisher> <price>$15.45</price> <year>1988</year> </pubinfo> </book> </bib>
A simple form of a Quilt query consists of FOR, WHERE, and RETURN clauses. The FOR clause uses XPath expressions to bind the values of one or more variables. In general, an xpath expression evaluates to a set of nodes. The FOR clause generates an ordered list In HTML an ordered list
An unordered list
(2) A set of values passed from one programming language to another application program or to a system program such as the operating system. is generated for each possible way of binding the list of variables to nodes that satisfy their respective XPath expressions. When a node is bound to a variable, its descendant nodes are carried along with it. The WHERE clause applies a filter to the tuples, retaining only those tuples that satisfy a given search condition. The RETURN clause then generates a new document structure using the values of the bound variables.
The following example finds every book written by Crockett Johnson. The FOR clause generates a list of bindings in which $b is bound to individual book elements in the document found at the given URL URL
in full Uniform Resource Locator
Address of a resource on the Internet. The resource can be any type of file stored on a server, such as a Web page, a text file, a graphics file, or an application program. , and $a is bound to individual author elements that are descendants DESCENDANTS. Those who have issued from an individual, and include his children, grandchildren, and their children to the remotest degree. Ambl. 327 2 Bro. C. C. 30; Id. 230 3 Bro. C. C. 367; 1 Rop. Leg. 115; 2 Bouv. n. 1956.
2. of $b. The WHERE clause retains only those tuples of bindings in which the author is Crockett Johnson, and the RETURN clause uses the resulting values of $b to generate a list of books. By default, the ordering of book elements in the original document is preserved.
FOR $b IN document ("http://www.biblio.com/books.xml")//book, $a IN $b/author WHERE $a/firstname = "Crockett" AND $a/lastname = "Johnson" RETURN $b
The semantics semantics [Gr.,=significant] in general, the study of the relationship between words and meanings. The empirical study of word meanings and sentence meanings in existing languages is a branch of linguistics; the abstract study of meaning in relation to language or of comparisons in Quilt is the same as in XPath. For example, in the query above, consider the comparison Wastname = 'Johnson". In general, an XPath expression such as Wastname evaluates to a set of nodes. The comparison is considered to be True if at least one of the nodes returned by $allastname has a string-value equal to "Johnson'.
The query shown above returns a list of book elements without a common root element to bold them. Since a well-formed XML document A "well-formed" XML document is defined as an XML document that has correct XML syntax. According to W3C, this means:
tr.v. en·closed, en·clos·ing, en·clos·es
1. To surround on all sides; close in.
2. To fence in so as to prevent common use: enclosed the pasture. the results of our query. The easiest way to do this is to embed em·bed also im·bed
v. em·bed·ded, em·bed·ding, em·beds
1. To fix firmly in a surrounding mass: embed a post in concrete; fossils embedded in shale. the entire query within an element constructor, as illustrated in the following query:
<Result> ( FOR $b IN document ("http://www.biblio.com/books.xml")//book, $a IN $b/author WHERE $a/firstname = "Crockett" AND $a/lastname = "Johnson" RETURN $b ) </Result>
In the above queries, the result consists of selected nodes from the original document. Each selected node carries its descendants with it to the result document, preserving part of the structure of the original document. Many queries, however, need to create an output document that is structured in a new way. One example of such a transformation is a query that inverts the hierarchy of the input data, converting it from authors nested inside books to books nested inside authors. This inversion inversion /in·ver·sion/ (in-ver´zhun)
1. a turning inward, inside out, or other reversal of the normal relation of a part.
2. a term used by Freud for homosexuality.
3. can be accomplished by the following query:
<Result> ( FOR $a IN DISTINCT document ("http://www.biblio.com/books.xml")//author RETURN <BooksByAuthor> <Author> $a/lastname/text() </Author> ( FOR $b IN document ("http://www.biblio.com/books.xml")//book[author=$a] RETURN $b/title SORTBY (title) ) </BooksByAuthor> SORTBY (Author) ) </Result> The result of this query on the sample data shown above is as follows: <Result> <BooksByAuthor> <Author>Blood</Author> <title>Rise Up Singing</title> </BooksByAuthor> <BooksByAuthor> <Author>Johnson</Author> <title>Harold and the Purple Crayon</title> <title>Harold's Fairy Tale</title> </BooksByAuthor> <BooksByAuthor> <Author>Patterson</Author> <title>Rise Up Singing</title> </BooksByAuthor> </Result>
The following points are worth noting about the above example:
1. This example illustrates how a query can be nested inside another query. The variable $A, defined in the outer query, ranges over the set of distinct authors in the input document. The variable $b, defined in the inner query, ranges over the individual books written by a given author.
2. The data contained in the 'Author" elements of the output document is selected from the 'lastname' elements of the input document. The contents of these elements are extracted using XPath expressions that use the text ( function, and then enclosed en·close also in·close
tr.v. en·closed, en·clos·ing, en·clos·es
1. To surround on all sides; close in.
2. To fence in so as to prevent common use: enclosed the pasture. in new tags in the RETURN clause.
3. The order of elements in the output document is controlled by SORTBY clauses in both the inner and outer queries.
4. The predicate In programming, a statement that evaluates an expression and provides a true or false answer based on the condition of the data. [author = $a] involves a comparison of complex types. Quill quill: see pen. compares elements that have substructure substructure /sub·struc·ture/ (-struk-chur) the underlying or supporting portion of an organ or appliance; that portion of an implant denture embedded in the tissues of the jaw.
n. by normalizing whitespace and testing for the same content and structure, in the same sequence. One concrete way to do this is by serializing content and markup (text) markup - In computerised document preparation, a method of adding information to the text indicating the logical components of a document, or instructions for layout of the text on the page or other information which can be interpreted by some automatic system. . For example, the normalized string value of an element might be formed by serializing the markup, removing leading and trailing whitespace for each element, and removing whitespace between tags. Here is a sample <author> element, normalized using this scheme:
Once a complex element has been converted to a normalized string form, it can be compared to the normalized string value of another complex element. The queries discussed above explicitly s~ the URL of the document to be queried. This is not always necessary, and is sometimes undesirable. Some queries operate on a set of nodes for which there is no URL, or for which it is not convenient to provide a URL; for instance, a query may operate on the set of documents found in a web site or document repository, the set of nodes found in a collection used in some programming language, or the results of a previous query. Furthermore, it is useful to he able to write queries that may be applied to a variety of data sources, including data sources that do not exist at the time that the query is written. If the document (function does not occur in a variable binding, the binding is applied to the set of input nodes for the query, which is implicitly determined by the environment in which the query is executed. The following query is equivalent to the first query shown in this paper, except that it binds the variable $b to all book elements contained within the implicit set of input nodes for the query:
Editorial Note: The above is a brief introductory section from the original paper which runs to some 24 pages and is available on the Internet under the same title.