Quilt--an XML query language.
Quilt is a query language for XML. It originated when the authors attempted to apply XML query languages such as XML-QL, XPath, XQL, YATL, and XSQL to a variety of use cases. We found that each of these languages had strong advantages for some of the queries we examined, but was unable to express other queries we found equally important. Therefore, we chose to take some of the best ideas from these languages, plus some ideas from SQL and OQL, integrating them with a fresh syntactic approach. Our goal was to design a small, implementable language that met the requirements specified in the W3C XML Query Working Group's http://www.w3.org/XML/Group/xmlquery-reg XML Query Requirements. During our design work, we have adapted features from various languages, carefully assembling them to form a new design-hence the name "Quilt". The resulting language supports queries that draw information from various sources and patch them together in a new form. This is another reason for the name "Quilt". Although Quilt borrows from many languages, its conceptual integrity comes from a deep reliance on the structure of XML, which is based on hierarchy, sequence, and reference. Quilt is able to express queries based on document structure and to produce query results that either preserve the original document structure or generate a new structure. Quilt can operate on flat structures, such as rows from relational databases, and generate hierarchies based on the information contained in these structures. It can also express queries based on parent/child relationships or document sequence, and can preserve these relationships or generate new ones in the output document. Although Quilt can be used to retrieve data from objects, relational databases, or other non-XML sources, this data must be expressed in an XML view, and Quilt relies solely on the structure of the XML view in its data model.
We begin by describing the Quilt language, and then illustrate how the language may be used for a variety of queries, including queries typically used for relational data, semi-structured data, and structured documents.
A first look at Quilt
Two languages that were particularly influential in the design of Quilt were XML-QL and XQL. We will begin our description of Quilt by discussing some of the concepts it inherited from these two languages. In the discussion that follows, we conceptualize the nested elements of an XML document as nodes in a tree-like data model. Sometimes we will speak in terms of the elements and attributes of the XML document, and sometimes in terms of the nodes of the tree that models the document.
The core of an XML-QL query consists of a WHERE clause that binds one or more variables, and a CONSTRUCT clause that uses the values of the bound variables to construct a new document, possibly having a structure quite different from that of the original document(s). This is a very powerful concept because it allows documents to be joined, grouped, and transformed in arbitrary ways. However, the patterns that are used in XML-QL for binding variables tend to be unnecessarily verbose. More important, the result of the WHERE clause is a relation composed of scalar values, which is not sufficient to preserve all the information about the hierarchic and sequential relationships among the elements of the original document. As a result, XML-QL is a very powerful language for asking certain types of questions, but is weak at expressing queries based on hierarchy and sequence, such as 'Find the second quotation in the fourth chapter." From XML-QL, Quilt borrows the concept of constructing a new document from bound variables, but Quilt uses a different paradigm for binding the variables and a richer data model to represent the result of the binding.
XQL-98 contains a powerful syntax for navigating in a hierarchic document and for selecting a set of nodes that satisfy some complex condition. This syntax was a precursor of the abbreviated syntax of XPath, which has been adopted as a W3C Recommendation and has been used in various XML-related applications such as XPointer and XSLT. For selecting sets of nodes, Quilt uses abbreviated XPath expressions, enhanced by certain operators borrowed from XQL. Since an XPath expression, in general, returns a "forest' of nodes in which hierarchy and sequence are preserved, Quilt is able to express queries based on hierarchy and sequence more easily than languages such as XML-QL.
In this section, we will introduce the Quilt language using some example queries based on the following XML document:
<?xml version="1.0"?> <!DOCTYPE bib SYSTEM "books.bibitemd"> <bib> <book> <title>Harold and the Purple Crayon</title> <author> <lastname>Johnson</last> <firstname>Crockett</first> </author> <pubinfo> <publisher>Harper and Row</publisher> <price>$4.76</price> <year>1955</year> </pubinfo> </book> <book> <title>Harold's Fairy Tale</title> <author> <lastname>Johnson</last> <firstname>Crockett</first> </author> <pubinfo> <publisher>Harper and Row</publisher> <price>$4.76</price> <year>1956</year> </pubinfo> </book> <book> <title>Rise Up Singing</title> <author> <lastname>Blood</last> <firstname>Peter</first> </author> <author> <lastname>Patterson</last> <firstname>Annie</first> </author> <pubinfo> <publisher>Sing Out Corporation</publisher> <price>$15.45</price> <year>1988</year> </pubinfo> </book> </bib>
A simple form of a Quilt query consists of FOR, WHERE, and RETURN clauses. The FOR clause uses XPath expressions to bind the values of one or more variables. In general, an xpath expression evaluates to a set of nodes. The FOR clause generates an ordered list of tuples, each containing of a value for each of the bound variables. A tuple is generated for each possible way of binding the list of variables to nodes that satisfy their respective XPath expressions. When a node is bound to a variable, its descendant nodes are carried along with it. The WHERE clause applies a filter to the tuples, retaining only those tuples that satisfy a given search condition. The RETURN clause then generates a new document structure using the values of the bound variables.
The following example finds every book written by Crockett Johnson. The FOR clause generates a list of bindings in which $b is bound to individual book elements in the document found at the given URL, and $a is bound to individual author elements that are descendants of $b. The WHERE clause retains only those tuples of bindings in which the author is Crockett Johnson, and the RETURN clause uses the resulting values of $b to generate a list of books. By default, the ordering of book elements in the original document is preserved.
FOR $b IN document ("http://www.biblio.com/books.xml")//book, $a IN $b/author WHERE $a/firstname = "Crockett" AND $a/lastname = "Johnson" RETURN $b
The semantics of comparisons in Quilt is the same as in XPath. For example, in the query above, consider the comparison Wastname = 'Johnson". In general, an XPath expression such as Wastname evaluates to a set of nodes. The comparison is considered to be True if at least one of the nodes returned by $allastname has a string-value equal to "Johnson'.
The query shown above returns a list of book elements without a common root element to bold them. Since a well-formed XML document must have precisely one root element, we may desire to generate a new element to enclose the results of our query. The easiest way to do this is to embed the entire query within an element constructor, as illustrated in the following query:
<Result> ( FOR $b IN document ("http://www.biblio.com/books.xml")//book, $a IN $b/author WHERE $a/firstname = "Crockett" AND $a/lastname = "Johnson" RETURN $b ) </Result>
In the above queries, the result consists of selected nodes from the original document. Each selected node carries its descendants with it to the result document, preserving part of the structure of the original document. Many queries, however, need to create an output document that is structured in a new way. One example of such a transformation is a query that inverts the hierarchy of the input data, converting it from authors nested inside books to books nested inside authors. This inversion can be accomplished by the following query:
<Result> ( FOR $a IN DISTINCT document ("http://www.biblio.com/books.xml")//author RETURN <BooksByAuthor> <Author> $a/lastname/text() </Author> ( FOR $b IN document ("http://www.biblio.com/books.xml")//book[author=$a] RETURN $b/title SORTBY (title) ) </BooksByAuthor> SORTBY (Author) ) </Result> The result of this query on the sample data shown above is as follows: <Result> <BooksByAuthor> <Author>Blood</Author> <title>Rise Up Singing</title> </BooksByAuthor> <BooksByAuthor> <Author>Johnson</Author> <title>Harold and the Purple Crayon</title> <title>Harold's Fairy Tale</title> </BooksByAuthor> <BooksByAuthor> <Author>Patterson</Author> <title>Rise Up Singing</title> </BooksByAuthor> </Result>
The following points are worth noting about the above example:
1. This example illustrates how a query can be nested inside another query. The variable $A, defined in the outer query, ranges over the set of distinct authors in the input document. The variable $b, defined in the inner query, ranges over the individual books written by a given author.
2. The data contained in the 'Author" elements of the output document is selected from the 'lastname' elements of the input document. The contents of these elements are extracted using XPath expressions that use the text ( function, and then enclosed in new tags in the RETURN clause.
3. The order of elements in the output document is controlled by SORTBY clauses in both the inner and outer queries.
4. The predicate [author = $a] involves a comparison of complex types. Quill compares elements that have substructure by normalizing whitespace and testing for the same content and structure, in the same sequence. One concrete way to do this is by serializing content and markup. For example, the normalized string value of an element might be formed by serializing the markup, removing leading and trailing whitespace for each element, and removing whitespace between tags. Here is a sample <author> element, normalized using this scheme:
Once a complex element has been converted to a normalized string form, it can be compared to the normalized string value of another complex element. The queries discussed above explicitly s~ the URL of the document to be queried. This is not always necessary, and is sometimes undesirable. Some queries operate on a set of nodes for which there is no URL, or for which it is not convenient to provide a URL; for instance, a query may operate on the set of documents found in a web site or document repository, the set of nodes found in a collection used in some programming language, or the results of a previous query. Furthermore, it is useful to he able to write queries that may be applied to a variety of data sources, including data sources that do not exist at the time that the query is written. If the document (function does not occur in a variable binding, the binding is applied to the set of input nodes for the query, which is implicitly determined by the environment in which the query is executed. The following query is equivalent to the first query shown in this paper, except that it binds the variable $b to all book elements contained within the implicit set of input nodes for the query:
Editorial Note: The above is a brief introductory section from the original paper which runs to some 24 pages and is available on the Internet under the same title.