RDF: an introduction.
RDF is very much like that elephant, and we're very much like the blind people, each grabbing at a different aspect of the specification, with our own interpretations of what it is and what it's good for. And we're discovering what the blind people discovered: not all interpretations of RDF are the same. Therein lie both the challenge of RDF and its value.
The main RDF specification web site is at http://www.w3.org/RDF/. You can access the core working group's efforts at http://www.w3.org/2001/sw/RDFCore/. In addition, there's an RDF Interest Group forum that you can monitor or join at http://www.w3.org/RDF/Interest/.
The Semantic Web and RDF: A Brief History
RDF is based within the Semantic Web effort. According to the W3C (World Wide Web Consortium) Semantic Web Activity Statement:
The Resource Description Framework (RDF) is a language designed to support the Semantic Web, in much the same way that HTML is the language that helped initiate the original Web. RDF is a framework for supporting resource description, or metadata (data about data), for the Web. RDF provides common structures that can be used for interoperable XML data exchange. It offers developers a powerful toolkit for making statements and connecting those statements to derive meaning. The W3C has been developing RDF as a key component of its vision for a Semantic Web; however, RDF's capabilities fit well in many different computing contexts. RDF offers a different, and in some ways more powerful, framework for data representation than XML or relational databases, while remaining far more generic than object structures.
RDF's foundations are built on a very simple model, but the basic logic can support large-scale information management and processing in a variety of different contexts. The assertions in different RDF files can be combined, providing far more information together than they contain separately. RDF supports flexible and powerful query structures, and developers have created a wide variety of tools for working with RDF.
Though not as well known as other specifications from the W3C, RDF is actually one of the older specifications, with the first working draft produced in 1997. The earliest editors, Ora Lassila and Ralph Swick, established the foundation on which RDF rests: a mechanism for working with metadata that promotes the interchange of data between automated processes. Regardless of the transformations RDF has undergone and its continuing maturing process, this statement forms its immutable purpose and focal point.
In 1999, the first recommended RDF specification, the RDF Model and Syntax Specification (RDF M&S), again coauthored by Ora Lassila and Ralph Swick, was released. A candidate recommendation for the RDF Schema Specification, coedited by Dan Brickley and R.V. Guha, followed in 2000. In order to open up a previously closed specification process, the W3C also created the RDF Interest Group, providing a view into the RDF specification process for interested people who were not part of the RDF Core Working Group.
As efforts proceeded on the RDF specification, discussions continued about the concepts behind the Semantic Web. At the time, the main difference between the existing Web and the newer, smarter Web was that, rather than a large amount of disorganized and not easily accessible data, something such as RDF would allow organisation of data into knowledge statements, that is, assertions about resources accessible on the Web. In a Scientific American article published in May 2001, Tim Berners-Lee wrote:
"The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. Such an agent coming to a clinic's Web page will know not just that the page has keywords such as 'treatment, medicine, physical, therapy' (as might be encoded today) but also that Dr. Hartman works at this clinic on Mondays, Wednesdays and Fridays and that the script takes a date range in yyyy-mm-dd format and returns appointment times."
As complex as the Semantic Web sounds, this statement of Berners-Lee provides the key to understanding the Web of the future. With the Semantic Web, not only can we find data about a subject, we can also infer additional material not available through straight keyword search. For instance, RDF gives us the ability to discover that there is an article about the Giant Squid at one of my web sites, and that the article was written on a certain date by a certain person, that it is associated with three other articles in a series, and that the general theme associated with the article is the Giant Squid's earliest roots in mythology.
Additional material that can be derived is that the article is still 'relevant' (meaning that the data contained in the article hasn't become dated) and still active (still accessible from the Web). All of this information is easily produced and consumed through the benefits of RDF without having to rely on any extraordinary computational power.
However, for all of its possibilities, it wasn't long after the release of the RDF specifications that concerns arose about ambiguity with certain constructs within the document. For instance, there was considerable discussion in the RDF Interest Group about containers (are separate semantic and syntactic constructs really needed?) as well as other elements within RDF/XML. To meet this growing number of concerns, an RDF Issue Tracking document was started in 2000 to monitor issues with RDF. This was followed in 2001 with the creation of a new RDF Core Working Group, chartered to complete the RDF Schema (RDFS) recommendation as well as address the issues with the first specifications. The RDF Core Working Group's scope has grown a bit since its beginnings. According to the Working Group's charter, they must now:
* Update and maintain the RDF Issue Tracking document
* Publish a set of machine-processable test cases corresponding to technical issues addressed by the WG
* Update the errata and status pages for the RDF specifications
* Update the RDF Model and Syntax Specification (as one, two, or more documents) clarifying the model and fixing issues with the syntax
* Complete work on the RDF Schema 1.0 Specification
* Provide an account of the relationship between RDF and the XML family of technologies
* Maintain backward compatibility with existing implementations of RDF/XML
The WG was originally scheduled to close down early in 2002, but, as with all larger projects, the work slid until later in 2002. The WG issued the W3C Last Call drafts for all six of the RDF specification documents early in 2003.
As stated earlier, the RDF specification was originally released as one document, the RDF Model and Syntax, or RDF M&S. However, it soon became apparent that this document was attempting to cover too much material, leaving too much confusion and too many questions in its wake. Thus, a new effort was started to address the issues about the original specification and, hopefully, eliminate the confusion. This work resulted in an updated specification and the release of six new documents: RDF Concepts and Abstract Syntax, RDF Semantics, RDF/XML Syntax Specification (Revised), RDF Vocabulary Description Language 1.0: RDF Schema, the RDF Primer, and the RDF Test Cases.
The RDF Concepts and Abstract Syntax and the RDF Semantics documents provide the fundamental framework behind RDF: the underlying assumptions and structures that make RDF distinct from other metadata models (such as the relational data model). These documents provide both validity and consistency to RDF, a way of verifying that data structured in a certain way will always be compatible with other data using the same structures. The RDF model exists independently of any representation of RDF, including RDF/XML.
The RDF/XML syntax, described in the RDF/XML Syntax Specification (Revised), is the recommended serialization technique for RDF. Though several tools and APIs can also work with N-Triples (see RDF Triple-Resource Description Framework) or N3 notation, most implementations of and discussions about RDF focus on RDF/XML.
The RDF Vocabulary Description Language defines and constrains an RDF/XML vocabulary. It isn't a replacement for XML Schema or the use of DTDs; rather, it's used to define specific RDF vocabularies and to specify how the elements of the vocabulary relate to each other. An RDF Schema isn't required for valid RDF (neither is a W3C XML Schema or an XML 1.0 Document Type Definition, or DTD), but it does help prevent confusion when people want to share a vocabulary.
A good additional resource to learn more about RDF and RDF/XML is the RDF Primer. In addition to examples and accessible descriptions of the concepts of RDF and RDFS, the primer also looks at some uses of RDF.
The final RDF specification document, RDF Test Cases, contains a list of issues arising from the original RDF specification release, their resolutions, and the test cases devised for use by RDF implementers to test their implementations against these resolved issues. The primary purpose of the RDF Test Cases is to provide examples for testing specific RDF issues as the Working Group resolved them. Unless you're writing an RDF/XML parser or something similar, you probably won't need to spend much time with that document.
When to Use and Not Use RDF
RDF is not a replacement for other technologies, and its use is not appropriate in all circumstances. Just because data is on the Web, or accessed via the Web, doesn't mean it has to be organized with RDF. Forcing RDF into uses that don't realize its potential will only result in a general reaction against RDF in its entirety, including in uses in which it positively shines. When should we, and when should we not, use RDF? More specifically, since much of RDF focuses on its serialization to RDF/XML, when should we use RDF/XML and when should we use non-RDF XML?
A company called Semaview has published a graphic depicting the differences between XML and RDF/XML (found at http://www.semaview.com/c/RDFvsXML.html). Among the differences listed was one about the tree-structured nature of XML, as compared to RDF's much flatter triple-based pattern. XML is hierarchical, which means that all related elements must be nested within the elements they are related to. RDF does not require this nested structure.
RDF and XML
However, this difference in structure can make it more difficult for people to read the RDF/XML document and actually see the relationships between the data, one of the more common complaints about RDF/XML.
When processing XML, an element isn't actually complete until you reach its end tag. If an application is parsing an XML document into elements in memory before transferring them into another persisted form of data, this means that the elements that contain other elements must be retained in memory until their internal data members are processed. This can result in some fairly significant strain on memory use, particularly with larger XML documents.
RDF/XML, on the other hand, would allow you to process the first element quickly because its "contained" data is actually stored in another element somewhere else in the document. As long as the relationship between the two elements can be established through the URI, we'll always be able to reconstruct the original data regardless of how it's been transformed.
Another advantage to the RDF/XML approach is when querying the data. Again, in XML, if you're looking for a specific piece of data, you basically have to provide the entire structure of all the elements preceding the piece of data in order to ensure you have the proper value. In RDF/XML, all you have to do is remember the triple nature of the specification, and look for a triple with a pattern matching a specific resource URI, such as a property URI, and you'll find the specific value.
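The contrast between the two lookup styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular RDF toolkit works: the triples are plain tuples written out by hand, the match helper is hypothetical, and the data is borrowed from the movement example used in this article.

```python
import xml.etree.ElementTree as ET

# Plain XML: the lookup has to spell out the full nesting path.
plain_xml = """<resource>
  <uri>http://burningbird.net/articles/monsters3.htm</uri>
  <history>
    <movement>
      <link>http://www.yasd.com/dynaearth/monsters3.htm</link>
      <reason>New Article</reason>
    </movement>
  </history>
</resource>"""
xml_reason = ET.fromstring(plain_xml).find("history/movement/reason").text

# RDF: each statement is a (subject, predicate, object) triple.
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
MOVEMENT = "http://www.yasd.com/dynaearth/monsters3.htm"
triples = [
    (MOVEMENT, PSTCN + "movementType", "Add"),
    (MOVEMENT, PSTCN + "reason", "New Article"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Hypothetical helper: return triples that fit the pattern (None = wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Only the property URI is needed; no knowledge of document structure.
rdf_reasons = [o for (_, _, o) in match(triples, predicate=PSTCN + "reason")]
```

The XML lookup breaks if the nesting changes, while the triple lookup depends only on the property URI.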
If you've worked with database systems before, you'll recognize that many of the differences between RDF/XML and XML are similar to the differences between relational and hierarchical databases.
Hierarchical databases also have a physical location dependency that requires related data to be co-located, while relational databases depend on the use of identifiers to relate data.
Another reason to use RDF/XML over non-RDF XML is the ability to join data from two disparate vocabularies easily, without having to negotiate structural differences between the two. Since the XML from both data sets is based on the same model (RDF) and since both make use of namespaces (which prevent element name collision, that is, the same element name appearing in both vocabularies), combining data from both vocabularies can occur immediately, and with no preliminary work. This is essential for the Semantic Web, the basis for the work on RDF and RDF/XML. However, this is also essential in any business that may need to combine data from two different companies, such as a supplier of raw goods and a manufacturer that uses these raw goods.
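A rough sketch of that supplier and manufacturer case, in Python, may make the point concrete. Everything in it is an assumption made for illustration: the two namespaces, the part URI, and the property names are hypothetical, and the triples are plain tuples rather than the output of a real RDF parser.

```python
# Hypothetical namespaces for two independently designed vocabularies.
SUPPLIER = "http://supplier.example/terms/"
MAKER = "http://manufacturer.example/terms/"

# Both companies describe the same resource, identified by one URI.
PART = "http://supplier.example/parts/steel-rod-12"

supplier_data = {
    (PART, SUPPLIER + "name", "Steel rod, 12 mm"),
    (PART, SUPPLIER + "unitPrice", "4.20"),
}
manufacturer_data = {
    (PART, MAKER + "name", "Axle blank"),
    (PART, MAKER + "usedIn", "http://manufacturer.example/products/cart-7"),
}

# Because both data sets share the triple model, merging is a set union.
merged = supplier_data | manufacturer_data

# Both vocabularies define "name", but the namespaces keep them apart.
properties_of_part = sorted(p for (s, p, o) in merged if s == PART)
```

No schema mapping or restructuring step is needed before the union, which is the property the paragraph above describes.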
As excellent as these two reasons (less strain on memory and joining vocabularies) are for utilizing RDF as a model for data and RDF/XML as a format, for certain instances of data stored on the Web, RDF is clearly not a replacement. As an example, RDF is not a replacement for XHTML for defining web pages that are displayed in a browser. RDF is also not a replacement for CSS, which is used to control how that data is displayed. Both CSS and XHTML are optimized for their particular uses, organizing and displaying data in a web browser. RDF's purpose differs: it's used to capture specific statements about a resource, statements that help form a more complete picture of the resource. RDF isn't concerned about either page organization or display.
Now, there might be pieces of information in the XHTML and the CSS that could be reconstructed into statements about a resource, but there's nothing in either technology that specifically says "this is a statement, an assertion if you will, about this resource" in such a way that a machine can easily pick this information out. That's where RDF enters the picture. It lays all assertions out in such a manner that even the most amoeba-like RDF parser can find each individual statement without having to pick around among the presentational and organizational constructs of specifications such as XHTML and CSS.
Additionally, RDF/XML isn't necessarily well suited as a replacement for other uses of XML, such as within SOAP or XML-RPC. The main reason is, again, the level of complexity that RDF/XML adds to the process. A SOAP processor is basically sending a request for a service across the Internet and then processing the results of that request when it's answered. There's a mechanism that supports this process, but the basic structure of SOAP is: request service, get answer, process answer. In the case of SOAP, the request and the answer are formatted in XML.
Though a SOAP service call and its results are typically formatted in XML, there really isn't a need to keep these beyond the particular invocation, so there is little drive to format the XML in such a way that it can be combined with other vocabularies at a later time, something that RDF/XML facilitates. Additionally, one hopes that we keep the SOAP request and return as small, lightweight, and uncomplicated as possible, and RDF/XML does add to the overhead of the XML.
Ultimately, the decision about using RDF/XML in place of XML is based on whether there's a good reason to do so, that is, a business need rather than a technical need to use the model and related XML structure. If the data isn't processed automatically, if it isn't retained and combined with data from other vocabularies, and if you don't need RDF's optimized querying capability, you should use non-RDF XML. However, if you do need these things, consider the use of RDF/XML.
Why Is RDF Not in the Mainstream?
The Resource Description Framework has been a W3C Recommendation (synonymous with standard) since February 22, 1999, only slightly more than a year after the XML 1.0 Recommendation. Many people have never heard of RDF. Outside of the digital library and artificial intelligence communities, RDF has not achieved mindshare with developers or corporate management. Why has RDF adoption been so weak? There are multiple reasons:
RDF doesn't yet mix well with XML documents
At the time of writing, you cannot validate RDF embedded in other XML or XHTML documents because of RDF's open grammar. In other words, RDF allows you to mix in any namespace-qualified elements you want. Additionally, there is a fairly esoteric issue regarding a difference between how XML Schema and RDF process namespaces. This has led many people to view RDF and XML documents as two separate paths for metadata. Therefore, the businesses traveling along the XML/XHTML path assume that their direction is incompatible with the RDF path. This is not true. The fact that RDF is serialized as XML means that both XML Schema and RDF share a common syntax. In the debates on this subject, it is clear that the intent of the W3C RDF Working Group is to resolve these differences so that RDF can be successfully embedded in XHTML and XML documents. Additionally, several tools mix RDF and HTML, so it is clear that bridges are being built to resolve this issue. Lastly, several solutions to this issue were proposed by Sean Palmer in the document "RDF in HTML: Approaches," available at http://infomesh.net/2002/rdfinhtml/.
Parts of RDF are complex
Several factors combined make RDF significantly more complex than XML documents. The three chief culprits in this equation are mixing metaphors, the serialization syntax, and reification.
First, the model mixes metaphors by using terms from different data representation communities, including linguistic, object-oriented, and relational data. This type of flexibility is a double-edged sword: good because it unifies modeling concepts from different domains, yet bad in that it causes confusion. The main influences have come from the Web standardization community itself in the form of HTML metadata and PICS, the library community, the structured document community in the form of SGML and more importantly XML, and also the knowledge representation (KR) community. One other potential problem with such metaphor unification is frustration caused by inexact or poor mapping to the original concepts specified by each community.
Second, RDF syntax allows the RDF graph to be serialized via attributes or elements. In other words, you can express one RDF model in two different ways. This flexibility can be yet another problem for validation.
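To see the two forms side by side, the Python sketch below writes the same statement once as a property element and once as a property attribute, then extracts triples from each. The toy_triples function is a deliberately tiny, hypothetical extractor that handles only these two shapes; it is not a conforming RDF/XML parser, and the sample statement reuses the pstcn namespace from this article's example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The property written as a child element.
as_element = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">
  <rdf:Description rdf:about="http://www.yasd.com/dynaearth/monsters3.htm">
    <pstcn:reason>New Article</pstcn:reason>
  </rdf:Description>
</rdf:RDF>"""

# The same property written as an attribute.
as_attribute = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">
  <rdf:Description rdf:about="http://www.yasd.com/dynaearth/monsters3.htm"
      pstcn:reason="New Article" />
</rdf:RDF>"""

def to_uri(clark_name):
    """Turn ElementTree's '{namespace}local' form into 'namespacelocal'."""
    return clark_name.replace("{", "", 1).replace("}", "", 1)

def toy_triples(rdf_xml):
    """Toy extractor: literal-valued properties of top-level nodes only."""
    triples = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get("{%s}about" % RDF_NS)
        # Property attributes: any attribute outside the rdf: namespace.
        for name, value in node.attrib.items():
            if not name.startswith("{%s}" % RDF_NS):
                triples.add((subject, to_uri(name), value))
        # Property elements carrying literal text.
        for child in node:
            triples.add((subject, to_uri(child.tag), child.text))
    return triples

same_graph = toy_triples(as_element) == toy_triples(as_attribute)
```

Both documents yield the same single triple, yet a grammar-based validator would have to accept two different XML shapes for it.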
Third, the hierarchical RDF/XML syntax is difficult to author by hand and is better left to tools. In general, it is confusing to represent lists of statements as a hierarchical tree. The current method used in the RDF/XML syntax makes differentiating between objects and properties very difficult.
Lastly, reification has not yet proven itself and adds another level of abstraction to the RDF model. For XML developers first trying to move from the simple name/value pairs of XML to the triple, statements about statements are too complex. While reification matches natural language, it is a foreign concept to all of the other data communities. In other words, if you deem a triple to be a fact, you don't want assertions saying it is not a fact. Most applications treat data as facts and implement data integrity procedures to ensure that axiom holds true. With reification, nothing is bedrock; everything is just an assertion, and you must follow a potentially infinite chain of assertions about assertions where one may contradict another at any time. Several RDF implementations and knowledge bases disallow the use of reification. Reification is a feature of RDF that is not for every application and can be safely avoided.
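For readers who have not met reification before, the following Python sketch shows roughly what "a statement about a statement" looks like as triples. It uses the standard RDF reification terms (rdf:Statement, rdf:subject, rdf:predicate, rdf:object); the reify helper, the statement identifier, and the claimedBy property are hypothetical and exist only for this illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
EX = "http://example.org/terms/"  # hypothetical vocabulary

# An ordinary triple, normally treated as a plain fact.
fact = (
    "http://www.yasd.com/dynaearth/monsters3.htm",
    PSTCN + "reason",
    "New Article",
)

def reify(triple, statement_uri):
    """Describe a triple as a resource using the RDF reification vocabulary."""
    s, p, o = triple
    return [
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", s),
        (statement_uri, RDF + "predicate", p),
        (statement_uri, RDF + "object", o),
    ]

statement = "http://example.org/statements/1"  # hypothetical identifier
graph = reify(fact, statement)

# A statement about the statement: who made the claim.
graph.append((statement, EX + "claimedBy", "http://example.org/people/editor"))
```

One fact has become five triples, and the resulting graph describes the original triple without asserting it, which is the extra layer of abstraction discussed above.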
Sources: This article is intended to highlight the relationship between two significant developments that, whilst interlinked, can each stand alone in its own right under certain conditions. It is based on the following recently published works, and only the most relevant issues have been mentioned. Readers should consult these publications to fully appreciate the significance of the developments.
1.0 Practical RDF. Shelley Powers. O'Reilly. ISBN 0-596-00263-7. 331 pages. www.oreilly.com
2.0 The Semantic Web. M.C. Daconta, Leo J. Obrst, Kevin T. Smith. Wiley. ISBN 0-471-43257-1. 331 pages. www.wiley.com
3.0 The Semantic Web: An Introduction. Sean B. Palmer. http://infomesh.net/2001/swintro/ 15 pages. This paper is replete with associated links and is therefore effectively of greater length.
<?xml version="1.0"?>
<resource>
  <uri>http://burningbird.net/articles/monsters3.htm</uri>
  <history>
    <movement>
      <link>http://www.yasd.com/dynaearth/monsters3.htm</link>
      <reason>New Article</reason>
    </movement>
  </history>
</resource>
In RDF/XML, you can associate two separate XML structures with each other through a Uniform Resource Identifier (URI). With the URI, you can link one XML structure to another without having to embed the second structure in the first:
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
    xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters3.htm">
    <!-- resource movements -->
    <pstcn:history>
      <rdf:Seq>
        <rdf:_3 rdf:resource="http://www.yasd.com/dynaearth/monsters3.htm" />
      </rdf:Seq>
    </pstcn:history>
  </pstcn:Resource>
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters3.htm">
    <pstcn:movementType>Add</pstcn:movementType>
    <pstcn:reason>New Article</pstcn:reason>
  </pstcn:Movement>
</rdf:RDF>