Embedded XML for device management. (Soft Copy: Embedded Systems).

The emergence of intelligent network appliances (devices with built-in intelligence and connectivity) represents a decided shift in computing toward a distributed model. Much like the evolution from mainframe to minicomputer to PC to handheld computing, commercial networks are in the midst of a transformation to smaller, distributed devices. Instead of a single, centralized computer performing all tasks within an organization, contemporary commercial networks consist of numerous embedded devices and computers working together, each with localized intelligence. Examples include networked automation equipment on a factory floor, embedded computers inside vehicles, and the interconnected point-of-sale (POS) terminals, inventory control systems, and price labeling systems in a store. Although these devices may come from different manufacturers, they must communicate using common protocols in order to coordinate their individual tasks.

Getting different devices to communicate is not simply a matter of putting them all on the same network. To exchange information, they have to understand each other's data formats. That means manufacturers have to agree on a common data format, specify it unambiguously, and then implement their communications firmware according to that specification. Extensible Markup Language (XML) has emerged as the de facto standard for achieving this level of interoperability.

A Common Language

XML is a tool for specifying the format of data records. It is based on Standard Generalized Markup Language (SGML) but is much simpler. Developers specify the format of data records by creating an XML schema, a data file that describes XML record formats. XML parsers read schema files and use the information in them to encode and decode data records. Text-based XML is flexible enough to encode almost any type of data except large amounts of binary data or binary data streams, which are rarely encountered in the world of intelligent network devices.
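To make the idea of a data record concrete, here is a minimal C sketch that encodes a record as XML text using only `snprintf`. The struct and the element names (`motorSettings`, `speed`, `direction`) are invented for illustration and do not come from any published schema; a real encoder would also escape characters such as `<` and `&`.

```c
#include <stdio.h>

/* Hypothetical record; the field and element names are illustrative only. */
struct motor_settings {
    int speed_rpm;          /* motor speed in RPM */
    const char *direction;  /* "forward" or "reverse" */
};

/* Encode the record as an XML data record into buf.
   Returns the number of characters written, or -1 if buf is too small.
   Assumes direction contains no characters that need XML escaping. */
int encode_motor_settings(const struct motor_settings *m,
                          char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "<motorSettings>"
                     "<speed>%d</speed>"
                     "<direction>%s</direction>"
                     "</motorSettings>",
                     m->speed_rpm, m->direction);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* encoding error or output truncated */
    return n;
}
```

With a speed of 250 and direction `forward`, the output is `<motorSettings><speed>250</speed><direction>forward</direction></motorSettings>`, 79 characters.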

Companies using XML share their data formats by publishing their schemas. Because the schemas are read by software, the misinterpretations that arise when different development teams read a paper specification are eliminated.

UPnP and Away

Besides serving as a language for data interpretation, XML is critical to two additional technologies that are likely to be used on future networks. The first, Microsoft's Universal Plug and Play (UPnP), is akin to the "plug and play" technology familiar from devices such as keyboards attached to PCs. Microsoft extended this concept to networks, providing a framework that lets computers and devices find and identify one another without human intervention. UPnP also removes the complexity of obtaining an IP address.

The second technology, Simple Object Access Protocol (SOAP), defines an XML message format that lets a device expose its services to others, providing a framework for different devices and computers to work together.

Devices that implement the full set of UPnP protocols need to support XML. UPnP specifies how a device on a network should get an IP address, offer network services, and access network services. UPnP uses SOAP to encode Remote Procedure Call (RPC) requests and responses between clients and servers. SOAP uses an XML schema to define a general-purpose message format for transferring data, and that format is used to encode RPC requests and replies. Originally developed by a coalition of companies including Microsoft, SOAP does not depend on Microsoft technology and is independent of operating system and machine architecture.
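As a rough sketch of what a UPnP control request looks like on the wire, the function below builds the XML body: a SOAP 1.1 envelope whose Body holds one element named after the action, in the service's namespace, with one child element per argument. The function name is hypothetical, it handles only a single argument, and it assumes no string needs XML escaping. The HTTP POST and its SOAPACTION header, which carry this body in UPnP, are not shown.

```c
#include <stdio.h>

/* Hypothetical helper: build a one-argument UPnP SOAP action request.
   Returns the length written, or -1 if buf is too small.
   Assumes none of the strings need XML escaping. */
int build_soap_action(char *buf, size_t len,
                      const char *service_type, /* service namespace URN */
                      const char *action,       /* action element name */
                      const char *arg_name,
                      const char *arg_value)
{
    int n = snprintf(buf, len,
        "<?xml version=\"1.0\"?>"
        "<s:Envelope xmlns:s=\"http://schemas.xmlsoap.org/soap/envelope/\""
        " s:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
        "<s:Body>"
        "<u:%s xmlns:u=\"%s\">"   /* action, in the service's namespace */
        "<%s>%s</%s>"             /* the single argument */
        "</u:%s>"
        "</s:Body>"
        "</s:Envelope>",
        action, service_type,
        arg_name, arg_value, arg_name,
        action);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

For example, `build_soap_action(buf, sizeof buf, "urn:schemas-upnp-org:service:SwitchPower:1", "SetTarget", "newTargetValue", "1")` produces a request to switch on a UPnP light; the service and argument names here are used purely as an illustration.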


There are two types of XML schemas: Document Type Definition (DTD) files and XML Schema files. DTD files are the original way of specifying XML schemas, but they have some severe limitations. DTD files support only character data types and have no mechanism for user-defined data types. As a result, schemas defined with DTD files express numeric data as strings. This prevents XML parsers from verifying that the correct type of data is present in data records and forces the application to perform the last step of parsing itself: converting the text to numbers and checking it.
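The "last step" left to the application looks something like the sketch below: with a DTD, a field such as `<speed>250</speed>` arrives as the string `"250"`, and the application must convert and check it. The function name is hypothetical; `strtol` is the standard C library conversion.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a DTD-typed (string) field to an int.
   Returns 0 on success, -1 if the text is empty, has trailing
   non-digits, or does not fit in an int. */
int field_to_int(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;              /* empty, or not purely a number */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;              /* out of range for int */
    *out = (int)v;
    return 0;
}
```

An XML Schema-aware parser that knows the field is an integer could do this conversion and rejection itself.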

Another problem is that many applications need to use data records defined in separate DTDs. DTDs written by different authors may use the same name for different data records. In this case, the old DTD mechanism has no way of resolving the collision other than for a developer to manually edit one of the DTD files and change the symbol's name and all references to it. Finally, DTD files are not written in XML. This means an XML parser that supports DTDs needs a separate parser for DTD syntax, and developers have to learn two syntaxes.

The World Wide Web Consortium (W3C) has addressed these limitations. In 1999, it published the Namespaces in XML recommendation. Namespaces qualify element and attribute names with a URI, which provides a way of resolving name collisions and allows applications to use schemas developed by different authors (or vendors) without modifying them. In 2001, the W3C published a recommendation for a new schema format, named XML Schema, intended to replace DTD files. XML Schema adds support for many new data types (including numeric data types), as well as the ability for developers to create their own data types. XML Schema files are written in XML, so XML parser developers can reuse the same parsing code to process schema definitions.
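A namespace-aware parser resolves a collision by comparing (namespace URI, local name) pairs rather than raw names. The sketch below shows that resolution step for two hypothetical vendors that both define a `status` element; the struct, function, and URNs are invented for illustration, and unprefixed names are simply given an empty namespace rather than applying a default `xmlns` binding as a real parser would.

```c
#include <string.h>

/* One in-scope binding, as declared by xmlns:prefix="uri". */
struct ns_binding {
    const char *prefix;
    const char *uri;
};

/* Hypothetical resolver: split a qualified name like "a:status" into a
   namespace URI and local name. Returns the URI, "" for an unprefixed
   name (simplification), or NULL if the prefix is not bound. */
const char *resolve_qname(const char *qname,
                          const struct ns_binding *bindings, size_t count,
                          const char **local)
{
    const char *colon = strchr(qname, ':');
    size_t plen, i;

    if (colon == NULL) {
        *local = qname;
        return "";
    }
    plen = (size_t)(colon - qname);
    *local = colon + 1;
    for (i = 0; i < count; i++) {
        if (strlen(bindings[i].prefix) == plen &&
            strncmp(bindings[i].prefix, qname, plen) == 0)
            return bindings[i].uri;
    }
    return NULL;
}
```

With `a` bound to `urn:example:vendorA` and `b` to `urn:example:vendorB`, the names `a:status` and `b:status` share a local name but resolve to different URIs, so neither vendor's schema has to be edited.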

However, XML Schema is much more complex than DTD because of the large number of data types and structures it adds. Many of these new features will be used infrequently, especially in embedded applications. The added complexity can make the new schemas harder to understand and can increase the parser's memory footprint. Despite these issues, XML Schema offers features that are highly desirable for many applications.

In the embedded environment, it is likely that a subset of XML Schema that leaves out rarely-used features will be adopted, giving embedded developers useful features without making XML too complex or too large.


When receiving data records, an XML parser processes both schemas and data records. It uses the schemas to decode data records, then extracts the information contained within the data record and passes it on to the application. When generating data records, an XML parser uses the rules within the schemas to encode information.

In addition to simply decoding and encoding data records, XML parsers can also validate data records. A valid data record is one that conforms to the schema definition. Using XML Schema, it is possible to define exactly what data is appropriate for each of the fields in a data record. Suppose, for example, that the speed setting for an electric motor is transmitted in an XML data record, and for this application the only allowed values are between 100 RPM and 500 RPM. When the parser processes these records, it verifies that the value in the speed field is in range before passing it on to the application. Doing this sort of range checking before passing on the data can simplify the rest of the application.
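In XML Schema, the motor-speed rule would be expressed with the `minInclusive` and `maxInclusive` facets. The sketch below shows, under the assumption of a table-driven validator, how such a restriction might be carried into C and checked; the struct and function names are hypothetical.

```c
/* Equivalent XML Schema fragment (illustrative):
   <xs:simpleType name="speedType">
     <xs:restriction base="xs:int">
       <xs:minInclusive value="100"/>
       <xs:maxInclusive value="500"/>
     </xs:restriction>
   </xs:simpleType> */

/* Hypothetical C form of an integer range restriction. */
struct int_facet {
    const char *field;
    long min_inclusive;
    long max_inclusive;
};

/* Facet for the motor example: speed must be 100..500 RPM. */
static const struct int_facet speed_facet = { "speed", 100, 500 };

/* Returns 1 if value satisfies the facet, 0 otherwise. A validating
   parser would reject the record instead of passing it on. */
int validate_int(const struct int_facet *f, long value)
{
    return value >= f->min_inclusive && value <= f->max_inclusive;
}
```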

There are two standard Application Programming Interfaces (APIs) for XML parsers. The Simple API for XML (SAX) is event-driven. The SAX API defines a set of callback functions that the application must implement. The parser calls these functions when certain events occur as it parses data records: encountering the beginning and end of data records and of fields within data records, and encountering the data contained within the fields.

The other standard API is the Document Object Model (DOM). When the DOM API is used, the parser processes the entire data record and builds a DOM tree (a linked data structure of nodes) to represent it. The DOM tree is passed to the application in its entirety after the parser has finished processing the data record. To generate a data record, the application uses the DOM API to construct a DOM tree and passes it to the parser, which encodes it.
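A DOM tree can be pictured as nodes linked by first-child and next-sibling pointers. The minimal sketch below, which is not the W3C DOM API, shows an application reading a field from such a tree and generating a data record by serializing a tree it built. All names are hypothetical, and escaping, attributes, and namespaces are omitted.

```c
#include <stdio.h>
#include <string.h>

/* Minimal DOM-style node (hypothetical). Real DOM implementations also
   carry attributes, parent pointers, namespaces, and more. */
struct node {
    const char  *name;          /* element name */
    const char  *text;          /* text content, or NULL */
    struct node *first_child;
    struct node *next_sibling;
};

/* Write the sibling list starting at n into buf at offset *pos. */
static int serialize(const struct node *n, char *buf, size_t len,
                     size_t *pos)
{
    int w;

    for (; n != NULL; n = n->next_sibling) {
        w = snprintf(buf + *pos, len - *pos, "<%s>%s", n->name,
                     n->text != NULL ? n->text : "");
        if (w < 0 || (size_t)w >= len - *pos)
            return -1;                       /* buffer too small */
        *pos += (size_t)w;
        if (serialize(n->first_child, buf, len, pos) != 0)
            return -1;
        w = snprintf(buf + *pos, len - *pos, "</%s>", n->name);
        if (w < 0 || (size_t)w >= len - *pos)
            return -1;
        *pos += (size_t)w;
    }
    return 0;
}

/* Generate a data record from an application-built tree. The root is
   expected to have no siblings. Returns 0, or -1 if buf is too small. */
int dom_serialize(const struct node *root, char *buf, size_t len)
{
    size_t pos = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    return serialize(root, buf, len, &pos);
}

/* Read side: find the first child element with the given name. */
const struct node *find_child(const struct node *parent, const char *name)
{
    const struct node *c;

    for (c = parent->first_child; c != NULL; c = c->next_sibling)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```

Note that every field of the record becomes a separate node, so the tree grows with the size of the record.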

Porting a Standard Parser

The most obvious way of implementing support for XML in an embedded system is to port an existing parser. There are many XML parsers available on the Internet. Many of these parsers are very good and are kept up to date with the latest developments in XML. Free source code is available in C/C++ and Java. Porting one of them to an embedded application is a fast way of implementing full-featured support for XML.

However, because these parsers are written for workstations rather than embedded systems, they are not designed to conserve memory. For example, a port of version 2.2.0 of the libXML parser with support for DTDs and the SAX API, but not validation, had a footprint of approximately 250 KB of ROM and RAM. NetSilicon estimates that an XML Schema-based parser with DOM and SAX API support and validation would require about 500 KB of ROM and RAM. Additional memory would be required to actually parse data.

Another issue is that the APIs supported by the standard parsers are not desirable for most embedded systems. When the DOM API is used to parse incoming data, it builds a DOM tree representation of the entire data record. The size of the DOM tree is dependent upon the size of the data record received. XML allows data records to be defined with fields that can be repeated any number of times. This means that arbitrarily large data records complying with the schema can be generated. Since the parser will build a DOM tree to represent the entire record if the DOM API is used, the amount of memory potentially used by the DOM API is unbounded.

In theory, this is not an issue, since the system should be designed so that the sender never sends a record the receiver is unable to handle. In practice, this assumption cannot be relied on, because the sender and receiver may be made by different companies. The only assumption the manufacturer of the receiving device can make is that the sender will send data records that comply with the schema. The parser on the receiving side therefore needs to discard data records it cannot process, for whatever reason, even if they comply with the schema, and the DOM API provides no way to do this. Allowing a process in an embedded system to keep allocating memory until malloc fails can crash the entire system.
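One way an embedded receiver could bound this memory, sketched here as an assumption rather than a feature of any standard parser, is to allocate tree nodes from a fixed pool sized at design time and to discard the whole record when the pool runs out. The pool size and all names below are hypothetical.

```c
#include <stddef.h>

#define NODE_POOL_SIZE 64   /* hypothetical per-record limit */

struct pool_node {
    const char *name;
    struct pool_node *first_child;
    struct pool_node *next_sibling;
};

/* Fixed pool replacing malloc: memory use is bounded regardless of
   how large the incoming record is. */
struct node_pool {
    struct pool_node nodes[NODE_POOL_SIZE];
    size_t used;
};

void pool_reset(struct node_pool *p) { p->used = 0; }

/* Returns a fresh node, or NULL when the pool is exhausted. */
struct pool_node *pool_alloc(struct node_pool *p, const char *name)
{
    struct pool_node *n;

    if (p->used >= NODE_POOL_SIZE)
        return NULL;
    n = &p->nodes[p->used++];
    n->name = name;
    n->first_child = NULL;
    n->next_sibling = NULL;
    return n;
}

/* Simulate a schema-valid record with `count` repeated fields.
   Returns 0 if it fits, or -1 meaning "discard this record". */
int build_repeated(struct node_pool *p, size_t count)
{
    struct pool_node *parent, *prev = NULL, *child;
    size_t i;

    pool_reset(p);
    parent = pool_alloc(p, "record");
    if (parent == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        child = pool_alloc(p, "item");
        if (child == NULL)
            return -1;      /* too large: reject, don't crash */
        if (prev == NULL)
            parent->first_child = child;
        else
            prev->next_sibling = child;
        prev = child;
    }
    return 0;
}
```

A record with 63 repeated fields fits (one parent plus 63 children fills the 64-node pool); one with 64 is rejected cleanly, and the rest of the system is unaffected.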

The SAX API calls application-supplied functions when it encounters the beginning and end of each field in a data record, and to pass the application the data contained in each field. To use this API, the application must keep track of where the parser is in the parsing process so that it knows which field is being processed. This forces the application code to become intimately involved in the parsing process, which has a significant impact on its design.
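The bookkeeping this implies is shown in the sketch below: just to extract one field, the application keeps its own "am I inside `<speed>`?" state and buffers text that may arrive in several chunks. The callback names and signatures are hypothetical, loosely modeled on SAX's start-element, end-element, and characters events, and the parser that would call them is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Application state the handler must maintain itself, because the
   parser reports only events, not context. */
struct speed_state {
    int    in_speed;    /* nonzero while inside <speed>...</speed> */
    char   text[16];    /* accumulated character data for <speed> */
    size_t len;
    int    speed;       /* parsed result */
    int    have_speed;
};

/* Hypothetical start-element callback. */
void on_start_element(void *ctx, const char *name)
{
    struct speed_state *s = ctx;

    if (strcmp(name, "speed") == 0) {
        s->in_speed = 1;
        s->len = 0;
    }
}

/* Hypothetical characters callback; text may arrive in pieces. */
void on_characters(void *ctx, const char *ch, size_t n)
{
    struct speed_state *s = ctx;

    if (!s->in_speed)
        return;
    if (n > sizeof s->text - 1 - s->len)
        n = sizeof s->text - 1 - s->len;   /* truncate overlong text */
    memcpy(s->text + s->len, ch, n);
    s->len += n;
}

/* Hypothetical end-element callback: convert once the field closes.
   No conversion error checking, for brevity. */
void on_end_element(void *ctx, const char *name)
{
    struct speed_state *s = ctx;

    if (s->in_speed && strcmp(name, "speed") == 0) {
        s->text[s->len] = '\0';
        s->speed = (int)strtol(s->text, NULL, 10);
        s->have_speed = 1;
        s->in_speed = 0;
    }
}
```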

Neither the DOM API nor the SAX API passes the data to the application in a convenient form. Most developers using XML to encode data records will not want to use either API. For most embedded applications, the parser should pass the parsed data to the application as a C structure or a Java object.

Implementing an Embedded Parser

Most embedded applications need a small parser with an API that delivers parsed data in a form that is easy for the application to use. The standard XML parsers developed for desktop workstations do not fit this need well. Special-purpose embedded parsers need to be developed that minimize memory usage while providing an API suited to embedded applications.

As mentioned, one way to save memory is to implement only a subset of XML Schema. A useful subset is one that supports XML messages that can be represented as C structures. This approach includes the parts of XML Schema most likely to be useful in embedded systems while keeping the parser small. Limiting the schemas in this way also bounds the maximum message size and keeps schemas simple, yet still powerful enough for the types of schemas used in embedded applications.
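One consequence of the C-structure subset is that every repeated field must have a finite `maxOccurs`, so it maps onto a fixed-size array and the message size is bounded. The sketch below illustrates the mapping for a hypothetical `setpoint` field; a schema using `maxOccurs="unbounded"` would fall outside such a subset.

```c
#include <stddef.h>

/* Illustrative schema fragment:
   <xs:element name="setpoint" type="xs:int" maxOccurs="4"/>
   A bounded maxOccurs maps onto a fixed C array plus a count. */
#define MAX_SETPOINTS 4

struct profile {
    size_t setpoint_count;
    int    setpoint[MAX_SETPOINTS];
};

/* Hypothetical per-element handler, called once per <setpoint>.
   Returns -1 if the record has more occurrences than the schema allows. */
int profile_add_setpoint(struct profile *p, int value)
{
    if (p->setpoint_count >= MAX_SETPOINTS)
        return -1;
    p->setpoint[p->setpoint_count++] = value;
    return 0;
}
```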

Another way to save code space is to drop support for dynamic schemas. A standard parser processes both the XML data and its schemas at runtime. This lets the parser process any type of XML data as long as its schemas are available, so the parser and its application can handle data records in formats that were developed after the application was written. Most applications, however, are written so that the application code knows what the data inside a record represents and what to do with it. For these applications, the ability to parse new types of data records containing data and fields the application doesn't understand is of little value.

Dropping support for dynamic schemas allows the parser to be broken up into two pieces: a schema compiler and a data record parser and generator. The schema compiler is a tool used by developers on their workstations, which parses the schemas into a set of data structures and C structure definitions to be incorporated into the parser. The parser is used at runtime as part of the application firmware and uses the data structures generated by the schema compiler, which represent the supported schemas, to parse (or generate) XML data records. Since a tool on a workstation now does the schema parsing, this functionality no longer has to be supported in firmware, and the firmware memory requirements are reduced.
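The split between workstation tool and firmware might look something like the sketch below. The first part stands in for what a schema compiler could emit (a C structure plus a table describing each element's type and location via `offsetof`), and the second is generic runtime code that stores parsed element text using only that table. This is an illustrative assumption about the design, not NetSilicon's actual compiler output; all names are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ---- What a hypothetical schema compiler might emit ---- */
struct motor_settings {
    int  speed_rpm;
    char direction[8];
};

enum field_type { FT_INT, FT_STRING };

/* One entry per element the compiled schema allows. */
struct field_desc {
    const char     *element;  /* XML element name */
    enum field_type type;
    size_t          offset;   /* location of the value in the C struct */
    size_t          size;     /* storage size, including NUL for strings */
};

static const struct field_desc motor_fields[] = {
    { "speed",     FT_INT,    offsetof(struct motor_settings, speed_rpm),
      sizeof(int) },
    { "direction", FT_STRING, offsetof(struct motor_settings, direction),
      sizeof(((struct motor_settings *)0)->direction) },
};
#define MOTOR_FIELD_COUNT (sizeof motor_fields / sizeof motor_fields[0])

/* ---- Generic runtime code shared by every compiled schema ---- */
/* Store one element's text in the record. Returns 0 on success, or -1
   for an unknown element, a malformed or out-of-range integer, or a
   string too long for its field. */
int set_field(void *record, const struct field_desc *fields, size_t count,
              const char *element, const char *text)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const struct field_desc *f = &fields[i];
        char *dst = (char *)record + f->offset;

        if (strcmp(f->element, element) != 0)
            continue;
        if (f->type == FT_INT) {
            char *end;
            long v;

            errno = 0;
            v = strtol(text, &end, 10);
            if (end == text || *end != '\0' || errno == ERANGE ||
                v < INT_MIN || v > INT_MAX)
                return -1;
            *(int *)dst = (int)v;
        } else {
            if (strlen(text) >= f->size)
                return -1;
            memcpy(dst, text, strlen(text) + 1);
        }
        return 0;
    }
    return -1;
}
```

Supporting a new record type means running the compiler again to produce another struct and table; `set_field` itself does not change, and no schema text is ever parsed on the device.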

Also, some applications do not need validation. In theory, validation simplifies acting on parsed data, because the application does not have to perform these checks itself. However, this is only true if the XML parser is the only path by which these values reach the application. If other modules can pass data to the application (SNMP, a Telnet server, or a control panel, for example), then the application still has to verify that the data is in range. In this case, the parser's validation feature is redundant and can be dropped to save memory.
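When several front ends can change the same setting, the range check naturally lives in one application function that they all call, as in the sketch below. The handler names are hypothetical stand-ins for the XML, SNMP, and Telnet paths; with this structure, repeating the same check inside the XML parser adds code without adding protection.

```c
/* Single point of validation shared by every input path. */
#define SPEED_MIN_RPM 100
#define SPEED_MAX_RPM 500

static int current_speed_rpm = SPEED_MIN_RPM;

/* Apply a new speed. Returns 0, or -1 if out of range (unchanged). */
int set_motor_speed(int rpm)
{
    if (rpm < SPEED_MIN_RPM || rpm > SPEED_MAX_RPM)
        return -1;
    current_speed_rpm = rpm;
    return 0;
}

int get_motor_speed(void) { return current_speed_rpm; }

/* Hypothetical front ends: each simply forwards to set_motor_speed(). */
int xml_speed_handler(int rpm)    { return set_motor_speed(rpm); }
int snmp_speed_handler(int rpm)   { return set_motor_speed(rpm); }
int telnet_speed_handler(int rpm) { return set_motor_speed(rpm); }
```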

Unless a desktop application that already uses these APIs is being ported to the embedded system, the DOM and SAX APIs may also be dropped. They may be replaced with a simple API that passes data records between the application code and the parser in the form of C structures (or Java objects). A simple C structure is usually the easiest format to pass data to an application, and since the schemas for the XML records will be fixed at the time the application is written, the structures can be defined when the application is written.
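From the application's side, such an API reduces to "XML text in, C structure out". The sketch below illustrates that shape with a deliberately naive stand-in for the embedded parser: it just copies the text between `<name>` and `</name>`, ignoring attributes, entities, and repeated or nested elements. Both function names and the struct are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Record type the application works with (hypothetical). */
struct motor_settings {
    int  speed_rpm;
    char direction[8];
};

/* Naive stand-in for the embedded parser; illustrates the API only. */
static int get_element(const char *xml, const char *name,
                       char *out, size_t len)
{
    char open_tag[32], close_tag[33];
    const char *start, *end;
    size_t n;

    if (strlen(name) > 29)
        return -1;                  /* tag buffers would be too small */
    snprintf(open_tag, sizeof open_tag, "<%s>", name);
    snprintf(close_tag, sizeof close_tag, "</%s>", name);

    start = strstr(xml, open_tag);
    if (start == NULL)
        return -1;
    start += strlen(open_tag);
    end = strstr(start, close_tag);
    if (end == NULL)
        return -1;
    n = (size_t)(end - start);
    if (n >= len)
        return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}

/* The API the application sees. Returns 0 on success, -1 otherwise. */
int decode_motor_settings(const char *xml, struct motor_settings *out)
{
    char speed[16];
    char *end;
    long v;

    if (get_element(xml, "speed", speed, sizeof speed) != 0)
        return -1;
    errno = 0;
    v = strtol(speed, &end, 10);
    if (end == speed || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return -1;
    if (get_element(xml, "direction", out->direction,
                    sizeof out->direction) != 0)
        return -1;
    out->speed_rpm = (int)v;
    return 0;
}
```

The application then works only with `m.speed_rpm` and `m.direction`; no callbacks or tree walking leak into its design.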

NetSilicon used the above approach to develop a prototype schema compiler and parser. The parser supported a subset of XML Schema limited to integer and string data types, which was sufficient for a version of the SOAP schema, and it also supported namespaces. The prototype required only 20 KB of ROM and RAM. NetSilicon believes a viable, full-featured version of this parser can be developed with a 30 KB footprint.

XML Pros and Cons

XML's biggest advantage is that it gives developers a tool that concisely and unambiguously defines the format of data records. If a device manufacturer decides its products should communicate with other devices, it can publish its schemas. Since schemas are interpreted by software, human errors in interpretation are eliminated.

Another advantage of XML is that it is a required building block for SOAP and UPnP. Any device using UPnP to access or provide network services must support XML. XML is also used in many B2B applications and is therefore desirable for embedded devices (such as POS terminals and inventory control scanners) that need to communicate with B2B systems. A large number of free XML tools are also available, including many full-featured parsers.

XML's biggest disadvantage is that XML parsers tend to be large, requiring up to 500 KB of memory. Second, because XML is text-based, XML data records are inefficient compared with binary records: they require more communications bandwidth and more CPU time to process. The larger size of XML data records may be an issue for some applications; it can be mitigated by compressing the data before transmitting it or writing it to disk.
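To put a rough number on the overhead, the sketch below encodes the same motor record both ways. The 3-byte binary layout (16-bit big-endian speed plus one direction byte) and the element names are hypothetical choices for illustration; for a speed of 250 RPM, the XML form is 79 bytes against 3 bytes of binary, before any compression.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packed layout: 16-bit big-endian speed, 1 direction byte. */
size_t binary_encode(uint16_t speed, uint8_t reverse, uint8_t out[3])
{
    out[0] = (uint8_t)(speed >> 8);
    out[1] = (uint8_t)(speed & 0xFF);
    out[2] = reverse;
    return 3;
}

/* The same record as XML text. Returns the length snprintf reports. */
int xml_encode(unsigned speed, int reverse, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "<motorSettings><speed>%u</speed>"
                    "<direction>%s</direction></motorSettings>",
                    speed, reverse ? "reverse" : "forward");
}
```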


XML and its associated technologies, UPnP and SOAP, form the basis for communication, interoperability, and thus control among computers and devices designed by different companies. The Tower of Babel need not be rebuilt as embedded devices move into mainstream business applications. XML's main drawbacks, the size and complexity of its parsers and of the technology itself, can be overcome with careful specification and design or by partnering with a vendor that understands the technology and the needs of the embedded environment.

Charles Gordon is Principal Engineer for NetSilicon, Inc., which is located at 411 Waverley Oaks Rd., Bldg. 227, Waltham, MA 02452; (78) 647-1234; Fax: (781) 893-1338.

COPYRIGHT 2002 Advantage Business Media
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2002 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Gordon, Charles
Publication:ECN-Electronic Component News
Date:May 15, 2002