Extensible Markup Language: A New Technology Tool for the Public Sector.
Although technology changes are fast paced, the fundamental tasks of government rarely change. The dilemma for public managers is how much time they should devote to learning about and implementing the new technologies as opposed to addressing the basic problems of government with one of the existing technologies. The goal of this article is to examine the technologies that governments use to share information on the Internet and to propose the use of a relatively new language, Extensible Markup Language, or XML.
The article begins with a discussion of the strengths and limitations of current Internet technologies. It then examines the benefits of and the barriers to implementing XML and discusses its relevance to seven different governmental applications: purchasing, tax reporting, financial reporting, budgeting, human resources, grant management, and performance measurement. The article concludes that while XML is a viable technology for some of these applications, it may never be viable for others. Consequently, depending upon the area of concentration, some public managers should spend the time to learn about XML while others probably should continue to use existing technologies.
A Brief History of the Internet
The Internet grew out of research begun in the early 1960s by the Department of Defense's Advanced Research Projects Agency. The purpose was to network supercomputers among researchers located throughout the United States. In the late 1960s, four universities were allowed access to the Internet. The Internet was available only to education and government (primarily defense) until 1991, when the National Science Foundation first allowed commercial entities to use it. Since then, the number of Internet hosts has grown to more than 25 million. With the proliferation of the Internet, however, it has become increasingly difficult to find certain pieces of information.
Prior to the development of XML in 1998, most general information on the Internet was presented in Web pages using either HyperText Markup Language (HTML) or the Portable Document Format (PDF). Exhibit 1 provides a brief comparison of HTML, PDF, and the newer XML. Both HTML and PDF are relatively easy to use for both Internet publishers and users. This ease of use has been a major factor in the proliferation of the number of Web sites.
HTML and XML are "open standards" that are maintained by the World Wide Web Consortium or W3C (www.w3.org), meaning they are free and available to all Internet publishers and users. PDF is a proprietary product of the Adobe Corporation, but the Adobe Acrobat Reader is free, so any Web user can access PDF data after downloading the Reader. Publishers must purchase software from Adobe in order to create PDF documents.
When making Web pages, HTML provides a layout that includes text, images, and push buttons. PDF is merely a scanned version of text or pictures, while XML creates catalogs of information. The advantage of XML is that it gives end users the flexibility to dynamically access and use data because the catalog specifies the location and description of the various pieces of data used in Web documents.
In comparison, HTML and PDF statically deliver information to end users, much like a fax machine. Once the information is received or accessed, it is very difficult to manipulate. Only slow searches are available in HTML, while PDF cannot be searched until the user locates and opens the PDF file. Unless the user knows the exact location or name of the PDF file, the data within PDF files are virtually hidden.
Cataloging information in XML can be accomplished in multiple languages. HTML, however, is language-specific, making translations between languages very difficult. Since PDF is an image, it does not discriminate between languages. In terms of moving around the Internet, HTML allows hot links from the host Web site to other Web sites and pages. Similarly, XML provides X-links to other Web databases. X-links, however, are more efficient in linking to the exact data desired and reduce missed Web pages. Users can link out from a PDF document, but cannot link into a PDF document from other Web sites.
XML allows dynamic, on-the-spot data analysis. Data from XML Web sites can be seamlessly downloaded onto application programs for spreadsheet or statistical analysis. HTML and PDF also can be used for data analysis, but not without cutting and pasting or retyping, which can require significant time and effort, especially as the number of sources increases.
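As a rough illustration of moving XML data into a spreadsheet, the Python sketch below flattens a small XML document into comma-separated rows that a spreadsheet program could open. The document, its tag names (Spending, Record, Department, Amount), and the function name are assumptions made up for this example; they do not come from any published taxonomy.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML as a publisher might serve it; tag names are illustrative.
SAMPLE_XML = """
<Spending>
  <Record><Department>Parks</Department><Amount>1200.50</Amount></Record>
  <Record><Department>Library</Department><Amount>830.00</Amount></Record>
</Spending>
"""

def xml_to_csv(xml_text, record_tag, fields):
    """Flatten each <record_tag> element into one CSV row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row uses the tag names
    for record in root.iter(record_tag):
        writer.writerow([record.findtext(f, default="") for f in fields])
    return out.getvalue()

csv_text = xml_to_csv(SAMPLE_XML, "Record", ["Department", "Amount"])
print(csv_text)
```

Because every value arrives already labeled, no cutting, pasting, or retyping is needed before the analysis starts.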
In summary, XML provides faster search and data analysis capabilities than either HTML or PDF, which can potentially reduce the workload of Web users. The next section explains in some detail how XML works. The purpose is to introduce those concepts that are important for public managers to understand in order to conceptualize how XML might improve the applications for which they are responsible.
Two unique features of XML enable the cataloging of data described in the previous section: "tags" and "taxonomies." A tag is a defining label attached to data presented on the Internet. The taxonomy lists all of the tags for a specific application and the exact rules for how the tagged data will be presented. The technical term for a taxonomy is a Document Type Definition, or DTD. The more popular term "taxonomy" will be used for this presentation.
To illustrate tags and taxonomies for text, assume the characters "Ken" and "Smith" appeared on a Web page. In XML, tags such as "FirstName" and "LastName" would probably be attached to "Ken" and "Smith" respectively. The taxonomy might require that the first letter of these elements be capitalized while the remaining letters are lowercase. Tags also can be more restrictive such as "AuthorFirstName."
To illustrate tags and taxonomies for numbers, assume the characters "$120,000.00" and "10000" appeared on a Web page. The tags might be "AnnualSalary" and "MonthlySalary" respectively. Notice that the format differs in at least three ways: use of dollar sign, use of comma, and number of decimal places. The taxonomy would specify how the information was to be presented on at least these three dimensions so that there would be no confusion among users. The first tag could also be "AnnualSalaryTimes3," "DesiredSalary," or "Parent'sSalary."
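The two examples above can be sketched in a few lines of Python. The sketch is only a stand-in for a taxonomy: it expresses the presentation rules as patterns in a Python dictionary rather than as an actual DTD, and the specific rules chosen (no dollar sign, no commas, two decimal places) are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: tag name -> pattern the tagged value must match.
TAXONOMY = {
    "FirstName": r"[A-Z][a-z]+",      # first letter capitalized, rest lowercase
    "LastName": r"[A-Z][a-z]+",
    "AnnualSalary": r"\d+\.\d{2}",    # no dollar sign, no commas, two decimals
    "MonthlySalary": r"\d+\.\d{2}",
}

# The tagged data, written the inconsistent way it appeared on the Web page.
RECORD = """
<Person>
  <FirstName>Ken</FirstName>
  <LastName>Smith</LastName>
  <AnnualSalary>$120,000.00</AnnualSalary>
  <MonthlySalary>10000</MonthlySalary>
</Person>
"""

def check_record(xml_text, taxonomy):
    """Return the (tag, value) pairs that break the taxonomy's rules."""
    problems = []
    for element in ET.fromstring(xml_text):
        pattern = taxonomy.get(element.tag)
        value = (element.text or "").strip()
        # Unknown tags and badly formatted values are both reported.
        if pattern is None or re.fullmatch(pattern, value) is None:
            problems.append((element.tag, value))
    return problems

problems = check_record(RECORD, TAXONOMY)
print(problems)
```

Here the two name fields pass, while both salary fields are flagged because each departs from the agreed format in a different way.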
How do the relatively simple concepts of tags and taxonomies facilitate a radical change in the Internet that is not possible with HTML and PDF technologies? Recall that two of the major problems with the Internet relate to speed and search. One of the reasons for the search difficulty is that most of the data is not organized, cataloged, or defined (i.e., tagged) in any manner. As a result, it is difficult to find all relevant occurrences of "Ken Smith" in HTML or PDF Web sites. The task would be much faster and more accurate in XML if the search were for "AuthorFirstName" equals "Ken" and "AuthorLastName" equals "Smith."
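The difference between a tagged search and an untagged one can be shown with a small Python sketch. The catalog, its article titles, and the function names are invented for the example; only the AuthorFirstName and AuthorLastName tags come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the titles and the second and third authors are made up.
CATALOG = """
<Catalog>
  <Article>
    <Title>Performance Reporting</Title>
    <AuthorFirstName>Ken</AuthorFirstName>
    <AuthorLastName>Smith</AuthorLastName>
  </Article>
  <Article>
    <Title>An Interview with Ken Smith</Title>
    <AuthorFirstName>Jane</AuthorFirstName>
    <AuthorLastName>Doe</AuthorLastName>
  </Article>
  <Article>
    <Title>Budget Basics</Title>
    <AuthorFirstName>Ken</AuthorFirstName>
    <AuthorLastName>Jones</AuthorLastName>
  </Article>
</Catalog>
"""

def find_by_author(xml_text, first, last):
    """Tagged search: match only on the author tags."""
    root = ET.fromstring(xml_text)
    return [
        a.findtext("Title")
        for a in root.iter("Article")
        if a.findtext("AuthorFirstName") == first
        and a.findtext("AuthorLastName") == last
    ]

def find_by_text(xml_text, phrase):
    """Untagged search: match the phrase anywhere in the article's text."""
    root = ET.fromstring(xml_text)
    hits = []
    for a in root.iter("Article"):
        flat = " ".join(" ".join(a.itertext()).split())  # collapse whitespace
        if phrase in flat:
            hits.append(a.findtext("Title"))
    return hits

print(find_by_author(CATALOG, "Ken", "Smith"))
print(find_by_text(CATALOG, "Ken Smith"))
```

The tagged search returns only the article that Ken Smith wrote. The untagged search also returns the interview, where the name appears in the title rather than in the author fields.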
Even for those pieces of data that can be located via a traditional search engine, the presentation of the data is rarely in a consistent format across Web pages. If a user were looking to purchase basic white copy paper, for example, pricing could be by the sheet, by the ream, or by the box. As such, the user would have to spend significant time interpreting this data. XML, however, makes it possible to search for all vendors with "product" equals "20lbWhitePaper." The search would return all product vendors, and the user could then inquire as to the "PricePerSheet" or whatever tag was established in the taxonomy.
Another significant benefit of XML is that users can perform data analysis on their own computers. In the white copy paper search above, the user would obtain all of the product vendors along with all of the other fields in the taxonomy (e.g., delivery date, credit terms, address, phone number, contact person, quantity discounts, and quantity available). Once data from all vendors is acquired, the user can analyze the data using his or her own constraints, whether they are price, delivery, or even local preferences. Also, the analysis would be done on the user's computer rather than having to access a distant server for every inquiry or sort. This saves a significant amount of time for both the user and the publisher's server.
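A minimal Python sketch of the white copy paper example follows. It assumes the vendor data has already been downloaded as one XML document; the vendor names, prices, and the DeliveryDays and Local tags are hypothetical, while "20lbWhitePaper" and PricePerSheet follow the example in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor data, downloaded once and then analyzed locally.
VENDOR_FEED = """
<Vendors>
  <Vendor>
    <Name>Acme Paper</Name><Product>20lbWhitePaper</Product>
    <PricePerSheet>0.0080</PricePerSheet><DeliveryDays>5</DeliveryDays><Local>false</Local>
  </Vendor>
  <Vendor>
    <Name>Main Street Supply</Name><Product>20lbWhitePaper</Product>
    <PricePerSheet>0.0095</PricePerSheet><DeliveryDays>2</DeliveryDays><Local>true</Local>
  </Vendor>
  <Vendor>
    <Name>Budget Office Co</Name><Product>20lbWhitePaper</Product>
    <PricePerSheet>0.0070</PricePerSheet><DeliveryDays>10</DeliveryDays><Local>false</Local>
  </Vendor>
  <Vendor>
    <Name>Pens R Us</Name><Product>BluePen</Product>
    <PricePerSheet>0.0000</PricePerSheet><DeliveryDays>3</DeliveryDays><Local>true</Local>
  </Vendor>
</Vendors>
"""

def load_offers(xml_text, product):
    """Keep only the vendors whose Product tag equals the requested product."""
    offers = []
    for v in ET.fromstring(xml_text).iter("Vendor"):
        if v.findtext("Product") != product:
            continue
        offers.append({
            "name": v.findtext("Name"),
            "price_per_sheet": float(v.findtext("PricePerSheet")),
            "delivery_days": int(v.findtext("DeliveryDays")),
            "local": v.findtext("Local") == "true",
        })
    return offers

def rank_offers(offers, max_delivery_days=None, prefer_local=False):
    """Apply the user's own constraints, then sort by price per sheet."""
    eligible = [
        o for o in offers
        if max_delivery_days is None or o["delivery_days"] <= max_delivery_days
    ]
    # When prefer_local is set, local vendors sort ahead of all others.
    return sorted(
        eligible,
        key=lambda o: (prefer_local and not o["local"], o["price_per_sheet"]),
    )

offers = load_offers(VENDOR_FEED, "20lbWhitePaper")
for offer in rank_offers(offers, max_delivery_days=7, prefer_local=True):
    print(offer["name"], offer["price_per_sheet"])
```

Each new sort or filter runs against the data already on the user's computer, so changing the constraints requires no further requests to the vendors' servers.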
Barriers and Costs to Using XML
As with any other new technology, there are barriers and costs associated with XML. Exhibit 2 summarizes these barriers and provides an estimate of the range of costs for each barrier in terms of both money and time. The minimum costs for publishers and users can be extremely low, while the maximum costs can be quite high, especially for publishers. Depending on their activities, governments can be both publishers and users. Both of these perspectives are discussed below.
Since most governments already have Web sites and computer equipment, they should not have to make significant investments in new hardware unless the number of transactions increases substantially enough to require additional servers. Additional servers also may be needed if a particular government wants to coordinate the XML databases of a consortium of other governments. As users, governments will have very little need to upgrade or add hardware since XML uses existing hardware technology. If employees can access the Web, they have the hardware for XML applications.
Both publishers and users may need to purchase user-friendly software rather than using the open standards. Just as most Web pages today are created with user-friendly software packages rather than raw HTML, it is expected that vendors will create easy-to-use software for both publishing and viewing XML data. Regardless of which software is used, both publishers and users will need to learn how to use it in order to reap the full benefits of XML.
Developing the taxonomy is a very simple process in some applications and nearly impossible in others. The key to developing a successful taxonomy is to obtain consensus among the publishers and users. While users tend to want as much flexibility as possible in the taxonomy, publishers usually want the taxonomy to be as similar to their current databases as possible. Because there are often several groups of stakeholders, it is critical to manage the political issues that inevitably arise during taxonomy development (unless one party has the power to simply dictate the taxonomy). Successful taxonomy development is probably the most important step in moving an application to XML.
Tagging data is a concern only for publishers, but it may be so prohibitive as to keep an application from being implemented in the first place. For those applications where taxonomy development is relatively easy, it is likely that there are common definitions of data such that existing databases will be similar to the taxonomy or easily converted. For those applications where taxonomy development is difficult, the tagging will probably also be difficult since there is not significant agreement as to what data should be processed.
At first it may seem that data analysis is only a concern for users, just as tagging is only a concern for publishers. However, there are instances where the publishers of the data will need to verify the accuracy of the information they present (e.g., financial statements). In addition, publishers may need to monitor information presented by other publishers (e.g., vendors in a purchasing application). These requirements may inhibit some applications from being implemented; however, this extra effort is simply one of the new costs of the new economy. Since there are so many benefits to both suppliers and buyers, applications like purchasing will move to XML in spite of these extra costs to publishers. Also, some of these costs are offset by the cost and time savings detailed below.
This section discusses how XML might be applied in seven different governmental applications. The choice of these seven was mostly arbitrary, although the authors intentionally selected some applications that are strong candidates for XML (e.g., purchasing) and others that are not (e.g., financial reporting). Hundreds of governmental applications might be considered as to their likelihood of improvement with XML. Managers are encouraged to develop an estimate of the costs and benefits of the applications for which they are responsible through an approach similar to Exhibit 3.
As government managers estimate these costs and benefits they should consider XML not just from the publisher's point of view (sending XML data), but also from the user's point of view (receiving XML data). For example, a procurement manager might set up an XML Web site to advertise requests for proposals (RFP) to potential vendors. Alternatively, a purchasing manager may wish to search and download lists of vendors, products, and prices in order to make decisions about which vendor to use to purchase a particular product. Each case requires government managers to understand XML, but from different perspectives.
Purchasing
Purchasing is probably the perfect application for this new technology. The private sector already has established several "markets" in which buyers can access vendor data and order products online. Governments also have established and are participating in these markets. There are several reasons why this application is so suited to XML.
It is very easy to establish tags for both generic items (e.g., paper, staples, computers, desks) and specific items (e.g., Dell Dimension with 486 Intel processor or Xerox 3452 copier). It is also relatively easy for vendors and buyers to agree upon a taxonomy for how to share information. Most of these vendors and buyers already transact with each other and are accustomed to sharing the information in the taxonomy. Vendors (publishers) might save a great amount of time because every one of their buyers now uses the exact same terminology.
Neither vendors nor buyers should have to invest in a significant amount of hardware because they already have hardware to process a large number of transactions. There could even be a reduction in hardware needs as the common terminology reduces the need to communicate differently with every buyer. Both vendors and buyers will need to invest in software to expedite the transactions they desire to conduct. Based on the number of markets already in existence, it appears that there is a perceived return on this investment.
Finally, the tagging of the data does not appear to be a significant cost or barrier. Since most vendors already have databases of the quantity and description of the items for sale, it should be a relatively simple process to add the agreed upon tags to their existing data.
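To suggest why tagging an existing product database can be simple, the Python sketch below maps the columns of a vendor's current records onto agreed tags and writes the result as XML. The column names, the tag names, and the mapping itself are assumptions for illustration, not part of any real purchasing taxonomy.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a vendor's existing column names to agreed tags.
COLUMN_TO_TAG = {
    "item_desc": "Product",
    "qty_on_hand": "QuantityAvailable",
    "unit_price": "PricePerSheet",
}

def rows_to_xml(rows, column_to_tag, root_tag="Catalog", record_tag="Item"):
    """Wrap each database row in XML, renaming columns to taxonomy tags."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, record_tag)
        for column, tag in column_to_tag.items():
            ET.SubElement(record, tag).text = str(row[column])
    return ET.tostring(root, encoding="unicode")

# One row as it might sit in the vendor's current database.
rows = [{"item_desc": "20lbWhitePaper", "qty_on_hand": 500, "unit_price": 0.008}]
xml_text = rows_to_xml(rows, COLUMN_TO_TAG)
print(xml_text)
```

When the existing columns line up one-to-one with the taxonomy, as assumed here, the conversion is little more than a renaming exercise. The effort grows as the database and the taxonomy diverge.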
Financial Reporting
In 1998, a Seattle CPA named Charles Hoffman was intrigued by the possibilities of XML for reporting financial statements on the Internet. Hoffman recognized that it was difficult to find and process financial statement information on the Web using HTML or PDF, especially for analysts. Hoffman developed a taxonomy for financial statements called XBRL (Extensible Business Reporting Language). More than 100 major companies, including some of the largest accounting, investing, and technology firms, have joined the effort to develop XBRL (www.xbrl.org). In February 2001, Morgan Stanley Dean Witter became the first company to report financial statements to the SEC using XML.
Although a federal government taxonomy is in the draft stage, a taxonomy for state and local government accounting has not been developed to date. In spite of the strong interest in XBRL, this particular application of XML does not appear to be well suited to governments. Governmental accountants currently have an increased workload because of the demands of implementing the requirements of GASB Statement No. 34. It is unlikely that publishers have the time to also convert their financial statements to XBRL. GASB Statement No. 34 also complicates the development of a taxonomy. Smaller governments are not required to implement GASB Statement No. 34 for two more years, so not all publishers may be using the same reporting format.
Another obstacle to XBRL implementation in the public sector is the time and effort required to tag the data from existing accounting systems. In contrast to a vendor's product database, many governmental accounting databases are not completely centralized or standardized. Thus, it will take some time to convert or create a database of the accounting records that agrees with the taxonomy.
The need for additional hardware to support XBRL is unclear because there has not traditionally been a large interest in governmental accounting reports; however, if publishing financial statements in XML increases usage, governments may need to acquire new hardware to serve this increased traffic. An investment in software also may be required. Since there are not any appreciable benefits anticipated (in contrast to purchasing), it is unlikely that governments will seek to invest in software specifically for XBRL. However, if the government already has acquired user-friendly XML software for other applications, that software could be borrowed by the accounting function.
Finally, there is very little consensus as to the level of access users should have to either corporate or governmental accounting data. Although corporations certainly do not have to be completely open with their financial statements, governments face far greater scrutiny because of their accountability to taxpayers. This is in sharp contrast to the purchasing example in which suppliers understand that they must be completely open in order to effectively compete.
Given all of these concerns, the costs of XBRL for government clearly outweigh the direct benefits. As the costs of implementing XBRL decline, however, governments should reconsider the benefits of giving creditors, the public, and special interest groups greater access to financial records.
Tax Reporting (Form 990)
Another application discussed by Charles Hoffman is the use of XML for reporting the information on Federal Tax Form 990. This form is used by nonprofit entities to report various pieces of information to the federal government, including the salaries of the highest-paid executives. According to Hoffman, the IRS currently retypes information from Form 990 into another database for more widespread distribution. He notes that a taxonomy could be created using the exact lines of the tax form. A simple XML form would not require any additional work on the part of tax accountants. However, transmission of these forms in XML would result in tremendous savings to the federal government. Since the data from all reporting entities would be in the same format, the data could be analyzed without any additional re-entry. Hoffman estimates that this relatively simple application could save as much as $5 million.
This application certainly appears to have substantial benefits with limited costs. The IRS, of course, would have to develop the taxonomy and provide any necessary software to facilitate online reporting. The reporting entities would probably not incur any additional hardware or software costs, and the process of preparing the form would be significantly streamlined. They would not have any difficulty in tagging the data since it is already being reported according to IRS specifications. For the parties that are interested in analyzing this data, the use of XML allows them to get quicker, easier, and more consistent access to the desired data. The IRS and other agencies receiving standardized data stand to gain significant benefits by using XML.
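The Python sketch below suggests how filings received in a common XML format could be analyzed without re-entry. The tag names (OrganizationName, TotalRevenue, HighestExecutiveSalary), the organizations, and the figures are all hypothetical; they are not the actual lines of Form 990 or any IRS schema.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical filings, one XML document per reporting entity.
FILINGS = [
    "<Form990><OrganizationName>Org A</OrganizationName>"
    "<TotalRevenue>1000000</TotalRevenue>"
    "<HighestExecutiveSalary>150000</HighestExecutiveSalary></Form990>",
    "<Form990><OrganizationName>Org B</OrganizationName>"
    "<TotalRevenue>500000</TotalRevenue>"
    "<HighestExecutiveSalary>100000</HighestExecutiveSalary></Form990>",
    "<Form990><OrganizationName>Org C</OrganizationName>"
    "<TotalRevenue>2000000</TotalRevenue>"
    "<HighestExecutiveSalary>200000</HighestExecutiveSalary></Form990>",
]

def read_filings(filings):
    """Read every filing into a row; no manual re-entry is involved."""
    rows = []
    for text in filings:
        form = ET.fromstring(text)
        revenue = float(form.findtext("TotalRevenue"))
        salary = float(form.findtext("HighestExecutiveSalary"))
        rows.append({
            "name": form.findtext("OrganizationName"),
            "revenue": revenue,
            "salary": salary,
            "salary_share": salary / revenue,  # top salary as a share of revenue
        })
    return rows

rows = read_filings(FILINGS)
average_salary = mean(r["salary"] for r in rows)
highest_share = max(rows, key=lambda r: r["salary_share"])
print(average_salary, highest_share["name"])
```

Because every entity reports under the same tags, adding a fourth or a four-thousandth filing changes nothing in the analysis code.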
Budgeting
Using XML, budget managers could receive proposed and actual budgets from subordinate agencies, consolidate the numbers into their budget, and send the consolidated budget to a superior agency electronically. This activity could occur from the lowest level of government to the highest level of government to create a standardized budgeting process. The elements of the taxonomy could be taken from the OMB standard budget form. Although this particular application is particularly suited to the federal budget process, XML also holds enormous potential for highly standardized state and local government budget processes. The use of XML for budgeting is similar to a concept proposed by XBRL.org to consolidate the financial statements of large conglomerates.
The costs or barriers to the use of XML for budgeting depend on the level of standardization of the budgeting process. If the process is highly standardized, XML could significantly improve the efficiency of data collection (via the Web), data analysis, and budget revisions. Since XML works just as well with text as it does with data (e.g., program descriptions), it facilitates the printing and presentation of budget documents in various user-defined formats. Using XML, managers could tailor budget documents to different users by simply selecting which data to include and which to exclude. One city council member, for example, may wish to receive only high-level summaries while another may desire line-item detail. Both could be easily accommodated using XML. XML is probably not appropriate for governments that lack standardized budget processes. To benefit from the advantages of XML, these governments would first need to develop standard meanings and definitions.
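Consolidation and tailored budget views can be sketched in Python as follows. The sketch assumes every agency submits its budget under the same hypothetical tags and attributes (Budget, LineItem, category, name); these are illustrative and are not taken from the OMB form.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical submissions from two subordinate agencies.
AGENCY_BUDGETS = [
    """<Budget agency="Parks">
         <LineItem category="Personnel" name="Salaries">400000</LineItem>
         <LineItem category="Personnel" name="Benefits">100000</LineItem>
         <LineItem category="Operations" name="Supplies">50000</LineItem>
       </Budget>""",
    """<Budget agency="Library">
         <LineItem category="Personnel" name="Salaries">300000</LineItem>
         <LineItem category="Operations" name="Supplies">20000</LineItem>
       </Budget>""",
]

def consolidate(budget_docs):
    """Sum amounts across agencies, keyed by (category, line item)."""
    totals = defaultdict(float)
    for doc in budget_docs:
        for item in ET.fromstring(doc).iter("LineItem"):
            key = (item.get("category"), item.get("name"))
            totals[key] += float(item.text)
    return dict(totals)

def summary_view(totals):
    """High-level view: roll the line items up to category totals."""
    by_category = defaultdict(float)
    for (category, _name), amount in totals.items():
        by_category[category] += amount
    return dict(by_category)

line_item_view = consolidate(AGENCY_BUDGETS)   # for the member who wants detail
high_level_view = summary_view(line_item_view)  # for the member who wants totals
print(line_item_view)
print(high_level_view)
```

Both views come from the same consolidated data, so tailoring a document to a reader is a matter of selecting a level of detail rather than preparing a second budget.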
Human Resources
Another application of XML that holds significant promise for governments is the collection and dissemination of data about jobs and job applicants (resumes). This is one application in which governments might conceivably assume the roles of both publisher (job openings) and user (resume bank). In either case, governments stand to benefit from this technology by reducing the turnaround time for filling openings and by improving their chances of selecting the most-qualified candidates. Although there may be some hardware, software, and tagging costs, the major obstacle to this application is the development of a workable taxonomy.
The government itself will probably be responsible for developing the taxonomy. The taxonomy should include all of the fields that a government will need to screen or evaluate prospective employees. The taxonomy may require different tags for different positions (e.g., a professional certificate field for an accountant). Once the taxonomy is established, the government can accept resumes via a Web page. This would require applicants to manually enter their data in the format chosen by the government. Once the resume bank is filled, the hiring department could quickly search for all candidates meeting certain criteria. The search could be quickly restricted or expanded depending on the number of candidates meeting the criteria. An automatic e-mail also could be sent to applicants that had been pre-screened as eligible.
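A resume bank search of this kind might look like the following Python sketch. The applicants, the tag names (Name, YearsExperience, ProfessionalCertificate), and the screening criteria are hypothetical and stand in for whatever fields a government's own taxonomy would define.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume bank; the certificate field is optional,
# as it might be for positions that do not require one.
RESUME_BANK = """
<Resumes>
  <Resume>
    <Name>A. Rivera</Name>
    <YearsExperience>6</YearsExperience>
    <ProfessionalCertificate>CPA</ProfessionalCertificate>
  </Resume>
  <Resume>
    <Name>B. Chen</Name>
    <YearsExperience>2</YearsExperience>
    <ProfessionalCertificate>CPA</ProfessionalCertificate>
  </Resume>
  <Resume>
    <Name>C. Patel</Name>
    <YearsExperience>8</YearsExperience>
  </Resume>
</Resumes>
"""

def search_resumes(xml_text, min_years=0, certificate=None):
    """Return the names of applicants who meet the screening criteria."""
    matches = []
    for resume in ET.fromstring(xml_text).iter("Resume"):
        years = int(resume.findtext("YearsExperience"))
        held = resume.findtext("ProfessionalCertificate")  # None if absent
        if years < min_years:
            continue
        if certificate is not None and held != certificate:
            continue
        matches.append(resume.findtext("Name"))
    return matches

# A restrictive search first, then an expanded one if too few candidates match.
print(search_resumes(RESUME_BANK, min_years=5, certificate="CPA"))
print(search_resumes(RESUME_BANK, min_years=1, certificate="CPA"))
```

Loosening or tightening a criterion is a one-argument change, which is what allows the hiring department to adjust the search to the number of candidates it returns.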
Grant Management
State and local governments receive and distribute millions of dollars in grants each year. XML could be used to facilitate both grant applications and ongoing reporting requirements. These applications are similar to taxes in that any existing forms could easily be converted to XML and save or re-allocate the time spent in re-entry. Given the more timely and accurate reporting, resources could be shifted from data entry to data analysis. The time required for tagging is likely to be unimportant since governments already report the data to be tagged.
Government managers should view this application from the perspective of both the publisher and the user. Through XML, governments can provide real-time access to grant information and even facilitate compliance with reporting and other requirements. Likewise, governments could more effectively search the Web for information on potential sources of funding. XML has the potential to dramatically reduce the amount of paperwork involved in applying for and complying with grants.
Performance Measurement
The Internet presents a tremendous opportunity for governments to improve accountability by reporting performance measures. In addition to the benefits of using performance measures internally, proponents of performance reporting also recognize the value of using performance measures to benchmark to other entities. Benchmarking allows governments to assess their performance by comparing similar measures to similar jurisdictions. Effective benchmarking, however, is difficult and requires the consideration of mitigating factors as diverse as the impact of population density on the crime rate or annual snowfall on snow removal costs. In spite of the potential usefulness of performance measures in XML, the costs and barriers are substantial.
One of the most significant obstacles to benchmarking with XML is the development of a taxonomy. Many performance measures are specific to a department or program of a particular entity. A primary criticism of performance measurement is that certain functions of government are not measurable at all. This kind of dissension greatly complicates the development of a workable taxonomy.
As with financial statements, some governments may not be willing to share performance data with the public. In addition, some governments that have engaged in performance measurement programs have found it necessary to upgrade their computer systems. It is possible that the use of XML in performance measurement will require additional hardware and software. Finally, given the disparity in performance measures between governments, tagging data is likely to be a costly endeavor.
At least for the time being, it does not appear that the benefits of using XML for the purpose of benchmarking are significant enough to justify the costs of developing large-scale taxonomies. However, several organizations already have undertaken extensive government benchmarking projects, which suggests that a taxonomy probably already exists. Time will tell whether or not the taxonomies already in development could become accepted by enough jurisdictions to allow for widespread benchmarking using XML technology.
Conclusion
XML is the latest Internet technology with the potential to improve government operations. Public managers should understand this new technology and carefully analyze the costs and benefits of moving applications to the Internet and using XML versus HTML, PDF, or some other technology. XML already is being used for purchasing applications and has enormous potential for human resource applications. However, the costs of some applications still exceed the benefits, making their implementation unlikely at least until the barriers are overcome or the benefits enhanced.
KEN SMITH, PH.D., CPA, is an assistant professor at Pepperdine University in Malibu, California. He completed his dissertation, "Performance Reporting in United States Cities," in 2001 from the University of Missouri. He previously served as an audit manager/government practice leader for a large international CPA firm. MOHAMMAD ABDOLMOHAMMADI, DBA, CPA, is the John E. Rhodes Professor of Accounting at Bentley College in Waltham, Massachusetts. He has a doctorate degree in business administration from Indiana University, and has published many articles in academic and professional accounting, behavioral sciences, and ethics journals. JON HARRIS, MPP, is currently in the Master of Science in Accountancy program at Bentley College. Prior to entering Bentley, he earned a Master of Public Policy degree from Harvard University's Kennedy School of Government. He also won honorable mention at the First Annual Academic XBRL Competition.
Exhibit 1
A COMPARISON OF XML WITH HTML AND PDF

Web Page Creation
  HTML: Creates Web page layout (text, image, push buttons)
  PDF: Creates scanned version of printed documents for Web pages
  XML: Creates Web-enabled catalogs of information

Delivery
  HTML: Sends Web pages as fancy fax documents that can be viewed on various computers (appearance may differ from computer to computer)
  PDF: Sends Web pages as fancy fax documents that can be viewed on various computers (appearance is exactly the same on all computers)
  XML: Specifies the location and description of individual data items that appear on the Web

Search Efficiency
  HTML: Allows searches, but often slow and inaccurate
  PDF: Does not allow searches unless already within PDF document
  XML: Allows high speed and accurate searches

Culture and Language
  HTML: Language specific, making exchange between languages difficult
  PDF: Language is not an issue (a document in any language can be scanned)
  XML: The data entered in a generic form can be used by Web pages in any language by a simple translation program

Link to Other Web Pages
  HTML: Provides hot Web links to various Web pages, but can result in Error 404 (File Not Found)
  PDF: No Web link into the PDF file from any Web page, but links out to Web pages
  XML: Provides X-links to data bases; no Error 404 (File Not Found)

Data Definitions (tags)
  HTML: Very limited to where and how data appears; cannot define what data means
  PDF: N/A. Because it is a picture there are no pieces of data other than the picture itself
  XML: Almost unlimited ability to define the meaning of each piece of data

Data Processing by Users
  HTML: Static processing--must "cut and paste" from each Web site and, if formats differ, also must standardize
  PDF: N/A. Because it is a picture there are no pieces of data other than the picture itself
  XML: Dynamic processing allows users to download entire data set of multiple publishers and analyze on desktop software packages

Publishing Data
  HTML: Relatively easy with templates in packages such as Microsoft FrontPage and Claris Home Page
  PDF: Use Acrobat PDF Writer to translate text documents or scan documents for attachments
  XML: May be easy to tag if current database is same as taxonomy, but will require more time and effort as existing data differs more from taxonomy

Portions of this exhibit are adapted from Jon Bosak and Tim Bray, "XML and the Second-Generation Web," Scientific American (May 1999): 89-93.
Exhibit 2
BARRIERS AND ESTIMATED COSTS OF XML

Publisher Cost Range
                     Cash Outlay           Time and Effort
Possible Barriers    Minimum   Maximum     Minimum   Maximum
Hardware             None      Moderate    None      Low
Software             Low       Moderate    Low       High
Taxonomy             None      Moderate    Low       High
Data Tagging         Low       High        Low       High
Data Analysis        None      Moderate    None      Moderate

User Cost Range
                     Cash Outlay           Time and Effort
Possible Barriers    Minimum   Maximum     Minimum   Maximum
Hardware             None      Low         None      Low
Software             None      Moderate    Low       High
Taxonomy             None      Moderate    Low       High
Data Tagging         None      None        None      None
Data Analysis        None      Moderate    Low       High

Exhibit 3
BARRIERS AND BENEFITS OF XML FOR SEVEN GOVERNMENTAL APPLICATIONS

                         BARRIERS                      BENEFITS
Application              Publisher      User           Publisher      User
Purchasing               Low            Low            High           High
Tax Reporting            Low            Low            None           Moderate
Financial Reporting      High           Moderate       Low            Low
Budgeting                Low-High       Low-High       Low-High       Low-High
Human Resources          Low-Moderate   Low-Moderate   Moderate-High  Moderate-High
Grant Management         Low-High       Low-High       Low-High       Low-High
Performance Measurement  Moderate-High  Low-High       Low            Low-High

Application              Prediction
Purchasing               Already being used and growing.
Tax Reporting            Needs an innovative publisher or mandate by user.
Financial Reporting      Publishers will resist, users not likely to demand.
Budgeting                Yes, for those with standard processes. No, for others.
Human Resources          Innovators will use; more benefits to larger entities.
Grant Management         Yes, for those with standard reports. No, for others.
Performance Measurement  Smaller groups (i.e., North Carolina) will use, widespread use is doubtful until good taxonomies.