Should PDF Be Used for Archiving Electronic Records?

Preserving and archiving electronic records for extended periods of time requires attention to both technology and business issues. With the proliferation of software to produce electronic documents comes a growing need to store those documents in a standard electronic format.

Printing documents to paper for long-term archiving may be a convenient and reasonable solution when documents must be retained for 10 years or more. Printing documents avoids the need to address long-term technology and data storage issues. However, storing documents in paper format requires large amounts of space for storage and lengthy time for retrieving even thoroughly indexed documents. In addition, conversion to paper format negates many benefits of managing records electronically, such as the ability to search document content and to transmit documents over computer networks quickly.

Several issues must be addressed to ensure that records produced and retained electronically will be available and readable in the distant future. Hardware and software that can read the records must be available, or the records must be converted accurately and authentically to readable formats for display by newer computer technology. Data files must be sufficiently standardized to enable those other than the record's creator to view its content. Otherwise, the record's usefulness is compromised. Because most office automation products, such as word processors, spreadsheets, graphics, and database software, produce data files in formats that are proprietary to their vendors, there is an increasing need for these files to be stored in a common standard file format that can be easily created and viewed by the general population.

To be universally usable, a document format must be readable without regard to the specific software available on individuals' desktops. Users, then, must not be required to have a specific vendor's software nor be bound by software versions, operating systems, and other local computer infrastructure issues. If someone external to the organization sends a document, the recipient should be able to accept, read, and print it. In addition, everyone should be able to produce documents themselves in a format universally usable by others.

Portable document format (PDF) files can be created from almost any desktop application with Adobe Exchange software, a product increasingly hailed as a de facto standard for universal access to electronic documents over the Internet. So why not use this easy and readily available solution for producing all records in electronic format? As we will see, there are issues that affect PDF's usefulness for creating, distributing, and storing electronic documents designated as records for retention. Hardware and software technology, metadata capture, business processes used in file creation, and the intricacies of PDF make this file format right for certain applications while possibly inappropriate for others.

Data Formats Proliferate

The best solution for preservation of electronic documents will vary with the business application and the expectations of document use over time. Smaller organizations often use native file formats such as Microsoft Word as "standards" for electronic document storage so that they can control software versions used to produce documents and keep costs minimal. However, this simple means of establishing an electronic document standard often unravels after about two version upgrades, which is when many older files become less readable or presentable in print format. This happens when the software vendor changes the small, internal computer applications that determine how documents are displayed or printed in succeeding versions of software. In addition, the software "standard" might change when the organization's customers, politically powerful internal workgroups, or least-cost procurement decisions dictate that a completely different software package be used to create new documents.

Several file formats are often used instead of native file formats to standardize document data with the intention of preserving documents or making them more universally accessible over time. These formats include PDF, tagged image file format (TIF), standard generalized markup language (SGML), hypertext markup language (HTML), and extensible markup language (XML). Other universally used file formats include joint photographic experts group (JPEG) and graphics interchange format (GIF), which are used for color digital images and are not typically employed to preserve documents. (Considerable information about file formats is accessible at the Internet's Free Online Dictionary of Computing [FOLDOC]; see the reference list.)

The most important consideration is that a generic document format must be universally usable, standard in technical specification over time, and sufficiently robust in capabilities to allow accurate, authentic content preservation and document format presentation.

One common solution for these document archiving and distribution challenges is the creation of PDF files using Adobe PDFWriter software. PDF files can be readily viewed by anyone thanks to the royalty-free availability of Adobe Acrobat Viewer software, which is offered to all as a free download. To create PDF files from standard office desktop software, simply install the PDFWriter software's printer driver, then select it as the printer of choice from a desktop Print menu. "Printing" the file to the PDFWriter printer driver directs the print data stream to a Filename_Of_Your_Choice.PDF file on one's computer disk rather than to an actual hard copy printer for production on paper media.

PDF documents excel in usability and can be produced relatively easily (though at some expense). One great lesson in propagation is that the availability of low-cost browser software (such as Microsoft Internet Explorer and Netscape's Navigator) made universal Internet use occur very quickly. A similar situation occurred when Adobe Systems gave away Acrobat Viewer software free of charge. Anyone could read PDF electronic documents with the free viewer, and the use of PDF files became common very quickly.

However, one still must buy Adobe software, such as Exchange, to create the files, though the cost of this software is relatively low compared to other desktop software. PDF files can contain text and graphics, as well as internal indexes to pages that can be displayed in reduced form as "thumbnails." PDF formatted files can also be created by scanning documents using Adobe Capture to create image files.

PDF Competitors

Other file formats also have advantages. TIF files are typically scanned, digitized images that display as a series of black and white dots (pixels) on a computer screen similar to images produced by Adobe Capture software. Document imaging systems that scan paper forms into electronic files for computer-based document management often use TIF because it is a standard file format in wide use by many imaging system vendors and system implementers. TIF files do come in a variety of claimed "standards"; however, most TIF files can be viewed at a basic level with any TIF-compatible document viewer software. This software is available from most document scanner vendors and a simple TIF file viewer (by Wang) has been supplied with the Microsoft Windows operating system.

TIF files are not easily altered by casual document users (although they can be marked up with special software that applies layers of annotation over the image). TIF results in transmittable and easily viewable electronic documents that are useful for archival purposes since they cannot be edited or altered without the probability of detection.

However, TIF files do not contain true American standard code for information interchange (ASCII) text characters recognizable to computers unless those computers are using special software that can interpret the TIF image dots as text characters. A TIF file's text content can be displayed for viewing, but it cannot be easily copied for use with other software or data files.

The various "flavors" of TIF used in the computer software industry mean that TIF viewers do not always render all file elements accurately -- for example, multi-page TIF files with pagination or files with complex color renderings. TIF files can be significantly larger than the corresponding native files and may be hard to read if the image is created quickly with a low-resolution scanning device. Despite these limitations, TIF files provide some data standardization that is often used when creating archival electronic documents, especially in the case of large-size engineering drawings, where PDF file formats do not fully support the creation of large document display sizes.
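The file-size point can be made concrete with a little arithmetic. The following is a minimal sketch, not a description of any particular scanner or TIF writer: it estimates the raw, uncompressed size of a scanned page from its dimensions and scanning resolution. The function name and the sample page sizes are illustrative assumptions, and the estimate ignores file headers, row padding, and compression (compressed TIF files, which are common in imaging systems, are usually much smaller).

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Estimate raw image size in bytes for a scanned page.

    Assumes no compression, no header overhead, and no row padding.
    bits_per_pixel=1 corresponds to a black-and-white (bitonal) scan.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Illustrative page sizes (assumptions, not taken from the article):
letter_200 = uncompressed_scan_bytes(8.5, 11, 200)   # letter page, 200 dpi
letter_300 = uncompressed_scan_bytes(8.5, 11, 300)   # letter page, 300 dpi
drawing_300 = uncompressed_scan_bytes(34, 44, 300)   # large engineering drawing

for label, size in [("letter @ 200 dpi", letter_200),
                    ("letter @ 300 dpi", letter_300),
                    ("34 x 44 in drawing @ 300 dpi", drawing_300)]:
    print(f"{label}: {size:,} bytes uncompressed")
```

Because raw size grows with the square of the resolution, moving from 200 to 300 dots per inch more than doubles the pixel data, which is the trade-off between readability and storage noted above.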

Files produced by software in a "markup language" format are very powerful in their ability to consistently display electronic document information across various computer operating systems and software. These text and graphics markup languages include SGML, HTML, and XML. SGML has been used for many years in sophisticated document publishing software systems, but it can be cumbersome to learn and use. HTML is used to display formatted document pages on Web sites and is the default standard for displaying simple pages of text and graphics over the Internet. XML is the most modern and powerful markup language. It contains sophisticated text and graphics tagging commands that link document components to dynamically changing data in external databases. XML has other features that improve managing document content as well.

PDF, TIF, and markup language documents all are becoming more standardized in technical specification over time. There is some thought that open, non-proprietary file specifications theoretically give TIF, SGML, HTML, and XML a technical edge for long-term document preservation. TIF viewer software is widely available, and Internet browser software can view most HTML or XML document renditions.

However, a major challenge is that none of these markup languages are in common use on the desktop computer systems used by most document creators -- even though some word processors can save documents in basic HTML format. Document editing using markup languages is not easy to learn, and these document-formatting languages are primarily oriented toward presenting (viewing) documents on a computer screen, rather than printing complex, sophisticated documents.

In contrast, PDF files' viewing and printing capabilities are so robust that the format has been accepted for workflow management and document production in the reprographics industry almost as extensively as TIF and other specialized, high-quality print files (Beal 2000). Many different kinds of organizations are also finding PDF file formats to be business assets (Doyle 2000).

PDF files are very capable of accurately preserving document content and presentation format. Although PDF is not as well supported on UNIX as on Microsoft Windows and Apple Macintosh platforms, PDF files are universally used throughout the Internet's Web sites for direct display or download of documents. The free Acrobat Reader software combined with strong view and print capabilities has led to PDF becoming one of the few accepted data standards for electronic document storage and retrieval.

Business Processes Influence Utility

Despite the generally recognized usefulness of PDF files for document distribution and archiving, close scrutiny reveals a few areas that need improvement before PDF becomes the perfect solution for long-term electronic document retention. Creating PDF documents still requires special software (Adobe Exchange) in addition to the native software already resident on most personal computers. This factor poses a significant cost and installation barrier to any electronic document archiving implementation strategy. Although it is possible to minimize costs by designating specific workstations or individuals to create PDF documents, both document migration and repository strategies must be developed and practiced for creating archival documents organization-wide. Processes must be in place to designate specific records for archiving and to transmit them to appropriate personnel for conversion to PDF.

Most electronic recordkeeping systems depend on the capture of accurate, standard metadata for indexing electronic documents and are designed to capture this information at the time documents are saved as records. PDF electronic documents do possess a facility for the storage of a basic set of metadata. However, inconsistencies may arise among the native file format metadata (properties) created when a document is initially saved, the metadata captured when the electronic recordkeeping system saves a document to its repository, and the intrinsic metadata captured and stored within the PDF file itself. What will be the most authoritative metadata? Although some electronic recordkeeping and document management systems can take advantage of the existing "properties" metadata in a file, the best mechanism for this to work is not clearly established.
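The three-way inconsistency described above can be illustrated with a short sketch. It assumes, purely for illustration, that each metadata set has already been extracted into a simple dictionary; the source names, field names, and sample values are hypothetical and do not reflect any particular recordkeeping product or the internal layout of PDF metadata.

```python
def find_metadata_conflicts(sources):
    """Return {field: {source_name: value}} for fields whose values disagree.

    `sources` maps a source name to a dict of metadata fields.
    Values are compared after trimming whitespace and ignoring case;
    a field present in only one source is not treated as a conflict.
    """
    all_fields = set()
    for metadata in sources.values():
        all_fields.update(metadata)

    conflicts = {}
    for field in sorted(all_fields):
        values = {name: md[field] for name, md in sources.items() if field in md}
        normalized = {str(value).strip().lower() for value in values.values()}
        if len(normalized) > 1:
            conflicts[field] = values
    return conflicts


# Hypothetical metadata for one document, as seen by three systems.
sources = {
    "native_properties": {"title": "Q3 Budget Memo", "author": "J. Smith"},
    "recordkeeping_system": {"title": "Q3 Budget Memo", "author": "Jane Smith",
                             "record_series": "FIN-01"},
    "pdf_info": {"title": "Microsoft Word - memo.doc", "author": "jsmith"},
}

conflicts = find_metadata_conflicts(sources)
for field, values in conflicts.items():
    print(field, "->", values)
```

A report like this does not answer which source is authoritative; that remains a policy decision. It only makes the disagreements visible so that the decision can be applied consistently.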

The business processes used in file creation can actually alter the content of a PDF file. In the new Adobe Exchange 4.0 version, significant text and graphics editing can be performed. PDF files can also have notes added, pages cropped, pages inserted, internal hyperlinks altered, and a variety of other document-editing activities performed. In fact, these easy document editing and improvement features are major attractions for reprographics firms that want to use PDF files for adding value to documents and enhancing the final print output. How can one prove conclusively that the PDF version of a native file was accurately converted from the original file and is an authentic copy of the original file for legal and regulatory audit purposes?

The difficulties of converting native file format documents to accurate PDF renditions have been discussed for many years. It is not uncommon for pagination changes, altered graphics displays, and different text fonts to occur when native format documents are converted to PDF. "Changing target printers will often affect the layout of your publication -- in line endings, font substitution, or number of pages" (Adobe Magazine 2000).

Viewing documents online can be equally frustrating. "Viewing an online PDF file involves several components: a Web browser, the Acrobat viewing plug-in for the browser, the Acrobat viewing program itself, ... and the server" (Adobe Magazine 2000). In addition, to ensure accurate document production, it is standard operating procedure in PDF file creation to use Adobe Distiller software instead of Adobe Exchange when the source documents contain encapsulated PostScript (EPS) data. This raises the question of whether future users will have similar software installed on their computer systems to read PDF documents.

For these reasons, the procedures used to create PDF documents for archiving should be strictly controlled and documented to ensure they can be successfully audited. Without these electronic records management controls, the authenticity of PDF documents could be easily questioned.
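One way such a documented procedure might look in practice is sketched below. This is an assumption on our part rather than a method described in the sources cited here: it records a SHA-256 checksum of both the native file and its PDF rendition at conversion time, along with who performed the conversion and with what tool. All function and field names are hypothetical. A checksum cannot show that the conversion itself was accurate; it can only show whether the stored PDF has changed since the record was made.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_conversion_record(source_path, pdf_path, operator, tool):
    """Build an audit record describing one native-to-PDF conversion."""
    return {
        "source_file": source_path,
        "source_sha256": sha256_of_file(source_path),
        "pdf_file": pdf_path,
        "pdf_sha256": sha256_of_file(pdf_path),
        "operator": operator,
        "conversion_tool": tool,
        "converted_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def pdf_unchanged_since(record):
    """Return True if the PDF on disk still matches the recorded checksum."""
    return sha256_of_file(record["pdf_file"]) == record["pdf_sha256"]


def record_as_json(record):
    """Serialize the audit record for storage alongside the archived PDF."""
    return json.dumps(record, indent=2, sort_keys=True)
```

In use, the record would be created immediately after conversion and stored where document users cannot edit it. An auditor could later call `pdf_unchanged_since(record)` to confirm that the archived rendition is the one that was originally produced.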

PDF Is Here to Stay!

Despite challenges in creating PDF documents, PDF format is one of the best cross-platform document storage standards in use today. Its status can be expected to continue for some time. The freely available Acrobat Reader software and the format's robust capabilities overall will continue to ensure that PDF files are universally useful. PDF document production software is increasingly used in business settings for both producing electronic documents and storing them for future use.

The use of PDF document formats is an appropriate part of any well-considered data and document migration strategy to ensure information availability. There will be few disappointments in using the PDF file format for this purpose as long as plans include measures to address identified deficiencies.


Beal, Stephen. "In Production Joins the PDF Ranks." Electronic Publishing 24, no. 9 (September 2000): 73-74.

Doyle, Audrey. "Museum Cuts Costs with a PDF Workflow." Electronic Publishing 24, no. 8 (August 2000): 50, 52.

FOLDOC -- The Free Online Dictionary of Computing. Available at: (accessed 15 September 2000).

"Q&A -- Acrobat." Adobe Magazine 11, no. 2 (March/April 2000): 62.

John Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. He can be reached at
Title Annotation:Portable document format
Date:Jan 1, 2001
