Should PDF Be Used for Archiving Electronic Records?


Preserving and archiving electronic records for extended periods of time requires attention to both technology and business issues. With the proliferation of software to produce electronic documents comes a growing need to store those documents in a standard electronic format.

Printing documents to paper for long-term archiving may be a convenient and reasonable solution when documents must be retained for 10 years or more. Printing documents avoids the need to address long-term technology and data storage issues. However, storing documents in paper format requires large amounts of space for storage and lengthy time for retrieving even thoroughly indexed documents. In addition, conversion to paper format negates many benefits of managing records electronically, such as the ability to search document content and to transmit documents over computer networks quickly.

Several issues must be addressed to ensure that records produced and retained electronically will be available and readable in the distant future. Hardware and software that can read the records must be available, or the records must be converted accurately and authentically to readable formats for display by newer computer technology. Data files must be sufficiently standardized to enable those other than the record's creator to view its content. Otherwise, the record's usefulness is compromised. Because most office automation products, such as word processors, spreadsheets, graphics, and database software, produce data files in formats that are proprietary to their vendors, there is an increasing need for these files to be stored in a common standard file format that can be easily created and viewed by the general population.

To be universally usable, a document format must be readable without regard to the specific software available on individuals' desktops. Users, then, must not be required to have a specific vendor's software nor be bound by software versions, operating systems, and other local computer infrastructure issues. If someone external to the organization sends a document, the recipient should be able to accept, read, and print it. In addition, everyone should be able to produce documents themselves in a format universally usable by others.

Portable document format (PDF) files can be created from almost any desktop application with Adobe Exchange software, a product increasingly hailed as a de facto standard for universal access to electronic documents over the Internet. So why not use this easy and readily available solution for producing all records in electronic format? As we will see, there are issues that affect PDF's usefulness for creating, distributing, and storing electronic documents designated as records for retention. Hardware and software technology, metadata capture, business processes used in file creation, and the intricacies of PDF make this file format right for certain applications while possibly inappropriate for others.

Data Formats Proliferate

The best solution for preservation of electronic documents will vary with the business application and the expectations of document use over time. Smaller organizations often use native file formats such as Microsoft Word as "standards" for electronic document storage so that they can control software versions used to produce documents and keep costs minimal. However, this simple means of establishing an electronic document standard often unravels after about two version upgrades, which is when many older files become less readable or presentable in print format. This happens when the software vendor changes the small, internal computer applications that determine how documents are displayed or printed in succeeding versions of software. In addition, the software "standard" might change when the organization's customers, politically powerful internal workgroups, or least-cost procurement decisions dictate that a completely different software package be used to create new documents.

Several file formats are often used instead of native file formats to standardize document data with the intention of preserving documents or making them more universally accessible over time. These formats include PDF, tagged image file format (TIF), standard generalized markup language (SGML), hypertext markup language (HTML), and extensible markup language (XML). Other universally used file formats include joint photographic experts group (JPEG) and graphics interchange format (GIF), which are used for color digital images and are not typically employed to preserve documents. (Considerable information about file formats is accessible at the Internet's Free On-line Dictionary of Computing [FOLDOC] at www.foldoc.org.)
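As a small illustration of why a standardized format helps an archive, the formats named above can usually be recognized from the first few bytes of a file, regardless of the file name or the software that produced it. The following is a minimal sketch, not a complete format identifier; the function names `sniff_format` and `sniff_file` are hypothetical, and the markup formats (SGML, HTML, XML) are left out because they are plain text without a fixed leading signature.

```python
# Leading "magic" bytes for the binary formats discussed in this article.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIF (little-endian)"),
    (b"MM\x00*", "TIF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
]


def sniff_format(header: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    """Read the first few bytes of a file and identify its format."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))
```

A repository intake step could use a check like this to flag files whose content does not match their extension before they are accepted as records.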

The most important consideration is that a generic document format must be universally useable, standard in technical specification over time, and sufficiently robust in capabilities to allow accurate, authentic content preservation and document format presentation.

One common solution for these document archiving and distribution challenges is the creation of PDF files using Adobe PDFWriter software. PDF files can be readily viewed by anyone thanks to the royalty-free availability of Adobe Acrobat Viewer software, offered to all and downloadable at www.Adobe.com. To create PDF files from standard office desktop software, simply install the PDFWriter software's printer driver, then select it as the printer of choice from a desktop Print menu. "Printing" the file to the PDFWriter printer driver directs the print data stream to a Filename_Of_Your_Choice.PDF file on one's computer disk rather than to an actual hard copy printer for production on paper media.

PDF documents excel in usability and can be produced relatively easily (though at some expense). One great lesson in propagation is that the availability of low-cost browser software (such as Microsoft Internet Explorer and Netscape's Navigator) made universal Internet use occur very quickly. A similar situation occurred when Adobe Systems gave away Acrobat Viewer software free of charge. Anyone could read PDF electronic documents with the free viewer, and the use of PDF files became common very quickly.

However, one still must buy Adobe software, such as Exchange, to create the files, though the cost of this software is relatively low compared to other desktop software. PDF files can contain text and graphics, as well as internal indexes to pages that can be displayed in reduced form as "thumbnails." PDF-formatted files can also be created by scanning documents using Adobe Capture to create image files.

PDF Competitors

Other file formats also have advantages. TIF files are typically scanned, digitized images that display as a series of black and white dots (pixels) on a computer screen similar to images produced by Adobe Capture software. Document imaging systems that scan paper forms into electronic files for computer-based document management often use TIF because it is a standard file format in wide use by many imaging system vendors and system implementers. TIF files do come in a variety of claimed "standards"; however, most TIF files can be viewed at a basic level with any TIF-compatible document viewer software. This software is available from most document scanner vendors, and a simple TIF file viewer (by Wang) has been supplied with the Microsoft Windows operating system.

TIF files are not easily altered by casual document users (although they can be marked up with special software that applies layers of annotation over the image). TIF results in transmittable and easily viewable electronic documents that are useful for archival purposes since they cannot be edited or altered without the probability of detection.

However, TIF files do not contain true American standard code for information interchange (ASCII) text characters recognizable to computers unless those computers are using special software that can interpret the TIF image dots as text characters. A TIF file's text content can be displayed for viewing, but it cannot be easily copied for use with other software or data files.

The various "flavors" of TIF used in the computer software industry mean that TIF viewers do not accurately see all file elements -- for example, multi-page TIF files with pagination or files with complex color renderings. TIF file size can be significantly larger than corresponding native files and may be limited in readability if the image is created quickly with a low-resolution scanning device. Despite these limitations, TIF files provide some data standardization that is often used when creating archival electronic documents, especially in the case of large-size engineering drawings, where PDF file formats do not fully support the creation of large document display sizes.

Files produced by software in a "markup language" format are very powerful in their ability to consistently display electronic document information across various computer operating systems and software. These text and graphics markup languages include SGML, HTML, and XML. SGML has been used for many years in sophisticated document publishing software systems, but it can be cumbersome to learn and use. HTML is used to display formatted document pages on Web sites and is the default standard for displaying simple pages of text and graphics over the Internet. XML is the most modern and powerful markup language. It contains sophisticated text and graphics tagging commands that link document components to dynamically changing data in external databases. XML has other features that improve managing document content as well.

PDF, TIF, and markup language documents all are becoming more standardized in technical specification over time. There is some thought that open, non-proprietary file specifications theoretically give TIF, SGML, HTML, and XML a technical edge for long-term document preservation. TIF viewer software is widely available, and Internet browser software can view most HTML or XML document renditions.

However, a major challenge is that none of these markup languages are in common use on the desktop computer systems used by most document creators -- even though some word processors can save documents in basic HTML format. Document editing using markup languages is not easy to learn, and these document-formatting languages are primarily oriented toward presenting (viewing) documents on a computer screen, rather than printing complex, sophisticated documents.

In contrast, PDF files' viewing and printing capabilities are so robust that the format has been accepted for workflow management and document production in the reprographics industry almost as extensively as TIF and other specialized, high-quality print files (Beal 2000). Many different kinds of organizations are also finding PDF file formats to be business assets (Doyle 2000).

PDF files are very capable of accurately preserving document content and presentation format. Although PDF is not as well supported on UNIX as on the Microsoft Windows and Apple Macintosh platforms, PDF files are universally used throughout the Internet's Web sites for direct display or download of documents. The free Acrobat Reader software combined with strong view and print capabilities has led to PDF becoming one of the few accepted data standards for electronic document storage and retrieval.

Business Processes Influence Utility

Despite the generally recognized usefulness of PDF files for document distribution and archiving, close scrutiny reveals a few areas that need improvement before PDF becomes the perfect solution for long-term electronic document retention. Creating PDF documents still requires special software (Adobe Exchange) in addition to the native software already resident on most personal computers. This factor poses a significant cost and installation barrier to any electronic document archiving implementation strategy. Although it is possible to minimize costs by designating specific workstations or individuals to create PDF documents, both document migration and repository strategies must be developed and practiced for creating archival documents organization-wide. Processes must be in place to designate specific records for archiving and to transmit them to appropriate personnel for conversion to PDF.

Most electronic recordkeeping systems depend on the capture of accurate, standard metadata for indexing electronic documents and are designed to capture this information at the time documents are saved as records. PDF electronic documents do possess a facility for the storage of a basic set of metadata. However, inconsistencies may arise among the native file format metadata (properties) created when a document is initially saved, the metadata captured when the electronic recordkeeping system saves a document to its repository, and the intrinsic metadata captured and stored within the PDF file itself. What will be the most authoritative metadata? Although some electronic recordkeeping and document management systems can take advantage of the existing "properties" metadata in a file, the best mechanism for this to work is not clearly established.
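The three-way metadata question above can be made concrete with a small comparison routine. This is only a sketch under stated assumptions: the function name `find_metadata_conflicts`, the source labels, and the field names are all hypothetical, and the metadata is assumed to have already been extracted into plain dictionaries by whatever tools the organization uses. It reports disagreements; it does not decide which source is authoritative.

```python
def find_metadata_conflicts(sources):
    """Compare metadata from several sources and report disagreements.

    sources: dict mapping a source label to a dict of field -> value.
    Returns a dict mapping each conflicting field to {source label: value}.
    Values are compared case-insensitively with surrounding spaces ignored.
    """
    fields = set()
    for meta in sources.values():
        fields.update(meta)

    conflicts = {}
    for field in sorted(fields):
        values = {
            label: meta[field]
            for label, meta in sources.items()
            if field in meta
        }
        normalized = {str(value).strip().lower() for value in values.values()}
        if len(normalized) > 1:
            conflicts[field] = values
    return conflicts


# Hypothetical example: the same memo described by three sources.
example = {
    "native_properties": {"title": "Q3 Budget Memo", "author": "J. Smith"},
    "recordkeeping_system": {"title": "Budget Memo, Third Quarter", "author": "j. smith"},
    "pdf_document_info": {"title": "Q3 Budget Memo", "author": "J. Smith"},
}
conflicts = find_metadata_conflicts(example)
```

In this example only the title is flagged, because the author values differ in letter case alone. A records policy would still have to state which source wins when a conflict is found.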

The business processes used in file creation can actually alter the content of a PDF file. In the new Adobe Exchange 4.0 version, significant text and graphics editing can be performed. PDF files can also have notes added, pages cropped, pages inserted, internal hyperlinks altered, and a variety of other document-editing activities performed. In fact, these easy document editing and improvement features are major attractions for reprographics firms that want to use PDF files for adding value to documents and enhancing the final print output. How can one prove conclusively that the PDF version of a native file was accurately converted from the original file and is an authentic copy of the original file for legal and regulatory audit purposes?

The difficulties of converting native file format documents to accurate PDF renditions have been discussed for many years. It is not uncommon for pagination changes, altered graphics displays, and different text fonts to occur when native format documents are converted to PDF. "Changing target printers will often affect the layout of your publication -- in line endings, font substitution, or number of pages" (Adobe Magazine 2000).

Viewing documents online can be equally frustrating. "Viewing an online PDF file involves several components: a Web browser, the Acrobat viewing plug-in for the browser, the Acrobat viewing program itself, ... and the server" (Adobe Magazine 2000). In addition, to ensure accurate document production, it is standard operating procedure in PDF file creation to use Adobe Distiller software instead of Adobe Exchange when the source documents contain encapsulated postscript (EPS) data. This raises the question of whether future users will have similar software installed on their computer systems to read PDF documents.

For these reasons, the procedures used to create PDF documents for archiving should be strictly controlled and documented to ensure they can be successfully audited. Without these electronic records management controls, the authenticity of PDF documents could be easily questioned.
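One way such controls could be documented is to record a checksum of both the native file and its PDF rendition at conversion time, together with who performed the conversion and with what tool. The sketch below assumes this approach; it is not a procedure described in the sources cited here, and the function and field names (`conversion_record`, `pdf_unchanged`, `tool`, `operator`) are hypothetical. A matching checksum shows only that the PDF has not changed since it was recorded. It does not show that the conversion itself was accurate, which still requires review of the rendition.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def conversion_record(source_path, pdf_path, tool, operator):
    """Build an audit entry describing one native-to-PDF conversion."""
    return {
        "source_file": str(source_path),
        "source_sha256": sha256_of(source_path),
        "pdf_file": str(pdf_path),
        "pdf_sha256": sha256_of(pdf_path),
        "tool": tool,          # hypothetical field: software used to convert
        "operator": operator,  # hypothetical field: person who converted
        "converted_at": datetime.now(timezone.utc).isoformat(),
    }


def pdf_unchanged(record):
    """Return True if the PDF still matches the checksum in the record."""
    return sha256_of(record["pdf_file"]) == record["pdf_sha256"]
```

Records like these would need to be stored separately from the PDF files themselves so that an auditor can rely on them.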

PDF is Here to Stay!

Despite challenges in creating PDF documents, PDF format is one of the best cross-platform document storage standards in use today. Its status can be expected to continue for some time. The freely available Acrobat reader software and the format's robust capabilities overall will continue to ensure that PDF files are universally useful. PDF document production software is increasingly used in business settings for both producing electronic documents and storing them for future use.

The use of PDF document formats is an appropriate part of any well-considered data and document migration strategy to ensure information availability. There will be few disappointments in using the PDF file format for this purpose as long as plans include measures to address identified deficiencies.

REFERENCES

Beal, Stephen. "In Production Joins the PDF Ranks." Electronic Publishing 24, no.9 (September 2000): 73-74.

Doyle, Audrey. "Museum Cuts Costs with a PDF Workflow." Electronic Publishing 24, no.8 (August 2000): 50, 52.

FOLDOC -- The Free Online Dictionary of Computing. Available at: www.foldoc.org (accessed 15 September 2000).

"Q&A -- Acrobat." Adobe Magazine 11 no. 2 (March/April 2000): 62.

John Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. He can be reached at jtpid@usit.net.
COPYRIGHT 2001 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Portable document format
Author:PHILLIPS, JOHN T.
Publication:Information Management Journal
Date:Jan 1, 2001
Words:2525