Innovations in information management technologies: thanks to sweeping new regulations and an investigatory climate in corporations worldwide, fine minds are turning their attention to solving--or at least salving--the difficulties associated with managing electronic records.
Examines three products that have records management implications:
* EMC's Centera Compliance Edition
* Princeton Softech's Active Archiving Solutions
* Microsoft's Sharepoint Portal Server version 2.0
Genius is the introduction of a new element into the intellectual universe; it is the application of powers to objects on which they had not before been exercised in such a manner as to produce effects hitherto unknown.--Wordsworth
Broad new regulations and an investigatory climate have created lucrative new markets for information management products with records capabilities. "Compliance" has become the mantra of technology sellers and buyers alike, and true innovation is a promise often made but rarely kept. Now and then, however, products emerge that have real implications for the improved creation, distribution, storage, preservation, and disposal of business records. With records managers now commonly invited to share their insights and expertise with eager legal and information technology (IT) departments, it pays to know what is on the horizon for managing the records lifecycle.
Three products represent new approaches to electronic records issues and have implications for records managers:
* EMC's Centera Compliance Edition offers storage hardware and a unique software-based method for storing records and their associated metadata so that they remain unalterable and authentic according to predefined retention rules.
* Princeton Softech's Active Archiving Solutions present new capabilities for encapsulating selected portions of structured databases so they can be stored more effectively and retained as part of a company's total records solution, yet remain easily searchable for end users.
* Microsoft's SharePoint Portal Server version 2.0 is included as part of Windows Shared Services, and businesses with Microsoft enterprise licenses already have SharePoint available to them. This bundling will likely make SharePoint a default tool for collaboration, information access, and sharing for some firms. Several companies have developed records management tools for SharePoint.
No one product answers all needs, but each one does represent a new idea or approach. Also, the products showcased are representative; already, other vendors are preparing competitive products around similar ideas. Finally, the products are diverse in their capabilities but share one common factor: All require an underlying procedural foundation of policies, procedures, retention, and classification in order to work effectively--all the building blocks with which records managers are most familiar.
EMC Centera's Content Addressed Storage
Centera Compliance Edition is designed to store fixed content--such as documents, presentations, images, video, and audio files--that is complete, finalized, and not expected to change. Once items are considered records, they pass from the applications used to create them (Word, Excel, PowerPoint, etc.) into Centera by way of an application program interface (API) that acts as a conduit between the two systems. For its part, Centera creates a one-of-a-kind digital fingerprint for each record that serves as the address for where the record is stored.
The record's digital fingerprint is a function of two things: the record's content and its metadata. First, Centera uses algorithms to calculate a unique, 128-bit hash of the record's content, a kind of digital claim check. Next, the record's claim check, along with metadata about the record (such as file name, creation date, etc.), is inserted into an extensible markup language (XML) file called a C-Clip descriptor file (CDF). Centera then creates a content address based on the CDF (which now contains the record's claim check and its metadata). Centera passes this content address back to the application that created the record. Centera stores the record itself as a binary large object (BLOB) within its storage array. A mirror copy is also stored. The stored record is non-rewritable throughout the retention period.
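The two-step fingerprinting described above can be sketched in a few lines of Python. This is an illustration only, not EMC's implementation: the choice of MD5 as the 128-bit hash, the XML element names, and the function names are all assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def content_hash(data: bytes) -> str:
    """Return a 128-bit digest of the record's content (the "claim check")."""
    # Assumption: MD5 stands in here only because it is a common 128-bit hash.
    return hashlib.md5(data).hexdigest()


def build_cdf(claim_check: str, metadata: dict) -> bytes:
    """Build a hypothetical descriptor file holding the claim check and metadata."""
    root = ET.Element("cdf")  # element names are invented for this sketch
    ET.SubElement(root, "blob", {"hash": claim_check})
    for key in sorted(metadata):  # sorted so identical input yields identical XML
        ET.SubElement(root, "meta", {"name": key, "value": str(metadata[key])})
    return ET.tostring(root, encoding="utf-8")


def content_address(data: bytes, metadata: dict) -> str:
    """Derive the address that would be handed back to the calling application."""
    cdf = build_cdf(content_hash(data), metadata)
    return hashlib.md5(cdf).hexdigest()
```

Because the address is computed from the descriptor file rather than from a file path, the same content stored with different metadata receives a different address, while the claim check for the content itself stays the same.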
To avoid storing duplicates of the same record, Centera checks the digital fingerprints (the claim checks) of incoming records. If an incoming record has the exact fingerprint of an already stored item, Centera will create another pointer to the stored record, not another copy of the record. Within Centera it is possible to have multiple CDFs with different retention periods pointing to the same stored record, but only one copy of the record is stored--a good thing for regulated industries and an important factor in reducing risk associated with storing duplicates. The record will remain non-erasable until the longest retention period expires. Centera destroys records by using digital "shredding" techniques to overwrite the stored items seven times.
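A toy model may help make the single-instance idea concrete. The sketch below is not Centera's design; the class name, its fields, and the use of MD5 as the fingerprint are assumptions chosen for illustration.

```python
import hashlib
from datetime import date


class SingleInstanceStore:
    """Toy store: one copy per unique content, many descriptors pointing at it."""

    def __init__(self):
        self.blobs = {}        # claim check -> record bytes (stored once)
        self.descriptors = {}  # claim check -> list of retention end dates

    def store(self, data: bytes, retain_until: date) -> str:
        claim_check = hashlib.md5(data).hexdigest()
        if claim_check not in self.blobs:
            self.blobs[claim_check] = data  # first arrival: keep the bytes
        # Every call adds a pointer carrying its own retention period.
        self.descriptors.setdefault(claim_check, []).append(retain_until)
        return claim_check

    def erasable_on(self, claim_check: str) -> date:
        """The record stays non-erasable until the longest retention expires."""
        return max(self.descriptors[claim_check])
```

Storing the same invoice twice, once with a 2007 retention date and once with a 2010 date, leaves one entry in `blobs` and two in `descriptors`, and `erasable_on` reports the 2010 date.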
When Centera is used with a records management application, the metadata in the CDF can include the retention period for stored records. Because each record has its own CDF, Centera associates a retention period with an individual record rather than with containers of records grouped together in virtual folders. It should be noted, however, that Centera does not take the place of records management application software but works in concert with it.
Centera's content address is not tied to path names or directory structures that can change over time as electronic records are off-loaded or re-associated with different folders. For records management functions, this means more effective production of records responsive to discovery orders. Destruction holds are effected by lengthening the retention period on the stored records. Centera cautions that this should be done in increments because it is not possible to shorten a retention period once it has been assigned.
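The lengthen-only rule explains why holds are best applied in small steps. The following sketch, with hypothetical function names and a one-year default increment chosen purely for illustration, shows the logic a records application might enforce.

```python
from datetime import date, timedelta


class RetentionError(ValueError):
    """Raised when a change would shorten an assigned retention period."""


def extend_retention(current_end: date, new_end: date) -> date:
    """Return the new retention end date, refusing any attempt to shorten it."""
    if new_end < current_end:
        raise RetentionError("retention can be lengthened but not shortened")
    return new_end


def place_hold(current_end: date, increment_days: int = 365) -> date:
    """Lengthen retention by one modest increment; repeat if the hold continues."""
    return extend_retention(current_end, current_end + timedelta(days=increment_days))
```

If a hold were imposed by adding ten years at once and the matter closed after six months, the record would remain locked for the balance of the decade; a short increment limits that exposure.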
Centera is a hardware device that uses a redundant array of independent nodes where each node has its own instance of software, processing power, and disk storage capacity. Centera is based on magnetic storage media. In instances where disposal of records is important, magnetic storage allows more flexibility for records management than does conventional write once, read many (WORM) optical media.
EMC Centera's list price begins at $64,000 for hardware and $84,000 for software, representing $148,000 for a 4-terabyte system configuration.
Princeton Softech's Active Archiving
Unlike EMC's Centera or MS Sharepoint, Princeton Softech's active archiving technology is meant to work with structured data, namely the tables and relationships typically found in databases. This is important because many Web sites, e-commerce sites, and enterprise applications such as PeopleSoft and SAP are actually databases that maintain rows and columns of data that comprise tables. Like any database, the relationships established between and among tables provide the functionality for search, retrieval, workflow, reporting, and other business needs.
For the IT department, database growth brings several problems. As the amount of data in a database increases, performance suffers. The amount of time it takes for the system to execute simple commands, such as search requests, grows longer and longer until it reaches unacceptable levels. The other consequence of ever-expanding databases is the amount of storage required to keep all data online. While storage is cheap to buy, the total cost of ownership (i.e., labor and other costs to maintain the storage over its lifetime) is not. More data also means longer back-up times and can mean longer recovery time in the event of disaster.
For records managers, databases have always offered unique challenges. Without a doubt, databases are company records of transactions with customers, vendors, and employees. Up to now, retention for databases consisted of keeping the annual back-up tape for some period of time according to the retention schedule. Older data, moved to tape to improve database performance, may not have had any retention associated with it.
Princeton Softech's Active Archiving is a suite of products for structured data. It works by encapsulating selected database tables and their relationships for movement to other storage media, which may include magnetic disk, tape, or specialized storage devices such as redundant array of inexpensive disks (RAID). It differs from traditional data archiving products in that the company's Active Archiving solutions can selectively segregate older data according to a company's business rules. Consider the ability to archive accounts-payable transactions that are unlikely to be retrieved but must be retained until an ongoing tax audit is completed. A rule written for Active Archiving may identify that all accounts-payable information older than two years be automatically archived.
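An age-based rule of the kind just described can be illustrated with Python's built-in sqlite3 module. This is a simplified sketch, not Princeton Softech's product: the table names, column names, and sample rows are invented, and it handles a single table rather than a set of related tables.

```python
import sqlite3


def archive_older_than(conn, cutoff_date: str) -> int:
    """Move accounts-payable rows dated before cutoff_date into an archive table."""
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO ap_archive SELECT * FROM accounts_payable WHERE txn_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM accounts_payable WHERE txn_date < ?", (cutoff_date,)
        )
    return cur.rowcount  # number of rows removed from the live table


# Hypothetical sample data; ISO-format date strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts_payable (id INTEGER PRIMARY KEY, vendor TEXT, txn_date TEXT)")
conn.execute("CREATE TABLE ap_archive (id INTEGER PRIMARY KEY, vendor TEXT, txn_date TEXT)")
conn.executemany(
    "INSERT INTO accounts_payable VALUES (?, ?, ?)",
    [(1, "Acme", "2000-03-15"), (2, "Globex", "2001-11-02"), (3, "Initech", "2003-06-30")],
)

# "Older than two years" as of January 1, 2004.
moved = archive_older_than(conn, "2002-01-01")
```

In this example the two transactions from 2000 and 2001 move to the archive table and the 2003 transaction stays in the live table. The cutoff is passed in as a parameter so that the business rule, not the code, decides what counts as old.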
Princeton's product provides the ability to define data from many different tables and keep all information related to it. In addition, it produces an active archive directory that is searchable by end users without having to restore the archived data. In fact, it is transparent to end users whether data is online or archived offline. This is a significant difference because in the past, many database archiving products were "all-or-none" propositions when it came to restoring data that had been off-loaded to tape. Searching could only take place after all the archived data was restored. For records managers, an active archive directory means being able to search older structured data files as part of a response to regulatory audit, discovery, or investigation more quickly and without having to restore the entire archive.
Active Archiving assures data integrity by calculating checksums for data blocks moved during the archiving process. If archived data is restored to the application and any changes are made, Active Archiving will compare the new checksum with the old upon re-archiving, conclude that changes have occurred, and save the newly archived data as a new file.
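The checksum comparison can be pictured with a short Python sketch. The algorithm (CRC-32), the 4,096-byte block size, and the function names are assumptions for illustration; the article does not say which checksum the product uses.

```python
import zlib


def block_checksums(data: bytes, block_size: int = 4096) -> list:
    """Return a CRC-32 checksum for each fixed-size block of data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def has_changed(original_sums: list, restored_data: bytes, block_size: int = 4096) -> bool:
    """Compare stored checksums with those of data about to be re-archived."""
    return block_checksums(restored_data, block_size) != original_sums
```

When `has_changed` returns True, an archiving tool following the behavior described above would write the data out as a new file rather than overwrite the earlier archive.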
One significant advantage of Active Archiving is that the archived data and tables remain accessible even if the original database model (the way that tables relate to one another) has changed. Data retrieved from Active Archiving solutions maintains all relationships and can be mapped to new database models or upgrades as required. Because data can be segregated by age before it is archived, disaster recovery efforts could concentrate on the most critically needed data first.
In order to work, Active Archiving requires the business organization to define its data model and business policy and to specify how often the active archiving should occur--an area in which records managers have much to contribute. Customers must also specify the media--tape, optical, RAID devices--on which the archived data should be saved. Active Archiving can work with EMC Centera devices. If the oldest three years of accounts-receivable data is archived to a Centera device, for example, a retention period could be imposed on it under the control of Centera. Note that the Active Archiving product itself cannot manage records retention as yet.
Active Archiving solutions are priced according to the number of processors involved. A single application server device may have up to four central processing units. A typical business application may have one processing unit running the application and another running the database. Active Archiving's initial pricing of $50,000 includes up to four processors, with $6,250 for each additional processor.
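The pricing arithmetic is simple enough to express as a small function. The function name and the generic term "units" are choices made for this sketch; the dollar figures are the list prices quoted above.

```python
def active_archiving_price(units: int) -> int:
    """List price in dollars for a given number of priced units."""
    if units < 1:
        raise ValueError("at least one unit is required")
    base_price, included, per_extra = 50_000, 4, 6_250
    # The base price covers the first four units; each one beyond that adds $6,250.
    return base_price + max(0, units - included) * per_extra
```

A configuration with six units, for example, would list at $50,000 plus two additional units at $6,250 each, or $62,500.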
Originally positioned as a document management product that was cheap at $79 per license, version 1.0 of Microsoft's Sharepoint Portal Server was primarily designed for sharing information in a Web-like setting.
Version 1.0 provided three ways to classify a document for later retrieval: via a structured folder hierarchy, by capturing metadata and keywords to a document profile, or by way of user-defined categories designed for browsing. The product's workflow offered some rudimentary approval processes that focused on publishing content to a Web page. Permissions to view, alter, or publish content were dependent on a scheme of public vs. private folders. The product also featured an out-of-the-box portal that could be easily customized to display content from various sources and create sites where sharing of documents and other content could occur.
[Editor's Note: Also see "The Joys of Enterprise Portals" by Jerrold G. Rose in The Information Management Journal, September/October 2003].
Microsoft promoted Sharepoint version 1.0 as an enterprise product, but its structure was complicated. For example, metadata fields had to be determined in advance because document profiles could not be altered to accommodate new metadata fields; developing categories implied a uniform way for everyone to understand an organization's information assets--not easy in any environment. For this reason, Sharepoint seemed easier to implement in smaller departments and work teams.
Microsoft Sharepoint Version 2.0
With version 2.0, Microsoft positions Sharepoint as a collaboration tool set, de-emphasizing its document management capabilities. (According to Microsoft, document management features such as versioning, check-in, and check-out will be standard in MS Office 2003 products.) Profiling of documents is no longer required, and workflow has been completely removed from the product; Microsoft will rely on third parties to provide workflow tools for Sharepoint.
Instead, Microsoft has given Sharepoint 2.0 a structured query language (SQL) database and Web parts that will make it easier to personalize the portal and to expose information and data from other applications within Sharepoint. For example, a graph generated from SAP financial data could appear in a Sharepoint environment personalized for an executive, but not accessible by other users. Project teams could construct their own sites to comment on work produced, share meeting minutes, post slide presentations, and collaborate on team tasks. Internally, Microsoft uses Sharepoint customization to allow employees to see their own pay stubs electronically, rather than issuing them on paper. Permissions are now based on roles such as reader, contributor, Web designer, and administrator. View privileges are applied using a concept called audiences, making it easier to determine who may see what content.
As in version 1.0, the concept of grouping related content into categories still exists within Sharepoint 2.0 and has been renamed "topics." Topics provide an information architecture to aid those who may not be completely familiar with a subject and wish to browse. Every topic has an owner who functions as a gatekeeper, either giving or denying permission for a new item to be included in a topic. Explanations of what topics encompass are not accessible within the system, meaning that all users must understand a topic in the same way based on procedure. Microsoft agrees that deciding what the topics should be is tough and recommends starting simple, possibly with user focus groups. It is possible to make changes to the topics. In demos, Sharepoint topics are arranged by organization functions--marketing, sales, finance, etc.
This is a knotty problem because it is appealing to think that existing records series could easily provide the topics an organization needs. The problem is that records series are often defined by the laws that govern their retention--for example, a separate series might be created for two items with the same retention because they are governed by different laws. Most people outside records management do not characterize their documents by area of law, or even by the document's function, but rather by the subject(s) discussed in the document. One of Sharepoint's strengths is that users can subscribe to a topic with information delivered to them automatically whenever it is added to the topic.
Search capabilities in Sharepoint include full text, metadata attributes, and topic browsing. Searches can be set with specific scopes detailing which Sharepoint sites are included in the search. Sites may include other Sharepoint environments or external Web sites. Users can also search on information types such as images, pictures, documents, and spreadsheets.
Sharepoint has no records management capabilities included in the product. Companies such as eManage, C-Exec, and MDY FileSurf have records management integrations with Sharepoint version 1.0 and are working on integration with version 2.0. As of November 2003, Belfast-based Meridio alone has developed a tightly integrated records management application for Sharepoint version 2.0.
Whether these products represent genius or not remains to be seen. What is certain is that fine minds throughout the world are turning their attention to solving--or at least salving--the difficulties associated with managing electronic records. As records managers become more involved in the realm of electronic records, their ability to separate truly new possibilities from "old wine in new bottles" will become stronger, and the standard for true innovation will rise accordingly.
Julie Gable, CRM, CDIA, FAI, is Associate Executive Editor of The Information Management Journal and principal and founder of Gable Consulting. She may be contacted at email@example.com.
Publication: Information Management Journal
Date: Jan 1, 2004