Will data conversion lose your records?
Management does sometimes approve more-than-adequate budgets. Most new software is a vast improvement over older products. However, converting data from older to newer systems requires such extremely detailed project planning and execution that new system start-up problems are all too common.
This year, many information management departments will select new software systems for implementation due to the perceived inadequacies of existing systems with regard to Y2K requirements. Other reasons to buy new software and systems include the lack of integration between existing systems, or inability of those systems to meet organizational requirements for ease of use or special functions.
A common reason for buying new software is to gain more control of documents and records. Imagine the unpleasant surprise if a new software system actually caused an organization to have less control of some records series, due to poor data conversion or actual data loss. It has happened before and will happen again.
Software is sometimes reviewed, bought, and scheduled for delivery before any significant data migration planning occurs, under the assumption that data migration will be a parallel task once the system arrives and is installed. When there is relatively little existing data from past manual systems, data migration is seldom a major project challenge. In such cases, the data must be manually entered. However, legacy systems that contain considerable amounts of data can present serious challenges to successful completion of records management system projects. Data migration can also consume a large portion of an otherwise adequate budget.
Why is accurate data conversion so difficult? What are the danger signals that indicate possible data loss? Are there any standards for data conversion? Who should be responsible for data conversion, and how do you know that data has been successfully converted? These are a few of the questions to address before successfully implementing a software system and demonstrating the magical return on investment that technology project managers seek.
Bytes to Bytes
The aphorism "garbage in/garbage out" is never truer than on a data conversion project. Data is the raw material for the machinery of computing. Without data, software is an empty shell. In fact, software versions or vendors can be changed often, but the data that resides in a computer system has long-term meaning and importance. The data - and the documents and records built from it - are the real knowledge, or corporate memory, of an enterprise. Erroneous data can mean misleading information, false conclusions, and lost business records.
Even though software programming languages, operating systems, and physical hardware may vary with different applications, the data remains constant. New software will have new modules, functions, screens, reports, and data repositories (databases), but the new computer system will probably need to start with the same data used by the old system, although it may require it in a slightly different format. Names, addresses, social security numbers, employee numbers, birth dates, department codes, account numbers, file folder labels, document titles, and box bar codes can all be migrated, transferred, or converted between old and new systems.
Conversion means that the computer bytes of data used by one computer application must be transferred and/or transformed into meaningful bytes that can be used by the new application.
A byte is typically composed of eight bits (a bit being a single 1 or 0) and is the smallest addressable unit of information storage. Usually a byte stores just one character. These characters are what must be made available to the new software. Bytes (characters) grow into data that can be measured in kilobytes (KB - thousands of bytes), megabytes (MB - millions of bytes), and gigabytes (GB - billions of bytes). Because these bytes are really just 1s and 0s, different kinds of software can read data, documents, and records created by other software - if they are properly presented in an understandable format.
This data is completely indecipherable to humans unless it is presented as something recognizable - characters, words, sounds, colors, pictures, or other forms of meaningful information. The most commonly used form of byte-stored data is character data - text and numbers. The most commonly used format for this data is the mapping of bytes to specific characters as represented by the American Standard Code for Information Interchange (ASCII), whereby specific combinations of 1s and 0s are simply interpreted as specific text and numbers. ASCII data is usually the raw material for data conversion projects. Structured ASCII data is used to populate computer screens, create database reports, or compose documents (such as this article) to be read by humans.
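The byte-to-character mapping described above can be illustrated with a few lines of Python; the byte values and bit patterns shown are standard ASCII:

```python
# A minimal illustration of how ASCII maps raw bytes to characters.
raw = bytes([72, 101, 108, 108, 111])  # five bytes, each eight bits

# Each byte value is interpreted as one character under ASCII.
text = raw.decode("ascii")
print(text)  # Hello

# The reverse mapping: a character back to its byte value and bit pattern.
print(ord("H"))           # 72
print(format(72, "08b"))  # 01001000
```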
Standards for Loading Data
Because ASCII data is universally useful, understandable, and manageable, many software programs use it for information storage. Although other forms of computer information may represent colors, fonts, sounds, or pictures, ASCII data is of most interest in managing documents and records. Names, addresses, records series, bar codes, and retention dates can all be represented by the same ASCII data codes in word processors, spreadsheets, graphics programs, and databases.
For these reasons, ASCII is an often used data exchange, data migration, or data conversion format or standard. Sometimes new business application software (such as recordkeeping software) may use the same database software (such as SQL Server or Oracle) as the old application. In such cases, it is easy to transfer the data between old and new systems by loading it directly from a database table in the old system to a similar database table in the new software. However, in many cases, this is not possible, and converting the data in the old system to ASCII format is required so it can be easily read by the new software and loaded into the new system.
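When old and new applications share the same database software, the direct table-to-table load described above amounts to a single SQL statement. The sketch below uses Python's built-in SQLite database to illustrate the idea; the table and column names are hypothetical, not from any real product:

```python
import sqlite3

# Hypothetical sketch: when both applications use the same database
# software, rows can be copied table-to-table with plain SQL, and no
# intermediate ASCII file is needed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE old_records (folder_no TEXT, title TEXT)")
con.execute("CREATE TABLE new_records (folder_no TEXT, title TEXT)")
con.execute("INSERT INTO old_records VALUES ('F-001', 'Payroll 1998')")

# The direct load from the old table into the new one.
con.execute("INSERT INTO new_records SELECT folder_no, title FROM old_records")

count = con.execute("SELECT COUNT(*) FROM new_records").fetchone()[0]
print(count)  # 1
```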
Many software vendors expect system buyers to transform existing character data in structured databases into ASCII data files for import into the systems that they sell. To assist with this effort, vendors may offer data conversion support services to assure that customers have useful systems (with clean data) after the new software is installed and operational.
Most software packages today have export/import utilities that allow dumping data to or loading it from other software systems. These utility modules enable programs to use existing data and assure that rekeying every character into the new database is not required. A good example of a simple database import/export utility can be found in Microsoft Access. Available through the File/Save As/Export menu bar selection, it leads the computer user through a series of questions that aid in formatting data for export. The computer user provides:
* Computer file format desired for export - database, spreadsheet, text, or others
* Data export format required
a. Delimited (using a character such as a comma between data fields)
b. Fixed-length (using a space and a fixed width to indicate the field length)
* Text qualifiers or field header information if needed in the export file.
Once the user makes these decisions, data from the old software repositories is exported and subsequently imported into the new software system. Now the real fun begins. What exactly does one do with this data?
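The two export formats listed above - delimited and fixed-length - can be sketched in a few lines of Python; the field widths chosen here are illustrative assumptions:

```python
import csv
import io

rows = [("Smith", "F-001", "1998-06-30"), ("Jones", "F-002", "1997-01-15")]

# Delimited export: a comma separates the data fields.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
delimited = buf.getvalue()

# Fixed-length export: each field is padded with spaces to a fixed width
# (10, 8, and 10 characters here - the widths are illustrative).
fixed = "\n".join(f"{name:<10}{folder:<8}{date:<10}" for name, folder, date in rows)

print(delimited)
print(fixed)
```

Either form can then be read by the new system's import utility, provided both sides agree on the delimiter or the field widths.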
Data Mapping Challenges
Transferred data must arrive in the appropriate field in the new software system. Obviously, one does not want personal names transferred into inappropriate fields such as address or bar code. In addition, all software must distinguish among expected character, numeric, and date formats for data so that the new software will sort, index, display, and perform calculations correctly.
In some cases, data from old software fields must be split into two or more discrete units for loading into fields in the new software. An example is the case where first, middle, and last name are in one field in the old software, but must now be placed into separate fields in the new database for more sophisticated name sorting and reporting. Data conversion specialists achieve this feat - for a fee - by writing special programs to perform this function on a field-by-field basis during data conversion.
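The name-splitting step can be sketched as a short conversion routine. The simple rule below (first word, last word, anything between as the middle name) is an assumption for illustration; real conversion programs must handle many more name patterns:

```python
# Hedged sketch of the field-splitting step: one "full name" field in the
# old system becomes first, middle, and last name fields in the new one.
def split_name(full_name):
    parts = full_name.split()
    first = parts[0]
    last = parts[-1]
    middle = " ".join(parts[1:-1])  # empty when there is no middle name
    return first, middle, last

print(split_name("Mary Ellen Walker"))  # ('Mary', 'Ellen', 'Walker')
print(split_name("John Smith"))         # ('John', '', 'Smith')
```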
To transfer data from the old system's database fields into the correct fields in the new system requires a data map. This may be a conceptual understanding of how data will be exported and then imported into the correct fields in the new system.
If the systems are simple, with relatively little alteration in the basic units of data, then loading fields of data in proper order into the new database may dump the correct data into the required field. However, to do this one must understand precisely which fields in the old database map to which fields in the new database.
One method of generating a data map is to create a spreadsheet-style matrix that compares the old field names, new field names, data types, field lengths, and data validation notes. This effectively maps the data in the old database to the data locations and parameters in the new database. The number of fields that must be thoroughly understood and mapped in this manner can be daunting. Typical fields include:
* Personnel, user, and patron data: Name, address, phone number, fax number, e-mail address, and security level
* Records, reports, drawings, and other documents: Record number, title, author, media, date, folder number, and series number
* Boxes and containers: Box number, bar code number, date range, contents title, and shelf space location
* Retention basis or schedule: Regulation, citation, retention code, and calculation method
* Operational information: Current document holders, records hold status, special records or system notes
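The spreadsheet-style matrix described above can also live in code, where a conversion script can read it directly. The field names, types, lengths, and validation notes below are hypothetical examples:

```python
# Illustrative data map: old field, new field, data type, maximum
# length, and a data validation note - mirroring the matrix columns.
data_map = [
    ("RECNAME",  "record_title",   "text", 100, "trim trailing spaces"),
    ("FOLDERNO", "folder_number",  "text",  12, "must match F-#### pattern"),
    ("RETDATE",  "retention_date", "date",  10, "convert MMDDYY to ISO format"),
    ("BARCODE",  "barcode",        "text",  14, "digits only"),
]

for old, new, dtype, length, note in data_map:
    print(f"{old:<10} -> {new:<15} ({dtype}, {length}): {note}")
```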
Outsourcing Data Conversion
Assuring that converted data functions properly requires considerable technical skill and significant attention to operational issues. One must be very familiar with the data that resides within the present system and understand the database structures, file formats, and program expectations of the software into which the new data will be loaded. Since most software vendors are reluctant to share detailed internal architectures of their products, they usually request that the present data be made available to them in ASCII delimited format or in a known database format such as Access, SQL Server, or Oracle.
Data conversion requires knowing each and every field in both systems, how the data is manipulated, and how the application works with the data. It also requires understanding how data validation routines must be run, the relationships among the database tables, and user expectations regarding queries and data retrieval speed. Special data conversion software may make execution and documentation easier, but the process will still require complete familiarity with export and import.
This is a challenging undertaking for most individuals, especially if they have other day-to-day duties. For this reason, most new system buyers outsource at least some data conversion to the software vendor. The activity can be approached with differing levels of responsibility on both sides. In some instances, the buyer assumes responsibility for offloading data from the old system into an acceptable format for loading. In cases where the data resides in a standard format such as COBOL data files, the vendor will assume offloading responsibility after learning how the old system operates. In most cases, the software vendor will insist on managing the actual data loading process to assure that the new software receives data that will function properly.
Data Quality Issues
A major issue in all conversions is data cleanliness or quality. Data may have errors or other anomalies even as it resides in the old system. Typical problems may include:
* Inconsistent data formats and special alphanumeric formats
* Acronyms due to limited field lengths
* Data in the wrong fields
* Improper or inconsistent date formats
* Inconsistent classification or naming conventions
* Commas or other punctuation (affects delimited ASCII data dumps)
* Outdated authority files, keywords, or other classification nomenclature
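Several of the anomalies listed above can be detected automatically before conversion. The sketch below checks a single record against three of them; the field names and validation rules are assumptions for illustration only:

```python
import re

# A minimal sketch of automated data quality checks.
def find_anomalies(record):
    problems = []
    # Improper or inconsistent date formats: expect YYYY-MM-DD here.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("date", "")):
        problems.append("bad date format")
    # Data in the wrong field: a bar code field should be digits only.
    if not record.get("barcode", "").isdigit():
        problems.append("non-numeric bar code")
    # Embedded commas break comma-delimited ASCII data dumps.
    if "," in record.get("title", ""):
        problems.append("comma in title")
    return problems

print(find_anomalies({"date": "06/30/98", "barcode": "12A4", "title": "Payroll, 1998"}))
# ['bad date format', 'non-numeric bar code', 'comma in title']
```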
Should quality issues be addressed by a data cleanup or "massaging" effort while the data is in the old familiar system, or should a final data cleanup occur after the data is loaded into the new system? Most vendors prefer that data cleanup issues be addressed before the data loads into their software or during the actual data conversion process. The advantages are that the new system will operate properly with quality data, and that data cleanup in the old system is not dependent on training in the new system. Users may already be familiar with ways to quickly find data errors and correct them in the old system.
However, if the old system has limited reporting and data visualization capabilities, it may be better to address data anomalies after loading the data into the new software. For example, older COBOL-based software systems offer few easy ways to sort data or query the system for specific data problems without actually programming new reports.
In contrast, once data is loaded into new database software, such as SQL Server, the SQL query language or modern report writers such as Seagate's Crystal Reports can be used to query data and locate anomalies for correction. In addition, searching, sorting, and changing data in SQL databases is much easier than performing these activities in COBOL flat files. Performing data cleanup activities in the new software also provides a good way to familiarize personnel with the new system.
The Final Test
The new software system data must be thoroughly tested and compared to the data that was evident in the old system. After the final data load into the new software, it is best to execute reports and queries on both systems to compare the records recalled and determine if any records are missing. Record counts are especially important - does the new system indicate that it has the same number of records as the old system? A variety of queries need to be run to determine beyond any doubt that no phantom records exist (i.e., records that are reported to have been seen at one time, but now seem to have disappeared).
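The count-and-compare verification described above can be sketched in a few lines, assuming both systems can be exported as sets of records keyed by a unique record number (the keys and titles below are invented examples):

```python
# Hypothetical exports from the old and new systems, keyed by record number.
old_records = {"R-001": "Payroll 1998", "R-002": "Invoices 1997", "R-003": "Contracts"}
new_records = {"R-001": "Payroll 1998", "R-002": "Invoices 1997"}

# Record counts: does the new system hold as many records as the old one?
print(len(old_records), len(new_records))  # 3 2

# Phantom records: present in the old system but missing after conversion.
missing = sorted(set(old_records) - set(new_records))
print(missing)  # ['R-003']
```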
Only a well-managed data conversion project and extensive data testing can prevent losing access to records that may be required for future business activities or to meet evidentiary requirements.
John T. Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. He can be contacted at firstname.lastname@example.org
Information Management Journal, October 1, 1999.