The simple answer is to move inactive data into storage, freeing up server space and improving response times. But reality, as usual, is a lot messier. The main issues are:
* How do you define "inactive"? It's not as simple as a flat time period. For example, some data from a drug discovery trial may be three years old, but if the FDA wants it now, the pharmaceutical company had better produce it yesterday.
* Once you have consulted business rules and policies and identified inactive data, how do you move it without continuous manual intervention? Archiving data is not easy in the first place, especially from a relational database such as DB2, Oracle, Sybase, SQL Server, or Informix. These databases usually spread data across myriad tables linked by multiple relationships. Further, records may be in use at any given time, so unless the archiving routine has some way to freeze the database without users noticing, it can corrupt the data with no easy way to restore it.
* Where do you move the data? If a database must occasionally query inactive data, that data must remain available to it, which rules out off-site tape storage. Near-line storage such as tape libraries and cheaper servers may help, but that raises the question: how does the database know where the inactive data resides, and how can it access that data immediately when it's needed?
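To make the first issue concrete, here is a minimal Python sketch of a rule-based classifier, assuming each record carries a last-access date and a hypothetical `regulatory_hold` flag. The thresholds (90 days, two years) and the three dispositions are illustrative assumptions, not any vendor's policy; the point is that a business rule can override a plain age test, as in the FDA example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Disposition(Enum):
    ACTIVE = "keep in production"
    ACTIVE_REFERENCE = "move off production, keep online"
    HISTORICAL = "eligible for offline storage"


@dataclass
class Record:
    record_id: str
    last_accessed: date
    # Hypothetical business-rule flag: data a regulator may demand on short notice.
    regulatory_hold: bool = False


def classify(rec: Record, today: date,
             active_days: int = 90, reference_days: int = 2 * 365) -> Disposition:
    """Apply an age test, then let business rules override it (illustrative thresholds)."""
    idle = today - rec.last_accessed
    if idle <= timedelta(days=active_days):
        return Disposition.ACTIVE
    # Old data under a hold must stay quickly retrievable, whatever its age.
    if rec.regulatory_hold or idle <= timedelta(days=reference_days):
        return Disposition.ACTIVE_REFERENCE
    return Disposition.HISTORICAL


today = date(2002, 2, 1)
trial = Record("trial-1999", date(1999, 1, 15), regulatory_hold=True)
print(classify(trial, today).name)  # ACTIVE_REFERENCE
```

Without the hold, the same three-year-old trial record would fall through to `HISTORICAL`; the flag is what keeps it out of off-site storage.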
How do you tier storage and organize databases without sacrificing response times? Enter a relatively new term for an old idea: active archiving. The concept itself dates back 30 years or more to early mainframes, where DBAs identified less frequently used data and moved it to less expensive storage. The effort was always time-consuming, and there have been many initiatives to automate it. One especially famous approach, Hierarchical Storage Management (HSM), rose and fell in the mid-1990s because of poor implementations. The concept behind HSM is the ability to move data automatically between different levels of storage devices according to user-defined parameters. Active archiving, a subset of HSM, uses business policies to identify less frequently used data automatically. It then distinguishes between truly historical data and data that must remain immediately available, and moves the latter onto economical networked servers, such as a node with an attached disk farm.
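HSM's "user-defined parameters" are often expressed as watermarks on the fast tier. The following is a hedged sketch of that common pattern, not a description of any particular product: when the online tier fills past a high watermark, the least recently used data sets migrate to a cheaper tier until usage drops to a low watermark. The `DataSet` class, the `migrate` function, and the 85%/70% defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float
    last_access_day: int  # lower number = accessed longer ago


def migrate(online: list[DataSet], capacity_gb: float,
            high_water: float = 0.85, low_water: float = 0.70) -> list[DataSet]:
    """Move least recently used data sets off the online tier.

    Migration starts only when usage exceeds the high watermark and stops as
    soon as usage is at or below the low watermark. `online` is modified in
    place; the migrated data sets are returned so a caller can copy them to
    the next tier.
    """
    used = sum(d.size_gb for d in online)
    if used <= high_water * capacity_gb:
        return []
    migrated = []
    # Oldest access first: the least recently used data leaves first.
    for d in sorted(online, key=lambda d: d.last_access_day):
        if used <= low_water * capacity_gb:
            break
        online.remove(d)
        migrated.append(d)
        used -= d.size_gb
    return migrated


tier = [DataSet("payroll", 40, 10), DataSet("claims_1998", 30, 1), DataSet("claims_2000", 20, 5)]
print([d.name for d in migrate(tier, capacity_gb=100)])  # ['claims_1998']
```

With 90 GB used on a 100 GB tier, only the oldest 30 GB data set moves, bringing usage to 60 GB. The gap between the two watermarks keeps the tier from migrating a sliver of data on every pass.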
For example, Princeton Softech offers database tools that automate this distinction according to defined business rules. Princeton Softech Professional Services consults with the client to define policies based on its business rules and critical processes. Once configured with those policies, the application identifies data that is presently inactive but must remain available, and moves it to another networked server. Princeton Softech refers to this data as "active reference data." When the production database receives queries that may require active reference data, it transparently queries this secondary location as well as the primary database files.
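The Princeton Softech mechanism itself isn't described in detail here, but the two ideas in the paragraph, moving related rows to a secondary store and querying both locations transparently, can be sketched with Python's built-in `sqlite3` module. In this illustration an attached in-memory database stands in for the networked archive server, the `orders`/`order_lines` tables are hypothetical, and a `UNION ALL` view plays the role of transparent access. Doing the copy and delete in one transaction addresses the corruption risk raised earlier: either every related row moves or none does.

```python
import sqlite3

# In-memory SQLite databases stand in for the production server ("main") and
# the secondary archive server ("archive"); the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")
for db in ("main", "archive"):
    conn.execute(f"CREATE TABLE {db}.orders (id INTEGER PRIMARY KEY, customer TEXT, year INTEGER)")
    conn.execute(f"CREATE TABLE {db}.order_lines (order_id INTEGER, item TEXT)")

conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)",
                 [(1, "acme", 1998), (2, "acme", 2001), (3, "globex", 2002)])
conn.executemany("INSERT INTO main.order_lines VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget"), (3, "sprocket")])
conn.commit()


def archive_orders_before(conn: sqlite3.Connection, cutoff_year: int) -> None:
    """Move old orders and their child rows to the archive as one transaction."""
    with conn:  # commits if the block succeeds, rolls back if anything raises
        conn.execute("""INSERT INTO archive.order_lines
                        SELECT l.* FROM main.order_lines AS l
                        JOIN main.orders AS o ON o.id = l.order_id
                        WHERE o.year < ?""", (cutoff_year,))
        conn.execute("INSERT INTO archive.orders SELECT * FROM main.orders WHERE year < ?",
                     (cutoff_year,))
        conn.execute("""DELETE FROM main.order_lines WHERE order_id IN
                        (SELECT id FROM main.orders WHERE year < ?)""", (cutoff_year,))
        conn.execute("DELETE FROM main.orders WHERE year < ?", (cutoff_year,))


archive_orders_before(conn, 2000)

# The view spans both databases, so callers query one name and never need to
# know which rows now live in the archive.
conn.execute("""CREATE TEMP VIEW all_orders AS
                SELECT * FROM main.orders UNION ALL SELECT * FROM archive.orders""")
print(conn.execute("SELECT id FROM all_orders WHERE customer = 'acme' ORDER BY id").fetchall())
# [(1,), (2,)]
```

SQLite only lets a `TEMP` view reference tables in more than one attached database, which is why the view is created as `TEMP`; a real deployment would rely on its own database's federation or linked-server facilities instead.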
Jim Lee, Princeton Softech's VP of Product Marketing, said, "It's a very simple concept, and the payback is multifold. But the technology is very complex." Lee sees three primary benefits of active archiving: better performance from smaller production databases, lower costs from reduced provisioning and manual intervention, and higher availability from faster backups and shorter maintenance windows. Another important benefit is improved disaster recovery. If 70% of your data is online production data and 30% is active reference data, you save time by bringing up the 70% immediately; the 30% of active reference data can come up later.
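Lee's disaster-recovery point is essentially a scheduling argument. A minimal sketch, assuming a single constant restore rate and hypothetical tablespace names and sizes: restore active data first, then report when production can resume versus when everything is back.

```python
def restore_plan(tablespaces: list[tuple[str, float, bool]],
                 gb_per_hour: float) -> tuple[float, float]:
    """Restore active tablespaces before reference ones at a constant rate.

    `tablespaces` holds (name, size_gb, is_active) tuples. Returns
    (hours until all active data is back, hours until everything is back).
    """
    # False sorts before True, so keying on "not is_active" puts active data first.
    ordered = sorted(tablespaces, key=lambda t: not t[2])
    elapsed = 0.0
    active_ready = 0.0
    for _name, size_gb, is_active in ordered:
        elapsed += size_gb / gb_per_hour
        if is_active:
            active_ready = elapsed
    return active_ready, elapsed


plan = [("ref_history", 300, False), ("orders", 400, True), ("customers", 300, True)]
print(restore_plan(plan, gb_per_hour=100))  # (7.0, 10.0)
```

With 700 GB of active data, 300 GB of reference data, and an assumed 100 GB per hour, production is usable after 7 hours instead of 10.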
Sybase's Tom Traubitz, Senior Product Marketing Manager for Adaptive Server Enterprise (ASE), defined active archiving this way: "Let's take this to another level of abstraction: you are tiering performance of access, or directness of accessibility, based on frequency of use. This idea of tiering storage has been around for a long time." What active archiving does offer is innovation from computer scientists: automating the placement of different types of data into different storage holders by identifying access patterns, without human intervention. This work relies on observed patterns rather than on policies predefined by user, application, or other parameters. "This is a fruitful area of research, and why it continues to advance." Traubitz sees this as active archiving's greatest contribution: reducing labor costs while managing storage based on temperature, that is, classifying data along a spectrum from hot to cold according to usage patterns.
Sybase's Sethu Meenakshisundaram, Director of Server Development for ASE, agreed that the technology is complex. He pointed out that the main challenge in sensing and placing data based on usage patterns "is what's categorized as hot data or as cold data. It might change quickly in high transaction environments. So much data needs some kind of human intervention."
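One common way to turn access patterns into a hot-to-cold temperature is an exponentially decayed access count. The sketch below assumes that technique and is not a description of Sybase's work; the `TemperatureTracker` class, the half-life, and the `hot`/`warm` thresholds are hypothetical tuning knobs. A short half-life lets temperature react quickly, which speaks to Meenakshisundaram's concern that hot and cold can shift fast in high-transaction environments, at the cost of more churn between tiers.

```python
import math


class TemperatureTracker:
    """Scores each object by an exponentially decayed access count.

    Each access adds 1.0; older accesses fade with the chosen half-life, so an
    object that nobody touches cools down over time.
    """

    def __init__(self, half_life: float):
        self.decay = math.log(2) / half_life
        self.score: dict[str, float] = {}
        self.last_update: dict[str, float] = {}

    def _decayed(self, key: str, now: float) -> float:
        if key not in self.last_update:
            return 0.0
        elapsed = now - self.last_update[key]
        return self.score[key] * math.exp(-self.decay * elapsed)

    def record_access(self, key: str, now: float) -> None:
        self.score[key] = self._decayed(key, now) + 1.0
        self.last_update[key] = now

    def temperature(self, key: str, now: float, hot: float = 5.0, warm: float = 1.0) -> str:
        s = self._decayed(key, now)
        if s >= hot:
            return "hot"
        if s >= warm:
            return "warm"
        return "cold"


tracker = TemperatureTracker(half_life=10.0)  # time units are arbitrary, e.g. hours
for _ in range(6):
    tracker.record_access("orders_2002", now=100.0)
print(tracker.temperature("orders_2002", now=100.0))  # hot
print(tracker.temperature("orders_2002", now=110.0))  # warm
print(tracker.temperature("orders_2002", now=140.0))  # cold
```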
IBM approaches active archiving as one link in a larger chain of events, driven more by business views than by technical storage concerns. Jeff Jones, IBM's Director of Strategy for Data Management Solutions, describes archiving as a three-tiered approach with what he calls atomic, micro, and compound levels. Together, the levels form a high-level business approach to archiving: gathering archived data for business purposes, then organizing it conceptually for historical purposes. Active archiving occurs at the micro level.
Atomic level. At this lowest level, relational databases work with underlying storage software to keep the most active data easily accessible. Data management here happens close to the hardware, through mechanisms such as memory pools and caching. In DB2's case, the database anticipates the need for related data, then fetches it ahead of time into a fast-access memory pool.
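DB2's prefetching is far more sophisticated, but the basic idea of the atomic level can be illustrated with a least-recently-used page cache that reads ahead on a miss. This is a hedged sketch: the `BufferPool` class, the `read_page` callback, and the fixed read-ahead depth are assumptions made for illustration.

```python
from collections import OrderedDict


class BufferPool:
    """LRU page cache that reads ahead on a miss (simple sequential prefetch)."""

    def __init__(self, capacity: int, prefetch: int, read_page):
        if capacity <= prefetch:
            raise ValueError("capacity must exceed the prefetch depth")
        self.capacity = capacity
        self.prefetch = prefetch
        self.read_page = read_page  # callable standing in for a physical disk read
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _load(self, page_no: int) -> None:
        self.pages[page_no] = self.read_page(page_no)
        self.pages.move_to_end(page_no)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

    def get(self, page_no: int):
        if page_no in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        self.misses += 1
        self._load(page_no)
        # Anticipate a sequential scan: fetch the following pages now.
        for nxt in range(page_no + 1, page_no + 1 + self.prefetch):
            if nxt not in self.pages:
                self._load(nxt)
        self.pages.move_to_end(page_no)  # the requested page is the most recently used
        return self.pages[page_no]


disk_reads = []


def read_page(page_no: int) -> str:
    disk_reads.append(page_no)
    return f"page-{page_no}"


pool = BufferPool(capacity=8, prefetch=3, read_page=read_page)
for p in range(8):  # a sequential scan
    pool.get(p)
print(pool.hits, pool.misses)  # 6 2
```

Because `capacity` must exceed the prefetch depth, the requested page can never be evicted by its own read-ahead.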
Micro level. This is the HSM concept, where applications such as Tivoli Storage Manager use predictive algorithms to define active and inactive data. Because active versus inactive is not a binary distinction but a spectrum, the application assigns data to different types of hierarchical storage, often called online (fast disk), nearline (slower disk or fast tape), and offline (tape or optical media). Jones remarked, "To the degree that you're good at predicting, you're good at HSM. The trick is, how good are you at predicting and managing the process of keeping the right data active?"
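Jones's point that HSM is only as good as its predictions can be made measurable. A minimal sketch, assuming some predictor has already produced an access-likelihood score per object (the scores, slot counts, and function names are hypothetical): fill the online, nearline, and offline tiers in score order, then check what fraction of real accesses each tier actually served.

```python
from collections import Counter


def place_by_prediction(predicted: dict[str, float], online_slots: int,
                        nearline_slots: int) -> dict[str, str]:
    """Rank objects by predicted access likelihood and fill tiers fastest-first."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    tiers = {}
    for rank, obj in enumerate(ranked):
        if rank < online_slots:
            tiers[obj] = "online"
        elif rank < online_slots + nearline_slots:
            tiers[obj] = "nearline"
        else:
            tiers[obj] = "offline"
    return tiers


def tier_hit_rates(tiers: dict[str, str], actual_accesses: list[str]) -> dict[str, float]:
    """Share of real accesses served by each tier; good prediction keeps 'online' high."""
    if not actual_accesses:
        return {"online": 0.0, "nearline": 0.0, "offline": 0.0}
    # Objects the predictor never scored are treated as sitting offline.
    counts = Counter(tiers.get(obj, "offline") for obj in actual_accesses)
    total = len(actual_accesses)
    return {t: counts[t] / total for t in ("online", "nearline", "offline")}


scores = {"orders_2002": 0.9, "orders_2001": 0.5, "orders_1999": 0.2, "orders_1995": 0.1}
tiers = place_by_prediction(scores, online_slots=1, nearline_slots=2)
print(tier_hit_rates(tiers, ["orders_2002", "orders_2002", "orders_2001", "orders_1995"]))
# {'online': 0.5, 'nearline': 0.25, 'offline': 0.25}
```

An `offline` share that stays high is the signal that the predictor, not the storage hardware, is the weak link.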
Compound level. At this level, middleware (Content Manager in IBM's case) captures structured and unstructured data on any given subject and archives it together for ease of retrieval and management. Jones said that this level is "not so much about application speed, but better organization and better business intelligence, better collective understanding on the totality of electronic info."
Jones lists two primary challenges to successful active archiving implementations.
Educate the customer. When asked which data is most critical, customers usually respond, "All of it." Because that is not true, and would be unmanageable in any case, the vendor needs to lead and negotiate with the customer to identify critical levels and data types. IBM follows up with middleware that predicts how applications use the data under its control and places that data accordingly. "You must have strategy up front, but you must also have the tools working for you as a detective, to discover the rows and tables that work for you most frequently."
Good design. This includes good database design, good application design, and sound strategic design for setting priorities from a business point of view. When all three are in place, they work in tandem with the database's predictive capabilities.
The business case for moving inactive data off production databases has existed for years. The new story is automation, which most vendors and analysts agree is a necessity given today's exploding data volumes and static IT budgets.
Publication: Computer Technology Review
Date: Feb 1, 2002