Active archiving.

High-transaction databases grow as fast as summer weeds in overdrive. Database administrators (DBAs) have made strides in capacity planning, but it's hard to constantly configure and reconfigure database servers to handle their data loads. Meanwhile, performance suffers as databases take longer and longer to load, unload, search, reorganize, index, and optimize. This has a nasty impact on all sorts of database metrics, including response times, access to needed information, and service level agreements.

The simple answer is to move inactive data into storage, thus freeing up server space and improving performance. But reality, as usual, is a lot messier. The main issues are:

* How do you define "inactive"? It's not as simple as a flat time period. For example, some data from a drug discovery trial may be three years old, but if the FDA wants it now, the pharmaceutical company had better produce it yesterday.

* Once you consult business rules and policies and identify inactive data, how do you move it without continuous manual intervention? It's not easy archiving data in the first place, especially from a relational database such as DB2, Oracle, Sybase, SQL Server, or Informix. These databases are usually spread across a myriad of tables sharing multiple relationships. Further, records may be in use at any given time, meaning that unless the archiving routine has some way to freeze the database without users noticing, it's possible to corrupt the data with no easy way to restore it.

* Where do you move the data? If a database must occasionally query inactive data, that data must remain available to it. That leaves out off-site tape storage. Near-line storage such as tape libraries and cheaper servers may help, but that raises the question: how does the database know where the inactive data resides, and how can it access it immediately when it's needed?

Active Archiving

How do you tier storage and organize databases without sacrificing response times? Enter a relatively new term: active archiving. The concept itself dates back 30 years or more to early mainframes, where DBAs identified less frequently used data and moved it to less expensive storage. The effort was always time-consuming, and there have been many initiatives to automate the process. One especially famous approach, Hierarchical Storage Management (HSM), rose and fell in the mid-'90s due to poor implementation. The underlying concept, though, is the ability to automatically move data between different levels of storage devices, depending on user-defined parameters.

Active archiving, a subset of HSM, uses business policies to automatically identify less frequently used data. It then distinguishes between truly historical data and data that must remain immediately available, and moves the latter onto economical networked servers, such as a node with an attached disk farm.

For example, Princeton Softech offers advanced database tools that automate the data distinction process according to defined business rules. Princeton Softech Professional Services consults with the client to identify policies according to business rules and critical processes. Once customized with the resulting business policies, the application identifies data that is presently inactive but must remain available, and moves it to another networked server. Princeton Softech refers to this data as "active reference data." When queries that may require the active reference data are sent to the production database, the database transparently queries this secondary location as well as the primary database files.

Jim Lee, Princeton Softech's VP of Product Marketing, said, "It's a very simple concept, and the payback is multifold. But the technology is very complex."
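To make the move-and-still-query idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not Princeton Softech's implementation. The table names, the view, the function name, and the business rule ("closed and not accessed since a cutoff date") are all assumptions invented for illustration, and the archive tables live in the same database file rather than on a separate networked server, purely to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema: a parent table, a child table, and archive copies of both.
SCHEMA = """
CREATE TABLE orders           (id INTEGER PRIMARY KEY, status TEXT, last_access TEXT);
CREATE TABLE order_items      (order_id INTEGER, sku TEXT);
CREATE TABLE orders_arch      (id INTEGER PRIMARY KEY, status TEXT, last_access TEXT);
CREATE TABLE order_items_arch (order_id INTEGER, sku TEXT);
-- The view lets one query see production and archived rows together.
CREATE VIEW all_orders AS
    SELECT *, 'production' AS tier FROM orders
    UNION ALL
    SELECT *, 'archive' AS tier FROM orders_arch;
"""

# Assumed business rule: an order is inactive once it is closed and has not
# been accessed since the cutoff date (ISO dates compare correctly as text).
INACTIVE_RULE = "status = 'closed' AND last_access < ?"


def archive_inactive(conn, cutoff):
    """Move inactive orders and their line items to the archive tables.

    Returns the number of orders moved.
    """
    ids = [row[0] for row in conn.execute(
        f"SELECT id FROM orders WHERE {INACTIVE_RULE}", (cutoff,))]
    # "with conn" wraps the writes in one transaction: if any statement
    # fails, everything rolls back and no order is left half-moved.
    with conn:
        for order_id in ids:
            conn.execute(
                "INSERT INTO order_items_arch "
                "SELECT * FROM order_items WHERE order_id = ?", (order_id,))
            conn.execute(
                "INSERT INTO orders_arch SELECT * FROM orders WHERE id = ?",
                (order_id,))
            conn.execute(
                "DELETE FROM order_items WHERE order_id = ?", (order_id,))
            conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
    return len(ids)
```

After conn.executescript(SCHEMA) and a call such as archive_inactive(conn, "2003-01-01"), a query against the all_orders view returns rows from both places. Copying child rows along with their parent inside a single transaction is the part that corresponds to the relational-integrity concern raised earlier in the article.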
Lee sees three primary benefits of active archiving: increased performance from smaller databases, reduced costs from less provisioning and manual intervention, and improved availability from faster backups and shortened maintenance time. Another important benefit is improved disaster recovery: If your data is 70% online and 30% reference data, you will save time by bringing up that 70% immediately. The 30%, which is active reference data, can come up later.

Sybase's Tom Traubitz, Senior Product Marketing Manager for Adaptive Server Enterprise (ASE), defined active archiving this way: "Let's take this to another level of abstraction: you are tiering performance of access, or directness of accessibility based on frequency of use. This idea of tiering storage has been around for a long time." What active archiving does offer is a level of innovation from computer scientists: automating the process of directing types of data into different holders by defining access patterns without human intervention. This work is based on patterns, not on predefined policies by user, application, or other parameters. "This is a fruitful area of research, and why it continues to advance," he said. Traubitz sees this as active archiving's greatest contribution: reducing labor costs while managing storage based on temperature (classifying data by user patterns along a spectrum of hot to cold values).
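As a rough illustration of pattern-based "temperature," the Python sketch below labels records hot or cold purely from an observed access log, with no predefined per-user or per-application policy. It is a toy under stated assumptions, not how ASE or any vendor product works: the function name, the hot_fraction parameter, and the simple two-way split are all invented for this example.

```python
from collections import Counter


def classify_temperature(access_log, all_keys, hot_fraction=0.2):
    """Label each key 'hot' or 'cold' from observed access counts.

    access_log   -- iterable of keys, one entry per observed access
    all_keys     -- every key in the data set, including never-accessed ones
    hot_fraction -- assumed share of keys allowed in the hot tier
    """
    counts = Counter(access_log)  # keys missing from the log count as 0
    # Most frequently accessed first; ties keep their original order.
    ranked = sorted(all_keys, key=lambda k: counts[k], reverse=True)
    n_hot = max(1, round(len(ranked) * hot_fraction)) if ranked else 0
    return {
        # A key that was never accessed is cold even if the hot tier has room.
        key: "hot" if rank < n_hot and counts[key] > 0 else "cold"
        for rank, key in enumerate(ranked)
    }
```

Because the labels depend only on the log passed in, re-running the function over a recent window of accesses lets a record drift from hot to cold as its usage falls off.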
Sybase's Sethu Meenakshisundaram, Director of Server Development for ASE, agreed that the technology is complex. He pointed out that the main challenge for sensing and placing data based on usage patterns "is what's categorized as hot data or as cold data. It might change quickly in high transaction environments. So much data needs some kind of human intervention."

IBM approaches active archiving as part of a larger chain of events based more on business views than on technical storage. IBM's Jeff Jones describes the approach at several levels:
Atomic level. At this lowest level, relational databases work with underlying storage software to keep the most active data easily accessible. Data management at this low level happens through hardware-based interventions such as memory pools and caching techniques. In DB2's case, the database application predicts the need for related data, then fetches and delivers the data to a fast access memory pool.

Micro level. This is the HSM concept, where applications such as Tivoli Storage Manager use predictive algorithms to define active and inactive data. But since active/inactive is not a binary idea (it is a spectrum), the application assigns the data to different types of hierarchical storage, often called online (fast disk), nearline (slower disk/fast tape), and offline (tape/optical). Jones remarked, "To the degree that you're good at predicting, you're good at HSM. The trick is, how good are you at predicting and managing the process of keeping the right data active?"

Compound level. This level uses middleware (Content Manager in IBM's case) to capture structured and unstructured data on any given subject and archive it together for ease of retrieval and management. Jones said that this level is "not so much about application speed, but better organization and better business intelligence, better collective understanding on the totality of electronic info."

Jones lists two primary challenges to successful active archiving implementations.

Educate the customer. When asked which data is most critical, customers usually respond, "All of it." As this is not true, and is unmanageable in any case, the vendor needs to lead and negotiate with the customer to identify critical levels and data types. IBM follows up with intelligent strategies via middleware for sending and predicting the data under its control by application usage. "You must have strategy up front, but you must also have the tools working for you as a detective, to discover the rows and tables that work for you most frequently."

Good design. This includes good database design, good application design, and a good strategic design for prioritizing from a business view. Done this way, all elements work well in tandem with the predictive capability of databases.

In the market, the business case for moving inactive data off production databases has existed for years. The new story is automation, which most vendors and analysts agree is a necessity given today's exploding data and static IT budgets.

www.princetonsoftech.com
www.sybase.com
www.ibm.com