Printer Friendly
The Free Library
6,672,335 articles and books
Member login
User name  
Password 
 
Join us Forgot password?

Database Overload Overwhelms Database Administrators.


The business intelligence/data warehousing market is expected to grow at an annual aggregate growth rate of 43% through 2003 to reach more than $143 billion. The reasons for this astounding a·stound  
tr.v. a·stound·ed, a·stound·ing, a·stounds
To astonish and bewilder. See Synonyms at surprise.



[From Middle English astoned, past participle of astonen,
 growth are not hard to see. The value of the data contained in corporate databases is enormous and business intelligence applications and data warehouses are the tools that managers and knowledge workers use to turn this data into information that can be acted upon to gain or maintain a competitive advantage.

However, as databases, data marts A subset of a data warehouse for a single department or function. A data mart may have tens of gigabytes of data rather than hundreds of gigabytes for the entire enterprise. See data warehouse. , and data warehouses grow in size and complexity, it can take longer and longer to get information out of them. The problem arises from a fundamental aspect of relational database relational database

Database in which all data are represented in tabular form. The description of a particular entity is provided by the set of its attribute values, stored as one row or record of the table, called a tuple.
 technology: their schema of rows and columns in tables is designed for efficient storage of data, not necessarily efficient access. While it is possible to extract data in almost any conceivable combination, given a properly designed database, some queries take only seconds, while others can take hours or even days, and potentially bring the system to its knees. Database Administrators (DBAs) have devised many tactics to overcome the limitations of databases, but they can't cover all the bases.

Tables And Columns And Rows, Oh My!

Consider a typical sales database. It will consist of a number of tables. For instance, the customer table will consist of many rows of data, where each row is a unique customer. The first field in each row might be the customer ID number, followed by fields identifying first name, last name, address (that's actually several fields), phone number, and so forth. Another table might identify products with a unique product ID and fields identifying price, brand, inventory level, etc.

Now, imagine a simple query: how many customers are there in California? The simplest way to do this is called a table scan: merely read every row sequentially and count the number of times you find a STATE field containing CA. In a large database, of course, this will take a very long time. In this case, a simple index will greatly speed data access: in this case, indexing the customer table by the state field. Of course, this means that a Database Administrator (DBA) has to create the index and keep it up to date.

Now, consider a more complex query: how many customers in California bought something in December? Now, the database must perform what is called a table join: it must look at both the Orders table, qualifying the ORDER DATE field in terms of a date range, and the Customer table, looking at the STATE field. Table joins are also very slow. In this case, a DBA might tune the database by creating a summary table containing customer orders by month and, again, the DBA would have to keep it up to date.

An even more complex query would be to ask for the records of customers in California that bought a certain product in December. This is a multidimensional query Asking for a multidimensional view of data. See multidimensional views and MDX.  that requires building a multi-part key multi-part key - compound key  for another index or one might call for an aggregated result, for instance: what was the average sale per customer in California in December, which requires all of the above operations plus the mathematical averaging of the $AMOUNT fields from all the qualifying Orders. Again, a DBA can create and update a summary table to handle this kind of query.

Data marts and data warehouses face similar problems, made even more urgent by the wider range of queries users are likely to want to make against them. One approach is to create multiple, specialized "single-topic" data marts out of the main data warehouse; but this merely multiplies the problem. Multidimensional databases See OLAP.  optimized for complex queries are another approach, but they are limited in the size of the data sets with which they can deal.

Ironically, the booming market in OLAP (OnLine Analytical Processing) Decision support software that allows the user to quickly analyze information that has been summarized into multidimensional views and hierarchies. OLAP tools are used to perform trend analysis on sales and financial information.  and other analytical tools, which are deceptively de·cep·tive·ly  
adv.
In a deceptive or deceiving manner; so as to deceive.

Usage Note: When deceptively is used to modify an adjective, the meaning is often unclear.
 easy to use, has ended up increasing the magnitude of this database overload problem, for they make it easy for a user to submit a query that ends up selecting most of an entire table or calls for joining two immense tables. Imagine the network and processing impact of trying to deliver ten million rows of data to a desktop PC or the impact on database performance for all other users when it tries to join two multi-million row tables. For this reason, many DBAs try to limit the queries users are allowed to submit, but this is likely to eliminate many unforeseen queries that could actually deliver useful business information.

Victor Hugo Was Right

So the problem is one of knowledge and time. Although DBAs can overcome any given set of query problems by tuning, creating indexes, summary tables, and multiple data marts, or forbidding certain kinds of queries, they must know in advance what queries users want to make and would be useful, which requires domain-specific knowledge they often don't have. While 80% of database queries are repetitive and can be optimized in the ways discussed above, 80% of the ROI (Return On Investment) The monetary benefits derived from having spent money on developing or revising a system. In the IT world, there are more ways to compute ROI than Carter has liver pills (and for those of you who never heard of that expression, it means a lot).  from database information comes from the 20% of queries that are not repetitive. The result is a loss of business or competitive advantage because of an inability to access the data in corporate databases in a timely fashion.

Throwing more hardware at the problem doesn't really help, either. Big iron and multi-processing servers can certainly accelerate the CPU-intensive parts of the process, but the bottom line of database access is disk access, so the process is I/O bound Refers to an excessive amount of time getting data in and out of the computer in relation to the time it takes for processing it. Faster channels and disk drives improve the performance of I/O bound computers. See I/O intensive.  and I/O (Input/Output) The transfer of data between the CPU and a peripheral device. Every transfer is an output from one device and an input to another. See PC input/output.

I/O - Input/Output
 doesn't scale as fast as CPU CPU
 in full central processing unit

Principal component of a digital computer, composed of a control unit, an instruction-decoding unit, and an arithmetic-logic unit.
 power. You can get around this by putting the entire database in main memory, but the cost of RAM for a multigigabyte database is likely to be a showstopper showstopper - A hardware or (especially) software bug that makes an implementation effectively unusable; one that absolutely has to be fixed before development can go on. Opposite in connotation from its original theatrical use, which refers to something stunningly *good*. .

Victor Hugo said that there is nothing so powerful as an idea whose time has come. I was reminded of that saying during a recent briefing by Dynamic Information Systems Corporation (DISC), whose 15-year old Omnidex search engine technology overcomes many of the limitations that plague large databases. (Actually, although the basic idea behind Omnidex is that old, the product has been continually improved and extended as database technology evolved.)

Omnidex is basically a middleware Software that functions as a conversion or translation layer. It is also a consolidator and integrator. Custom-programmed middleware solutions have been developed for decades to enable one application to communicate with another that either runs on a different platform or comes from a  search engine that sits between desktop clients (such as OLAP and other analytical applications) and a back-end database A back-end database is a database that is accessed by users indirectly through an external application rather than by application programming stored within the database itself or by low level manipulation of the data (e.g. through SQL commands).  or data mart/data warehouse. It indexes the database using a combination of bit map and inverse list indexes and creates both multidimensional mul·ti·di·men·sion·al  
adj.
Of, relating to, or having several dimensions.



multi·di·men
 and aggregate indexes. Then, when the user submits a SQL SQL
 in full Structured Query Language.

Computer programming language used for retrieving records or parts of records in databases and performing various calculations before displaying the results.
 query from the client, Omnidex intercepts it and searches against its indexes, which are far smaller than the main database and are organized in a far more efficient fashion than is possible using ordinary relational databases or data warehouses. The result is, the company claims, about a 50-times speed-up in query processing.

The inverse list indexes used by Omnidex have many advantages that obviate ob·vi·ate  
tr.v. ob·vi·at·ed, ob·vi·at·ing, ob·vi·ates
To anticipate and dispose of effectively; render unnecessary. See Synonyms at prevent.
 the problems discussed above. Inverse lists reverse the usual structure of data by storing the data from the database as keys to permit rapid searching of the data content with pointers back to the database as data in the index for quick retrieval of the data records. For instance, one inverted list See inverted file.  index generated from a customer table might include all the states (AK, AL, AR, etc.) as keys (columns), with pointers back to the customer records containing each state as a field value.

Inverse lists can do many things that the B-tree indexes commonly used by a relational database cannot. They permit keyword searches that are case and position insensitive (an RDBMS (Relational DataBase Management System) See relational database and DBMS.

RDBMS - relational database
 can only do this via a table scan; its indexes cannot). Inverse list indexes can also return an up-front count of records that qualify for the current search query, permitting the user to back out of a query that would return ten million rows of data. (Even though Omnidex is only searching an index and the result of this search would not be catastrophic, it obviously wouldn't be very useful, either.) Inverse lists also support "anything by anything" queries against multiple columns (fields) without limitation as to size and, unlike database summary tables and aggregates, since these inverse list indexes store pointers to the original data, you can drill down from the query results to the original data to find out what's behind the results of your query.

For data warehousing See data warehouse.

data warehousing - data warehouse
 applications, the inverse list possesses similar advantages. Unlimited multidimensional analysis In statistics, econometrics, and related fields, multidimensional analysis is a data analysis process that groups data into two basic categories: data dimensions and measurements.  is made possible by the ability to index any number of columns from different tables and combine them in the selection criteria. This greatly enhances the usefulness of OLAP tools and other analytical applications and reduces the need to build multiple data marts. In addition, inverse list indexes can be easily updated, allowing their use with real time data.

Omnidex also uses bit-mapped indexes for very high-speed instant counts, keyword searches and similar types of queries. A bit-mapped index looks like a spreadsheet with the possible values as column headings and record numbers as row headings. If a possible value exists within a given record, the intersection cell contains a 1; otherwise, it contains a 0. Such a bitmap can be searched with blinding speed, as long as it represents "low cardinality A quantity relationship between elements. For example, one-to-one, one-to-many and many-to-one express cardinality. See cardinal number.

(mathematics) cardinality - The number of elements in a set. If two sets have the same number of elements (i.e.
" data; that is, data with only a few possible values. This kind of index wouldn't work very well for the state example above: a ten million record customer database would result in a 500MB bitmap! In addition, bit-mapped indexes are not easily updated.

Omnidex is not an inexpensive solution--it starts at about $70,000--so it only makes sense for very large databases. However, with the growth of e-commerce, especially for business-to-business transactions, the number of large databases out there is likely to grow very rapidly and there appears to be few solutions to match it. So, while Omnidex is not likely to appeal to any but a fraction of CTR's readers, for those who need it, it is very much an idea whose time has come.
COPYRIGHT 2000 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

 Reader Opinion

Title:

Comment:



 

Article Details
Printer friendly Cite/link Email Feedback
Author:Trowbridge, Dave
Publication:Computer Technology Review
Date:Jun 1, 2000
Words:1662
Previous Article:First Data And Entrust Ride The Next Wave In B2B E-commerce.
Next Article:Easier, Cheaper DSL: Replacing Central Office Hardware With Access Concentrator Technology.
Topics:



Related Articles
EXTENDED SYSTEMS INTRODUCES ADVANTAGE DATABASE SERVER 5.5.
VizServer navigate and understand data.(Inxight VizServer introduced by Inxight Software)(Brief Article)(Product Announcement)
NETWORK APPLIANCE SHIPS FIRST DAFS-ENABLED STORAGE SOLUTIONS.(DAFS Database Accelerator)(Product Announcement)
Performance Centre 1.7. (Database News).(Product Announcement)
Active archiving and HSM: best-practices. (Storage Networking).
COMPANIES REDUCE DATA MANAGEMENT COSTS.
Extending SRM with active archiving to manage the data life cycle. (Managing Data).
IANYWHERE DELIVERS STORAGE AS A UTILITY TO CUSTOMERS.
MySQL Administrator for Linux 1.0.12.(IT News)
ARIEL SOFTWARE INTRODUCES CAMPAIGN ENTERPRISE 9.

Terms of use | Copyright © 2009 Farlex, Inc. | Feedback | For webmasters | Submit articles