Taming The Scalability Beast.


Database scalability is a many-headed hydra that's hard to define, let alone tame. This beast changes shape at a whim, with the definition changing according to a given user, vendor, or consultant. So to decide on a working definition for database scalability, we'll go with Richard Winter, president of Boston-based Winter Corp. He says scalability is "the ability to grow your system smoothly and economically as your requirements increase." Winter is not talking here about the simpler hardware aspect of scalable systems; he means scalable data size, speed, workload, and transaction cost, all elements that a company must measure and project at the beginning of the database planning process, not once its original database implementation is showing signs of bursting at the seams.

Why is scalability becoming so important, especially in the e-commerce world? Winter lists three primary factors for operational databases: huge numbers of concurrent users, the need for continuous availability, and extremely large stored-data volume.

* Huge numbers of concurrent users. With e-commerce handling larger and larger shares of retail products and services, it will soon require online, constantly updated databases that are being used directly by huge populations from 50 to 100 million and more. Many of these people won't need a PC, but will be accessing e-commerce sites from cheap and ubiquitous appliances. With transactions increasing as buyers pour onto the Web, e-commerce databases must be easily scalable, far more so than they truly are now.

* Continuous availability. Large numbers of concurrent users represent multiple time zones and mobile Internet access, which means that significant e-business operations must be continuously available. The challenge of this level of availability is scaling database size and transaction volume, which grow quickly along with user volume.

* Large stored-data volume. Businesses are increasingly storing clickstream data in massive data warehouses and mining it. Winter says, "Clickstreams for large user populations are the biggest things around. It doesn't take long to accumulate a terabyte of clickstream when you have a large-scale, actively used Web site."

Measuring For Scalability Success

So how do you measure and plan for scalability in a high-transaction environment? Winter suggests a Fantastic Four of data size, speed, workload, and transaction cost to define linear-scalable systems.

* Data size or "size up." It means that if your database size increases by a factor of x, given a constant hardware configuration, then your query response time will increase by no more than a factor of x. Example: If you increase your stored data volume from 100GB to 500GB and make no hardware changes, the system should increase its query and update transaction time from one second to no more than five seconds.

* Speed or "speed up." If you increase your hardware configuration's capacity by a factor of x, then your query response time will decrease by no less than a factor of x. Example: If you upgrade from one node to 64 and the node configuration is balanced for the task at hand, the system's transaction time should decrease from 64 seconds to no more than one.

* Workload or "scale up." If you increase the workload on your system by a factor of x, then you can maintain response time or throughput by increasing your capacity by a factor no greater than x. Example: If your transaction volume increases from 3.6 per hour to 3,600, you've got an increase by a factor of 1,000. The system should continue to deliver the same response time with a capacity increase of no more than 1,000 times.

* Transaction cost. There are two considerations with transaction cost in a scalable system. First, workload increases should not increase transaction cost. If it costs seven cents to process an order when you have one processor, it should still cost no more than seven cents to process an order with 1,000 processors. Capacity should not have to increase faster than demand.

Second, if data size increases by a factor of x, transaction cost should increase by no more than a factor of x. Example: If a query consumes 15 cents of system resources when it is run against a 100GB database, then it should consume no more than 75 cents of system resources on a 500GB database.
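The four criteria above reduce to simple ratio checks, and a short Python sketch may make them easier to apply to your own benchmark numbers. Everything here is illustrative: the Measurement record, its field names, and the function names are assumptions made for this example, not part of Winter's methodology or any vendor tool. Costs are kept in cents to match the examples in the text.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark observation (hypothetical record, names are illustrative)."""
    data_gb: float      # stored data volume
    nodes: int          # hardware capacity, counted in balanced nodes
    tx_per_hour: float  # workload
    response_s: float   # query or transaction response time, in seconds
    cost_cents: float   # resource cost per transaction, in cents


def size_up_ok(base: Measurement, grown: Measurement) -> bool:
    """Same hardware, x times the data: response time grows by at most x."""
    x = grown.data_gb / base.data_gb
    return grown.response_s <= base.response_s * x


def speed_up_ok(base: Measurement, bigger: Measurement) -> bool:
    """x times the hardware: response time shrinks by at least a factor of x."""
    x = bigger.nodes / base.nodes
    return bigger.response_s <= base.response_s / x


def scale_up_ok(base: Measurement, busier: Measurement) -> bool:
    """x times the workload: same response time with at most x times the capacity."""
    x = busier.tx_per_hour / base.tx_per_hour
    capacity_growth = busier.nodes / base.nodes
    return capacity_growth <= x and busier.response_s <= base.response_s


def cost_ok(base: Measurement, other: Measurement) -> bool:
    """Cost per transaction grows by no more than the data-size factor.

    With unchanged data size the factor is 1, so this also covers the rule
    that workload growth alone should not raise the cost per transaction.
    """
    x = other.data_gb / base.data_gb
    return other.cost_cents <= base.cost_cents * x


if __name__ == "__main__":
    # The 100GB -> 500GB example from the text: 1 s -> 5 s, 15 cents -> 75 cents.
    small = Measurement(data_gb=100, nodes=1, tx_per_hour=4, response_s=1.0, cost_cents=15)
    large = Measurement(data_gb=500, nodes=1, tx_per_hour=4, response_s=5.0, cost_cents=75)
    print("size up:", size_up_ok(small, large))
    print("transaction cost:", cost_ok(small, large))
```

The checks only compare two measurements at a time; in practice you would run them across a series of data sizes and configurations to see where the system stops scaling linearly.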

Solutions

There are, of course, software products that deal well with simpler types of scalability questions. For example, Viathan's soon-to-be-released middleware product Internet Database System is an XML-based platform for developing and running Internet applications. This program handles the non-relational database needs of web-scale applications: per-user data such as customer information, shopping carts, address books, and clickstream data. The product allows database administrators to add new servers without having to reprogram the existing database. (This is something its founders understand very well. They came from MSN, where the main part of their jobs as database administrators was reprogramming the database for ever-expanding hardware requirements. They got tired of getting calls to do this at 3 a.m., which is why they must be the first guys ever to found an Internet start-up in order to rest.)

This type of approach is very useful with straightforward data and database installations, but if you're an integrator or consultant to an enterprise or an Internet heavy hitter and they need a heavy-duty workgroup or enterprise database like Oracle8, Microsoft SQL Server, or IBM DB2, scalability issues mean intensive up-front planning and heavy up-front spending. This is particularly true in e-business, where customer data flows in at an alarming rate. That's great for business--until the database outgrows its own structure. Then, what do you do? Do you restructure, reprogram, rebuild, and come up? Not when reprogramming a database can take months. Start too late, you're history.

Common scalability challenges exist in particular database environments such as OLAP, where the system needs to handle large volumes of historical data effectively and provide real-time answers. The traditional database approach to large volumes of data is to use indexes. The database engine scans the indexes in high-I/O RAM and avoids touching the majority of the data, which is stored on low-I/O media. However, most data that an OLAP system needs to deal with cannot be indexed or cached without experiencing serious scalability problems. SeaTab Software's approach with PivotLink was to establish a finite number of unique values in the key elements, which curbed the growth of an OLAP system's data storage. This resulted in a more scalable solution, as much of the data ended up in shared memory for fast processing while maintaining a more linear degradation pattern.
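To see why a finite set of unique key values curbs storage growth, consider generic dictionary encoding, sketched below in Python. This is an assumption-laden illustration of the general idea only; the article does not describe how PivotLink is actually implemented, and the function names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code.

    Returns (codes, dictionary), where dictionary[code] is the original value.
    The dictionary grows with the number of distinct values, not the row count.
    """
    dictionary = []  # code -> distinct value, in first-seen order
    code_of = {}     # distinct value -> code
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return codes, dictionary


def dictionary_decode(codes, dictionary):
    """Recover the original values from their codes."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    regions = ["east", "west", "east", "east", "north", "west"]
    codes, dictionary = dictionary_encode(regions)
    print(codes)       # one small integer per row
    print(dictionary)  # one entry per distinct key value
```

When the key column has only a handful of distinct values, the dictionary stays small enough to hold in memory however many rows are loaded, and each additional row adds only one small integer.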

Other scalability challenges exist because of the nature of the project. An example is the Sloan Digital Sky Survey Archive Software, produced by the Johns Hopkins University. The project is to produce an ultra-high-bandwidth database server that allows astronomers to capture data from five different filters that span the spectrum from the ultraviolet to the near infrared, detecting over 200 million objects in this area. Other phenomena such as spectra and redshifts will be measured for the brightest one million galaxies. The challenge is to enable astronomers to perform this and other large-scale surveys on the "Digital Sky" using multiple terabyte-size databases interoperating seamlessly. Scalability represents the absolute need to balance the network speed, disk I/O, and CPU resources. (Johns Hopkins believes that "astronomers will have to be just as familiar with mining data as with observing on telescopes"--and e-commerce thinks it has problems!)

Planning For Scalability

According to developer OOP/L (Object Oriented Pty., Ltd.), the first step in planning for scalability is a flexible system architecture. A component of that architecture might be a mature middleware product such as Orbix, which allows for various client and server objects to be easily redistributed, allowing for easier future expansion such as adding, migrating or relocating servers.

Another important scalability issue with relational databases is being able to efficiently handle high-volume data access by using an object cache. Since objects in the cache are accessed at memory speed, this approach provides great flexibility in tuning the database to increase performance and throughput. Also consider how the system handles event logging, alarms and alarm filtering, network and server performance monitoring, user request monitoring, and high availability. These kinds of issues must be addressed early in the design stage. Parallelism is another consideration. It depends on the number of servers the database utilizes, including cluster configurations or MPP. The concept applies not only to query performance, but also to any database management or data loading operation, such as index creation.
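The object cache idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any product named in this article: the ObjectCache class, its load callback (standing in for a database fetch), and the least-recently-used eviction policy are all choices made for the example.

```python
from collections import OrderedDict


class ObjectCache:
    """A small in-memory cache placed in front of a slower data store."""

    def __init__(self, load, capacity=1024):
        self._load = load            # hypothetical callback: key -> object from the database
        self._capacity = capacity    # maximum number of cached objects
        self._items = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Served at memory speed; mark the object as recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cache miss: fetch from the slower store, then remember the result.
        self.misses += 1
        value = self._load(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used object
        return value


if __name__ == "__main__":
    def fetch_customer(customer_id):
        # Stand-in for a real database query.
        return {"id": customer_id}

    cache = ObjectCache(fetch_customer, capacity=2)
    cache.get(1)
    cache.get(1)
    print(cache.hits, cache.misses)
```

The hit and miss counters are there because cache size is a tuning knob: watching the ratio under a realistic workload shows whether extra memory is actually buying throughput.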

The new releases of Oracle8, DB2, and SQL Server all talk about how scalable they are and, to a certain extent, that's true. They're all selling in the e-commerce space where things are happening very quickly and incoming user data can quickly reach critical mass, but you can only apply scalability to your own installation by doing your upfront work.

Winter says, "The truth is that it's hard to make statements about this product's or that product's degree of scalability. If your company wants to implement a database with a scalability requirement that's near or beyond the frontier of prior experience, then you really can't be guided by any great extent by what anybody knows. You have to measure for yourself."

The database companies are, of course, extremely aware of the Internet market's fast-growing technological needs and expectations and of the tremendous challenges and opportunities this presents for highly scalable products. Of course, scalability is only a part of the database mix that includes integrated databases, middleware, and development tools. Aberdeen Group notes that database suppliers will need to add more multimedia and object support in the database engine, more push and ORB technology in the middleware, and more scalability support in the development toolset. As if that weren't enough, customers are also clamoring to incorporate oncoming technologies such as workflow or business-process support. It's a brave new world.

Editor's note: Winter Corporation's DATABASE SCALABILITY PROGRAM 2000 is an annual survey that identifies and honors the world's largest databases.
COPYRIGHT 2000 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Industry Trend or Event
Author:CHUDNOW, CHRISTINE
Publication:Computer Technology Review
Date:Aug 1, 2000
Words:1682