Printer Friendly
The Free Library
14,792,844 articles and books
Member login
User name  
Password 
 
Join us Forgot password?

A profile of adaptive addressing: a CopperEye technical white paper.


Introduction

CopperEye's Adaptive Addressing is a new general-purpose data indexing technology that out-performs conventional indexing techniques. CopperEye positions this technology as "quantum leap quantum leap
n.
An abrupt change or step, especially in method, information, or knowledge: "War was going to take a quantum leap; it would never be the same" Garry Wills.
" in the data indexing marketplace. This paper describes the new indexing technology and how it can be exploited. Adaptive Addressing has many aspects that appeal to application developers and database administrators alike; this new technology overcomes many of the drawbacks experienced with incumbent index technology.

This document discusses the general characteristics of indexes and how Adaptive Addressing compares with the current dominant index structures.

Background

Indexes are used in database and data storage/retrieval applications to enable fast access to stored data. Without indexes, applications are forced to scan entire data sets to find the required data. While indexes speed up data retrieval, they incur To become subject to and liable for; to have liabilities imposed by act or operation of law.

Expenses are incurred, for example, when the legal obligation to pay them arises. An individual incurs a liability when a money judgment is rendered against him or her by a court.
 a significant overhead during data set updates and a compromise must often be struck between data retrieval performance and data loading Copying data from one electronic file or database into another. Data loading implies converting from one format into another; for example, from one type of production database into a decision support database from a different vendor. See data entry.  performance.

Indexes can be categorised Adj. 1. categorised - arranged into categories
categorized

classified - arranged into classes
 according to according to
prep.
1. As stated or indicated by; on the authority of: according to historians.

2. In keeping with: according to instructions.

3.
 the data types they handle. For example, an index may be used for accessing general-purpose scalar scalar, quantity or number possessing only sign and magnitude, e.g., the real numbers (see number), in contrast to vectors and tensors; scalars obey the rules of elementary algebra. Many physical quantities have scalar values, e.g.  data, such as text, number and date information. Alternatively, an index may be designed for handling structured data formats such as spatial or media data. However, the vast majority of data held in commercial database systems is of a scalar type. Scalar indexes are predominant pre·dom·i·nant  
adj.
1. Having greatest ascendancy, importance, influence, authority, or force. See Synonyms at dominant.

2.
 across all industries.

Indexes can also be categorised according to the resources they are optimised for - memory or disk. For example, an index may reside entirely in memory and/or may access entirely memory resident data. Such an index aims to minimise its memory footprint Memory footprint refers to the amount of main memory that a program uses or references while running. This includes all sorts of active memory regions like code, static data sections (both initialized and uninitialized), heap, as well as all the stacks, plus memory required to hold  and CPU CPU
 in full central processing unit

Principal component of a digital computer, composed of a control unit, an instruction-decoding unit, and an arithmetic-logic unit.
 usage. Whereas a disk-resident index focuses on reducing its disk access, since disks offer a slow medium for data access. Memory based Programs that hold all data in memory for processing. Almost all spreadsheets are memory based so that a change in data at one end of the spreadsheet can be instantly reflected at the other end.  indexes restrict both database growth and tolerance of system failure. Therefore, most commercial databases use disk resident indexes to provide sufficient database size and security.

Adaptive Addressing is a general-purpose scalar disk-optimised index. That is, Adaptive Addressing is suited to the majority of data stored in commercial systems.

For both brevity Brevity
Adonis’ garden

of short life. [Br. Lit.: I Henry IV]

bubbles

symbolic of transitoriness of life. [Art: Hall, 54]

cherry fair

cherry orchards where fruit was briefly sold; symbolic of transience.
 and clarity, specialist multi-dimensional data indexes and memory- optimised indexes will not be considered any further in this paper.

Throughout this paper, reference is often made to relational databases relational database

Database in which all data are represented in tabular form. The description of a particular entity is provided by the set of its attribute values, stored as one row or record of the table, called a tuple.
 as it is assumed the reader is likely to be familiar with them. However, Adaptive Addressing technology is not specific to relational databases and can apply to any data storage paradigm, such as object database, XML database A database that stores XML documents. There are two types. The first is the "XML-enabled database," which is a relational or object-oriented database that has been extended to hold XML data.  or flat file systems.

Industry Trends

Few radical innovations in disk-optimised scalar indexing have occurred over recent decades. Commercial database vendors have concentrated on improving performance through hardware scalability and operational parallelism An overlapping of processing, input/output (I/O) or both.

1. parallelism - parallel processing.
2. (parallel) parallelism - The maximum number of independent subtasks in a given task at a given point in its execution. E.g.
, which provide performance at a hardware cost; but even then, not all database applications are open to divide-and-conquer approaches.

The technology innovations that do address scalar data tend to optimise optimise - To perform optimisation.  query performance at great expense to loading performance or concurrency Operations that are performed simultaneously within the computer. For example, dual-core CPUs provide complete overlapping of two independent processes. See dual core, hyperthreading, multiprocessing, multitasking, multithreading, SMP and MPP.

concurrency - multitasking
.

Meanwhile, commercial database sizes are burgeoning and enterprises expect to exploit their databases more effectively. This exerts pressure on database performance and particularly the index structures used to navigate (1) "Surfing the Web." To move from page to page on the Web.

(2) To move through the menu structure in a software application.
 through the sea of data. Adaptive Addressing targets exactly these performance issues.

Dominant Indexes

The current dominant general-purpose scalar indexes used in commercial disk based (1) Refers to devices that use magnetic hard disks for storage. It often refers to portable devices such as digital music players that have hard disks rather than flash memory.

All desktop and laptop computers are presumed to have hard disks, and most servers have hard disks.
 database systems today are:

* B-Tree

* Hashed indexes

* Bit map indexes

The characteristics of these index structures are compared against Adaptive Addressing in this paper to position Adaptive Addressing against the incumbent technology. It is assumed throughout this document, that the reader is reasonably familiar with the structure of the above index types.

Each of the standard index structures has a number of variants that are used throughout industry For example, the B-Tree often surfaces as a B' Tree or B* Tree variant variant /var·i·ant/ (var´e-ant)
1. something that differs in some characteristic from the class to which it belongs.

2. exhibiting such variation.


var·i·ant
adj.
 However, from a general functionality and overall performance standpoint The Standpoint is a newspaper published in the British Virgin Islands. It was originally published under the name Pennysaver, largely as a shopping-coupon promotional newspaper, but since emerged as one of the most influential sources of journalism in the , there is little to distinguish the variants and this paper will focus its attention on the fundamental structure of each index type.

Each of the dominant index structures provides varying levels of functionality and performance. A database vendor will typically provide one or more of these as standard index types, which can be applied to data stores within the database. An application developer must choose a suitable index according to an application's functional, performance and operational requirements (programming) operational requirements - Qualitative and quantitative parameters that specify the desired capabilities of a system and serve as a basis for determining the operational effectiveness and suitability of a system prior to deployment. . Often, a developer is forced into a compromise decision because of conflicts between the application requirements and index performance constraints CONSTRAINTS - A language for solving constraints using value inference.

["CONSTRAINTS: A Language for Expressing Almost-Hierarchical Descriptions", G.J. Sussman et al, Artif Intell 14(1):1-39 (Aug 1980)].
.

Adaptive Addressing is a new alternative index structure that out-performs the current dominant structures and provides a powerful addition to the armoury of indexing techniques available to developers and database administrators.

Index Framework

An index provides a mapping from data keys to data addresses. The key labels the data and the address identifies its location. For example, a key may a customer name and the address is the location of the customer row within a customer table. The index maps from one or more keys to zero, one or more addresses. An index must support the following minimal interface in order to provide a useful mechanism for indexing scalar data:

* Insert a key and address pair into the index mapping.

* Delete To remove an item of data from a file or to remove a file from the disk. See file wipe, trash and undelete.

1. (operating system) delete - (Or "erase") To make a file inaccessible.
 a key and address pair from the index mapping.

* Retrieve a set of addresses from the index mapping that are associated with a key specification.

The insert or deletion deletion /de·le·tion/ (de-le´shun) in genetics, loss of genetic material from a chromosome.

de·le·tion
n.
Loss, as from mutation, of one or more nucleotides from a chromosome.
 of a key-address pair in an index structure is known as an index maintenance operation. Scanning or navigating (networking, hypertext) navigating - Finding your way around. Often used of the Internet, particularly the World-Wide Web.

A browser is a tool for navigating hypertext documents.
 an index structure to find key- address pairs is known as an index retrieval operation.

Within this common framework, indexes can vary enormously in their structure and access algorithms The following is a list of the algorithms described in Wikipedia. See also the list of data structures, list of algorithm general topics and list of terms relating to algorithms and data structures. . These variations lead to differences in the efficiency of index maintenance and retrieval operations; and differences in how keys are specified for retrieval operations.

Index Characteristics

When comparing index structures, it is useful to consider the following characteristics

* Key Behaviour. How successfully the index handles a diversity of key data.

* Retrieval Flexibility. What retrieval restrictions are enforced by the index

* Performance Scalability and Concurrency. How the index performs as data volumes and usage grows.

* Disk Binding. How the index mitigates disk performance.

* Resource Footprint The amount of geographic space covered by an object. A computer footprint is the desk or floor surface it occupies. A satellite's footprint is the earth area covered by its downlink. See form factor.

1.
. How much memory and disk space is consumed con·sume  
v. con·sumed, con·sum·ing, con·sumes

v.tr.
1. To take in as food; eat or drink up. See Synonyms at eat.

2.
a.
 by the index.

* Workload The term workload can refer to a number of different yet related entities. An amount of labor
While a precise definition of a workload is elusive, a commonly accepted definition is the hypothetical relationship between a group or individual human operator and task demands.
 Adaptability a·dapt·a·ble  
adj.
Capable of adapting or of being adapted.



a·dapta·bil
. How the index can adapt to application requirements.

Each of the above is considered in detail below for each index structure in the following sections.

Key Behaviour

Keys vary, but not all index structures are suitable for all key types and distributions. For example, Bit Map indexes do not handle high cardinality A quantity relationship between elements. For example, one-to-one, one-to-many and many-to-one express cardinality. See cardinal number.

(mathematics) cardinality - The number of elements in a set. If two sets have the same number of elements (i.e.
 keys well.

The aspects of key behaviour that influence index suitability include

* Type. The data type and how it is represented.

* Size. The maximum key size applicable.

* Cardinality. The proportion of distinct key values across the key population.

* Distribution. The clustering of key values across the key domain.

The aptitude of an index for certain key characteristics is fundamentally determined by its structure. For example, indexes that store the native key value internally are likely to suffer performance degradation DEGRADATION, punishment, ecclesiastical law. A censure by which a clergy man is deprived of his holy orders, which he had as a priest or deacon.  with large key sizes, because of the direct impact on index size.

Table 1 below presents a chart of index suitability against key behaviour.

The table illustrates that B-Trees do not tolerate tol·er·ate
v.
1. To allow without prohibiting or opposing; permit.

2. To put up with; endure.

3. To have tolerance for a substance or pathogen.
 large keys; while hash indexes can suffer performance problems with low cardinality or clustered key distributions; and bit map indexes do not handle high cardinality keys well. However, Adaptive Addressing provides good all-round suitability regardless of the key characteristics.

Retrieval Flexibility

The possible key specifications for retrieving data addresses from an index include:

* Exact Key Match. Only keys that exactly match the specified key are returned. For example, return all employees whose department is "SALES".

* Range Match. All keys that fall within the specified range are returned. For example, return all employees whose salaries are between $20,000 and $50,000.

* Pattern Match. All keys that match the specified key wildcard See wild cards and wildcard mask.  are returned. For example, return all employees whose surnames are like SMITH or SMYTHE etc.

* Token Match. All keys that contain the specified tokens are returned. For example, return all employees whose skill description includes the words "C++" and "ORACLE".

An index will support one or more of these specifications, but typically not all.

The retrieval methods supported by an index are fundamentally determined by its internal structure. Table 2 presents a summary of the retrieval options available for each index.

It is clear that Adaptive Addressing provides the greatest flexibility for key retrieval, whereas Hash and Bit Map are the most restrictive.

Performance Scalability and Concurrency

The size of an index structure typically grows with the volume of key data mapped and, as a result, the efficiency of the index is likely to degrade TO DEGRADE, DEGRADING. To, sink or lower a person in the estimation of the public.
     2. As a man's character is of great importance to him, and it is his interest to retain the good opinion of all mankind, when he is a witness, he cannot be compelled to disclose
. If performance does degrade significantly, this should ideally happen gracefully grace·ful  
adj.
Showing grace of movement, form, or proportion: "Capoeira is a graceful ballet of power and control, artists kicking and jumping in synchronized movement" Alisa Valdes.
, or at the very least, predictably. The features of an index structure that most affect performance are:

* Growth. The rate of structural growth against increasing key volumes. The less an index grows the less impact on its performance.

* Maintenance Scope. The extent of the structure affected by maintenance operations. The more extensive any change in index structure, the more impact there is on its maintenance overhead.

* Retrieval Scope. The extent of the structure scanned by retrieval operations. An index that scans a greater proportion of its structure will be more affected by index growth.

The scalability of an index is largely decided by how the index structure grows, how much of an index structure is affected by an index maintenance operation; and how much of an index structure must be scanned during a retrieval operation. Indexes that grow rapidly and require extensive structural re-organisation typically scale badly for index maintenance operations. Indexes that grow rapidly and require exhaustive structure scanning typically scale badly for index retrieval operations.

Performance scalability determines the suitability of the index in large volume applications and the level of administration or re-engineering required as volumes continually grow.

The concurrency of an index is largely determined by how much of an index structure is affected by an index maintenance operation. Indexes that require extensive structural re-organisation during index maintenance operations typically exhibit poor concurrency. This arises where large portions of the index structure must be locked away from other index users for the duration of a maintenance operation.

In multi-user or multi-process environments, concurrency can be a prime factor in the overall throughput The speed with which a computer processes data. It is a combination of internal processing speed, peripheral speeds (I/O) and the efficiency of the operating system and other system software all working together.

1.
 of the system.

Table 3 below presents a summary of the performance characteristics for each index.

Adaptive Addressing provides a good order of growth with local scope for both maintenance and retrieval operations; and hence exhibits good scalability and concurrency.

Disk Binding

A disk resident index naturally incurs disk I/O (Input/Output) The transfer of data between the CPU and a peripheral device. Every transfer is an output from one device and an input to another. See PC input/output.

I/O - Input/Output
 costs during maintenance and retrieval operations. Given the high latency (1) The time between initiating a request in the computer and receiving the answer. Data latency may refer to the time between a query and the results arriving at the screen or the time between initiating a transaction that modifies one or more databases and its completion.  of disk operations, this tends to dictate TO DICTATE. To pronounce word for word what is destined to be at the same time written by another. Merlin Rep. mot Suggestion, p. 5 00; Toull. Dr. Civ. Fr. liv. 3, t. 2, c. 5, n. 410.  the performance aspects of the index. A typical mechanical hard disk cannot support more than about 200 random I/O operations per second. Therefore, it is important to keep the number disk read and write requests low for the maintenance or retrieval of any given key-address pair.

When an index size is sufficiently small sufficiently small - suitably small  to fit within available physical memory, the disk bound nature of the index is likely to be masked A state of being disabled or cut off.  by cache effects. Moreover, an index with insufficient memory will incur unnecessary and excessive disk I/O traffic. Therefore, it is assumed here that each index operates within its nominal memory configuration.

Table 4 (right) presents a summary of the disk I/Os incurred, averaged over each key-address pair inserted and queried from maintenance and retrieval operations respectively. It is assumed that keys are inserted and retrieved in random order.

Thus Adaptive Addressing is significantly less disk - I/O bound Refers to an excessive amount of time getting data in and out of the computer in relation to the time it takes for processing it. Faster channels and disk drives improve the performance of I/O bound computers. See I/O intensive.  during index maintenance operations than either its B-Tree or Hash counterparts. Adaptive Addressing retrievals are also potentially less disk-bound than B-Trees for queries that return moderate numbers of key-address pairs.

Resource Footprint

A disk resident index occupies both disk space and memory.

Table 5 gives approximate estimates for the disk and memory occupancy expected with each index type.

Adaptive Addressing occupies the middle ground in resource efficiency, requiring more resources than Hash Indexes or Bit Maps, but typically less than a B-Tree. In general, Adaptive Addressing can consume less resources relative to a B-Tree when

* Key sizes are greater than 8 bytes long

* Retrieval flexibility is sacrificed

* Maintenance efficiency is favoured over retrieval efficiency

Workload Adaptability

Applications vary in their functional and performance demands of an index. For example, a fraud application will typically need to continuously load large volumes of transactional data; but will only need to query transactions in response to infrequent in·fre·quent  
adj.
1. Not occurring regularly; occasional or rare: an infrequent guest.

2.
 suspicious activity.

Standard indexes offer little or no adaptability for workload requirements; whereas the structure of Adaptive Addressing allows the following requirements to be controlled on an individual index basis.

* Retrieval flexibility can be balanced against resource and operational efficiency. It is possible to reduce query flexibility to achieve minimal resource usage and maximize operational performance.

* Maintenance efficiency can be balanced against retrieval efficiency. It is possible to increase maintenance performance at the expense of query performance or vice versa VICE VERSA. On the contrary; on opposite sides. .

* Operational efficiency can be balanced against resource efficiency. It is possible to increase operational performance at the expense of resource efficiency or vice versa.

Summary

Adaptive Addressing is a good general purpose indexing technology with an excellent performance profile It allows application developers and database administrators to

* Reduce memory usage

* Reduce disk usage

* Proliferate pro·lif·er·ate
v.
To grow or multiply by rapidly producing new tissue, parts, cells, or offspring.
 indexes for better query performance with minimal loading performance impact

* Implement efficient composite key indexes

* Tune indexes to meet the workload

These aspects promote faster application performance overall.

Implementation

Adaptive Addressing in itself is a pure technology, it is currently implemented within two products. The current products implement Adaptive Addressing in Java, C++ and Oracle database environments.

* Adaptive Addressing for Java/C++. Provides an API (Application Programming Interface) A language and message format used by an application program to communicate with the operating system or some other control program such as a database management system (DBMS) or communications protocol.  library for indexing application data directly.

* Adaptive Addressing for Oracle. Provides a plug-in to an Oracle database, allowing direct replacement of conventional indexes with Adaptive Addressing.

This section of the document discusses the features available within each of the product implementations.

General Features

All implementations of Adaptive Addressing provide the following general features

* Transactional isolation and control. Concurrent sessions experience transaction isolation and can create save-points and commit or rollback A DBMS feature that reverses the current transaction out of the database, returning the data to its former state. A rollback is performed when processing a transaction fails at some point, and it is necessary to start over. See two-phase commit.  transactions as required.

* Read-Committed Queries. Queries only include transactions committed by other sessions at the start of the query and any uncommitted operations applied within the query session's transaction.

* On-line backup and recovery mechanisms. Index files can be backed-up either cold or hot. On-line redo log In the Oracle RDBMS environment, redo logs comprise files in a proprietary format which log a history of all changes made to the database. Each redo log file consists of redo records.  files can be generated from index maintenance operations and indexes can be undone or redone re·done  
v.
Past participle of redo.
 to a point in time from redo log replay.

* Index Administration. Indexes can be created, dropped, truncated truncated adjective Shortened  and partitioned par·ti·tion  
n.
1.
a. The act or process of dividing something into parts.

b. The state of being so divided.

2.
a.
. Partitioning To divide a resource or application into smaller pieces. See partition, application partitioning and PDQ.  provides multi-level disk scaling with multi-threaded index maintenance and retrieval operations.

Adaptive Addressing for Java/C++

The Java/C++ product provides a rich set of APIs for implementing Adaptive Addressing indexes within an application'.

* Index creation

* Index opening and closing

* Index truncation

* Index drop

* Single key-property insert

* Bulk key-property insert

* Single key-property delete

* Bulk key-property delete

* Transactional session creation and closure

* Session save-point, commit and rollback

* Query cursor (1) The symbol used to point to some element on screen. On Windows, Mac and other graphics-based screens, it is also called a "pointer," and it changes shape as it is moved with the mouse into different areas of the application.  creation and closure

* Single cursor fetch Bulk cursor fetch

* Index on-line backup

* Index on-line undo To restore the last editing operation that has taken place. For example, if a segment of text has been deleted or changed, performing an undo will restore the original text. Programs may have several levels of undo, including being able to reconstruct the original data for all edits  and redo To reverse an undo operation. See undo.  logging

* Index undo and redo

* Index tuning

Adaptive Addressing for Oracle

The Oracle product provides a seamless plug-in to the Oracle database, which allows Adaptive Addressing indexes to be created on tables within the database The plug-in provides:

* A package for index creation and administration

* Automatic index maintenance

* Query operator for index retrieval

* Integrated database transaction control

* Database recovery and roll-forward synchronisation Noun 1. synchronisation - the relation that exists when things occur at the same time; "the drug produces an increased synchrony of the brain waves"
synchroneity, synchronicity, synchronism, synchronization, synchronizing, synchrony
 

General Product Limitations

The following general product limitations are currently in force:

* A composite key can contain up to 32 key segments.

* A single composite key segment cannot exceed 255 bytes.

* Pattern and token matching cursor functionality will not be commercially released until 01 2002

The above restrictions arise from product implementation strategies and not from the core technology itself. Some of these limitations may be lifted in future releases.

Oracle Product Limitations

The following Oracle plug-in product specific limitations are currently in force:

* The Oracle plug-in supports Oracle 8.1 6.0.3 or later.

* Adaptive Addressing indexes cannot enforce unique and primary key constraints.

* Adaptive Addressing indexes cannot be created on partitioned tables with ROW MOVEMENT ENABLED.

* Up to 32 Adaptive Addressing indexes can be built on a single table.

* Up to 8192 Adaptive Addressing indexes can be built in a single Oracle instance.

The above restrictions arise from product implementation strategies and Oracle limitations, not from the core technology itself. Some of these limitations may be lifted in future releases.

Environments

Adaptive Addressing products are implemented in Java and are available in all tier- one environments, namely

* Sun/Solaris

* IBMIAIX

* Compaq/True64

* HP/HP-UX

Future Products

The following additional product releases are planned:

* Adaptive Addressing EJB (Enterprise JavaBeans) A software component in Sun's J2EE platform, which provides a pure Java environment for developing and running distributed applications. EJBs are written as software modules that contain the business logic of the application.  Component. A plug and play component for Java enterprise development environments. Provides Java developers with a fast path to adopting Adaptive Addressing technology.

* Adaptive Addressing CORBA (Common Object Request Broker Architecture) A software-based interface from the Object Management Group (OMG) that allows software modules (objects) to communicate with each other no matter where they are located on a private network or the global  Component. A plug and play component for CORBA enterprise development environments. Provides developers with a fast path to adopting Adaptive Addressing technology.

* Adaptive Addressing Index loader A program routine that copies a program into memory for execution. . An ultra-fast data file loader for both Oracle and non-Oracle environments.

Intellectual Property

Adaptive Addressing is a proprietary indexing technology. The Intellectual Property Rights are wholly owned by CopperEye ltd and are protected by a series of patents in Europe and North America North America, third largest continent (1990 est. pop. 365,000,000), c.9,400,000 sq mi (24,346,000 sq km), the northern of the two continents of the Western Hemisphere. .
Behaviour              Adaptive Addressing   B-Tree   Hash      Bit Map

Character Key          Yes         Yes       Yes      Yes (1)
Number Key             Yes         Yes       Yes      Yes (1)
Date Key               Yes         Yes       Yes      Yes (1)
Composite Key          Yes         Yes       No (2)   No (2)
Large Key Size (7)     Yes         No (8)    Yes      Yes
Low Cardinality        Yes (3)     Yes (3)   No (4)   Yes
High Cardinality (5)   Yes         Yes       Yes      No
Highly Clustered       Yes         Yes       No (6)   Yes

(1) A Bit Map is only suitable if the key domain is sufficiently small
or can be surrogated to a smaller domain.

(2) Composite keys can only be handled by surrogating the
composite key combination, which restricts the key combinations
that can be retrieved.

(3) The index structure can adequately handle low cardinality
keys, but may not be as efficient as a bit map index.

(4) A hash index will typically suffer collision problems with
low cardinality key data.

(5) High cardinality is assumed to be over 216 distinct potential
key values.

(6) A hash index will typically suffer collision problems with highly
clustered key data.

(7) A large key size is assumed to be over 255 bytes long.

(8) A B-Tree stores multiple native keys in a disk block. As the key
length increases and the number of keys stored per block reduces,
the overall performance of the index degrades.
Retrieval           Adaptive Addressing   B-Tree   Hash     Bit Map

Exact Match (1)     Yes             Yes   Yes      Yes
Range Match (2)     Yes             Yes   No       No (5)
Pattern Match (3)   Yes             No    No       No
Token Match (4)     Yes             No    No       No

Notes:

(1) An SQL example of an
exact match is
SELECT ... WHERE column =
value.

(2) An SQL example of a
range match is
SELECT ... WHERE column
BETWEEN min AND max.

(3) An SQL example of a
pattern match is
SELECT ... WHERE column LIKE wildcard.

(4) An SQL example of a token match is
SELECT ... WHERE INSTR(column,'
`[parallel] includedtoken [parallel]' `)>0
AND INSTR(column,' `[parallel] excludedtoken [parallel]' `)=0

(5) Bit Map indexes can potentially support range matching, but
its low cardinaly key restriction usually makes it pointless.
Performance Factors   Adaptive Addressing   B-Tree   Hash      Bit Map

Order of Growth (1)   N          N.log(N)   N        log(N)
Maintenance Scope     Local      Local      Local/Global (2)   Local
Retrieval Scope       Local      Local      Local    Global
Good Scalability      Yes        Yes        No (3)   Yes
Good Concurrency      Yes        Yes        No (3)   No (4)

Notes

(1) The order of growth is the relative change in structure size
compared to a change in key volume size (N). The calculations
represent rough estimates of behaviour, rather than specific size
calculations.

(2) A hash index has a maximum predetermined capacity which,
when exceeded, requires an index rebuild and hence exhibits
occasional global maintenance scope.

(3) A hash index exhibits good volume scalability and concurrency
until its predetermined capacity is exceeded.

(4) Most bit map implementations exploit some form of data
compression. This compression creates dependency between
areas of the index and increases the scope of maintenance operations.
Resource           Adaptive Addressing
Map

Memory Footprint   N(A+CK)/[2.sup.S] (5)(6)   N[(K+A).sup.2]/F2B (1)(3)
Disk Footprint     2N(A+CK) (5)               N(K+A)/F (3)

Resource           B-Tree              Hash       Bit
Map

Memory Footprint   0 (2)               B
Disk Footprint     NK/(1/F-1) (2)(3)   NC/B (4)

Legend:

A = address size

B = disk block/cluster size

C = compression factor (between 0 and 1)

F = block fill factor (between 0 and 1)

K = key size (total across all composite
segments)

N = number of key-address pairs

S = bias (1-255)

Notes:

(1) Assumes that all blocks other than leaf
blocks are in memory. However, this calculation
only includes the penultimate level
and assumes that higher levels are insignificant.
Although the upper levels become
more significant with large key sizes, this is
likely to be compensated by key prefix
compression, where used.

(2) Assumes hash clustering of the underlying
data store.

(3) The block fill factor F is typically in the
region of 0.75.

(4) The effective compression factor C depends on the key encoding
method and the sequence of keys involved.

(5) The compression factor C depends on structure of the key and
the retrieval methods required. It can be as low as 0.01 (re. 1% of
the original key space).

(6) The bias S is a dynamic factor that can be increased to favour
maintenance efficiency over retrieval efficiency.
Operation           Adaptive Addressing      B-Tree   Hash       Bit
                                                                 Map

Maintenance (3)     < 1 (11)       2+ (1)    2 (2)    < 1 (6)
Retrieval Min (4)   (A+CK)/B (9)   (A+K)/B   0 (5)    C/B (8)
Retrieval Max (7)   S (10)         1         0 (5)    NC/B (8)

Legend.;

A = address size

B = disk block I/O
size

C = compression
factor (between 0
and 1)

K = key size (total across all composite segments)

N = number of key-address pairs

S = bias (1-255)

Notes:

(1) To insert a key-address pair requires a
read I/O to fetch the relevant leaf block and
a write I/O to update it with the new pair (2
I/Os). Additional I/Os may be required to
effect changes in upper levels of the tree.

(2) A hash index may either store addresses
in independent hash clusters or may enforce
hash clustering of the underlying data store.
Either way, inserting a key-address pair will
require a read I/O to fetch the relevant hash
cluster and a write I/O to update it with the
new pair (2 I/Os).

(3) A maintenance operation to insert a
single key-address pair. The I/O figures
relate purely to index overhead

(4) A best-case retrieval operation that
retrieves multiple key-address pairs to
amortize the I/O cost: The I/O figures relate
purely to index overhead.

(5) This assumes hash clustering of the underlying data store. If
addresses are stored in independent hash clusters then at least 1
additional read I/O will be incurred.

(6) Index I/O cost is usually amortized across multiple key-address
pairs because of the compact nature of the index and especially
where inserts are concentrated in a data store locality, such as
when appending consecutive entries.

(7) A worst-case retrieval operation that retrieves a single
key-address pair. The I/O figures relate purely to index overhead.

(8) The effective compression factor C depends on the key
encoding method and the sequence of keys involved.

(9) The compression factor C depends on structure of the key and
the retrieval methods required. It can be as low as 0.01 (i.e. 1% of
the original key space).

(10) The bias S is a dynamic factor that can be increased to favour
maintenance efficiency over retrieval efficiency.

(11) Typically less than 0.1 and can be as low as 0.001.


Further Information

For further information, please contact marketing@CopperEye.com.
COPYRIGHT 2001 A.P. Publications Ltd.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

 Reader Opinion

Title:

Comment:



 

Article Details
Printer friendly Cite/link Email Feedback
Publication:Database and Network Journal
Geographic Code:1USA
Date:Oct 1, 2001
Words:3998
Previous Article:70% Of Call Centre Fail To Meet Targets.(Optium's survey)
Next Article:Data protection report.
Topics:



Related Articles
HOW TO WRITE BETTER WHITE PAPERS.
Defying conventional wisdom - an alternative perspective on data warehousing. (Defying Conventional Wisdom).
UK businesses predict massive growth in information, low IT spend and poor systems. (IT News).
Spam Blocker.(Security)
Impact of key locality on database performance.(Database Technique)
Meeting the RFID challenge.(FEEDBACK)(radio frequency identification)(Letter to the Editor)
Smart recommendation for an evolving e-learning system: architecture and experiment.
One-of-a-kind securitization forum.(Banking & finance)
Timing belts.(Equipment)

Terms of use | Copyright © 2010 Farlex, Inc. | Feedback | For webmasters | Submit articles