Concurrent operations on extendible hashing and its performance.
Since the inception of database systems, the processing demands have been increasing more rapidly than the power of transaction processing systems. Database size has grown to several orders of magnitude and is continuously increasing. The size of any present-day database is usually in the range of gigabytes. The massive size of a database is one of the main problems in designing efficient data structures and algorithms that process them efficiently.
The most frequently used data structures in databases are B-trees  or its variations as main data structures, and a combination of dynamic and static hashing [4, 6, 9, 10] as auxiliary data structures. B-tree and its variations are well-suited for disk-based systems, but they are unable to grow and shrink efficiently and dynamically. A dynamic data structure called Extendible Hashing that can accommodate expansion and contraction of data concurrently with other operations on it has been discussed in . Extendible hashing does not require any major reorganization and appears to be a powerful data structure for database systems.
The power of a data structure can be utilized only with the help of efficient algorithms. In this article we present an algorithm that manages the execution of concurrent operations on extendible hashing. Our algorithm has its origin in . It uses the verification process at the right moment during the execution of concurrent operations to guarantee data consistency, eliminating, to some extent, the need for locking the directory. The verification process minimizes the number of atomic actions for serializing concurrent operations and reduces the locking overheads without affecting the performance. In  a similar algorithm has been reported that locks the data page as well as the entire directory in order to manage concurrent operations. Our algorithm, however, is more flexible in the sense that it allows higher concurrency and reduced locking overheads.
The directory expansion and contraction is supported by using a new lock mode that allows the directory contraction and expansion to go concurrently with record modifications. In some cases they also allow page split and/or page merge to go concurrently with directory modification without threatening consistency.
The behavior of the algorithm has been studied using a simulation model. We have compared the performance of this algorithm (we call this a dynamic scheme) to the one that does not support directory expansion or contraction (a fixed scheme) . Our study shows that, for a medium to heavily loaded system, directory contraction and expansion have some distinct advantages. For a lightly loaded system, the dynamic scheme is capable of managing a higher multi-programming level than the static scheme. In a heavily loaded system the dynamic scheme provides a higher level of concurrency. The extra overhead required to support the directory modifications does not offset the performance gained due to increased concurrency.
Extendible hashing (EH) is a file-structuring and file-searching technique in which the user is guaranteed no more than a two-page access to locate the data associated with a given key . It is a dynamic structure that grows and shrinks gracefully with the database. The data structure consists of two levels: directory and data pages. Directory resides on the first level and all the data pages reside at the second level. At any time a directory entry may point to one or more data pages. The number of entries in the directory is represented by 2[.sup.g], where g is defined to be the global depth of the directory. A hash function, H, transforms a primary key into a pseudo key in a bit form of a pre-defined length. The first g bits of this pseudo key are used to index the directory in order to find the entry that points to the desired data page. Each data page is associated with a local depth d (d < g). A data page holds all those records whose first d bits of the pseudo keys are the same and splits into two distinct pages when it overflows. When a page splits, the local depths of these sibling pages become equal to one more than the local depth of the parent page. One of the split data pages is characterized by a bit pattern, which is the old bit pattern concatenated with an additional bit of 0 and the other with the bit of 1.
Figure 1 illustrates the initial state of an EH organization. Figure 2 illustrates a page split and a directory expansion. The initial page was full, and to accommodate a new record a split occurred. Since d = g, the directory size is doubled and the values of g and d are changed as shown in Figure 2. A second page split and directory expansion are shown in Figure 3. It should be noted that whenever d = g, the directory doubles in size.
Modification to the Directory
Some modifications to the directory are necessary for implementing our algorithm. The modified directory is given in Figure 4. Each directory entry has two additional fields: verification bits (VB) and page status (PS). The verification bits (VB) of a directory entry are the most significant g bits of the pseudo key. This is in fact the same bit string as the directory entry prefix. The page status (PS) field of a directory entry stores the total number of records, which the corresponding data page is currently holding. In Figure 4, for example, the value of the PS field of Page p1 is 1 since currently it is holding only one record. The number of bits in a VB field increases when the directory expands, and it decreases when the directory shrinks. A data page, in addition to data records, stores the value of its local depth and is modified whenever the page is split or merged with another page.
We shall call our algorithm Extendible Hashing Cautious Waiting (EHCW). It is based on the two-phase locking policy . A conflict between two operations over the directory or a data page is resolved either by rolling back or blocking one of the conflicting transactions. Directory search is non-atomic, i.e., the directory is not locked during a search operation. It is made non-atomic by the use of the verification process. The verification process is applied at the directory level and at the data page level. The directory level verification obtains the correct directory entry pointing to the desired page. The page level verification gets the correct page containing the desired record. A search may proceed concurrently with the directory expansion and contraction. Accessing and processing of the data items are done under write (exclusive) and read (shared) locks. Concurrent directory expansion or contraction is serialized with the help of a verification scheme, a new lock mode called contraction/expansion (ce), and by rolling back or blocking conflicting transactions. The compatibility matrix defining the relationship among these lock modes is given in Figure 5. The algorithm is composed of three phases: Search, Verification, and Modification. The verification is common to both the search and the modification processes. A successful search through verification leads to the modification phase. During the transactions execution the following may happen:
(1) pages may split and/or merge and
(2) a page split or merge may cause the directory to expand or contract.
Thus, with the help of three lock modes and verification, these operations are serialized with a minimum of locking overheads.
When two transactions conflict over the directory or over a data page one of these transactions will be the requestor (requesting the directory or the data item) and the other the holder (holding the directory and/or the data item). Such a conflict is resolved as follows:
If the holder is active (i.e., not blocked for an
entity directory or data item) then block the requestor;
otherwise roll back the requestor.
We present the algorithm in terms of search, write, insert, and delete operations as follows. All algorithms have been presented formally in the Appendix. We begin with search.
(1) Obtain pseudo key K' = H (primary key).
(2) Extract most significant g bits from K'.
(3) Index the directory and read the values of VB, PS, and the page address stored in the directory entry.
(4) Compare (bit comparison)(1) VB bits with the index bits (obtained from the pseudo key) the transaction has (directory level verification). If index bits - VB bits then directory has been modified, so repeat from step 2.
These steps will be repeated until a successful comparison is found. A successful comparison (successful verification) indicates that it is the right version of the directory to proceed.
(5) Compare the number of VB bits with the local depth (bit comparison) stored in the page (page level verification):
If the number of VB bits < local depth (a page split had occurred) then go to step 2; otherwise the page may contain the data item. The address of this page and its local depth are recorded. The data item is searched in the page. If it is not found then it has not been inserted. Return success with proper message.
(6) End of Search.
During search, neither the directory nor the data page is locked. If the directory level verification succeeds and the directory is modified by other transactions then the page level verification may fail. This failure will force the search to repeat the process. Successful verifications at two levels guarantee that the required record will be found if it is present in the file. A successful search returns the address of the page and its local depth to the calling routine. We shall establish the correctness of these operations in the section on Proof of Correctness.
The data modification phase may write (change), insert, or delete a record. An insert may split a page. A page split may or may not initiate a directory expansion. Similarly a delete may initiate a page merge. A page merge in turn may or may not initiate a directory contraction. We will now describe these operations.
(1) Apply search. Pass the page address and the local depth to the calling routine.
(2) After a successful search, any one of the following situations may occur:
(a) Desired page is locked by a transaction (holder) that is blocked.
(b) Desired page is locked by a transaction which is not blocked for a data page.
(c) Desired page is free.
(3) In case of (a) the requestor is rolled back, and in case of (b) the requestor is blocked. In case of (c) the transaction locks the data page and compares its current local depth value with the local depth value received from the search. An unsuccessful comparison indicates interference from other transactions; hence the transaction releases the page and goes to step 1. A successful comparison indicates that the page is the correct page. Transaction finds the desired record and changes it.
In a concurrent environment a transaction may get suspended at any point during its execution. While hibernating, the desired page might be split and/or the directory might be expanded or contracted by other transactions. A page split may redistribute the records of a page among several pages. When the suspended transaction resumes, it may not find the desired record in the page it was going to search. This failure will force the transaction to repeat the search of the directory. It is possible that repeated search and verification may be necessary to find the desired record. The algorithm guarantees that if the record is present then the search will find it or it will return a failure if the record has not been inserted or it has already been deleted.
Insert a record
(1) Apply search. Pass the page address and the local depth to the calling routine.
(2) Lock the page in exclusive mode and compare its current local depth value with the local depth value received from the search. An unsuccessful comparison indicates interference from other transactions; hence release the page and go to step 1. A successful comparison indicates that the page is the correct page.
(3) Check the page status (PS) value. If the page is full then prepare for a page split.
(4) Page split: Compare the local depth of the page and the global depth. If they are equal then the page split will initiate a directory expansion. Split the page. (In a split a new page is acquired in the memory and locked in exclusive mode). Redistribute the existing records. Insert the new record in the right page and update pointers.
(5) If the directory
(a) is locked, then a conflict occurs. Resolve the conflict by rolling back or blocking the requestor.
(b) is free, then lock the directory (ce mode). If an expansion is required then expand the directory, link the old and new pages to correct directory entries, and update their PS and VB fields. If there is no directory expansion, then link the old and the new pages to correct directory entries and update their PS fields. Release lock from the directory but not from the data pages (locks on pages are released only at commit time). Releasing lock on the directory in a non-two-phase manner does not affect the database consistency since without successful verification no operation can proceed.
(6) At the end of the execution unlock these pages.
(7) End of insert.
(1) Apply search. Pass the page address and the local depth to the calling routine.
(2) Lock the page in exclusive mode and compare its current local depth value with the local depth value received from the search. An unsuccessful comparison indicates interference from other transactions hence release the page and go to step 1. A successful comparison indicates that the page is the correct page so delete the record.
(3) Add the PS values of this page and its sibling (page that can be merged with this page). If this value is less than the page capacity then prepare for a page merge.
(4) Page merge: Lock sibling in write mode. If the sibling is locked by some other transaction then skip this merge and proceed with execution. (Skipping a merge in this way does not cause any problem, since a merge is only for using memory efficiently and not for concurrency control.) Move all the records of these pages into one page if the lock is granted.
(5) If the directory
(a) is locked then a conflict occurs. Resolve the conflict by rolling back or blocking the requestor.
(b) is free then lock the directory (ce mode). If a contraction is required then contract the directory and link the right page with the correct directory entry. Update the PS and VB fields. If no directory contraction occurs then link the two directory pointers to this page. Release lock from the directory but not from the data pages (locks on pages are released only at commit time). Again releasing lock on the directory in a non-two-phase manner does not affect the database consistency since without successful verification no operation can proceed.
(c) Then release lock from the empty page and return it to the free page pool.
(6) At the end of the execution unlock these pages.
(7) End of delete.
Delete and Insert
During successive deletes it is possible that most of the pages may become empty, making a directory contraction desirable. The directory contraction, however, is not economical when the local depth of at least one page is equal to the global depth (g). In this situation, the directory contraction will require redistribution of all the records of those pages having local depth equal to g. Such a reorganization is not advisable, since the presence of empty pages in the directory does not affect the algorithm.
Any conflict over the directory is resolved, if required, by rolling back or blocking the requestor. Instead of rolling back the requestor, the process can be optimized by taking into consideration the resource utilization of the two conflicting transactions. The transaction that has used the least CPU can be selected for rolling back. This selection policy, however, may make the algorithm preemptive. The effect of this change has been presented in .
The search operation is vulnerable to the concurrent page split, directory expansion, and contraction. This type of interference is handled by the verification process. With this process the search operation is terminated as early as possible, in the case of a likely failure. We present a few examples to show some cases of concurrent execution of transactions. We use two sets (Set 1 and Set 2) of transactions and run them concurrently using our algorithm. Figure 6 gives the initial state of the file. For simplicity we assume that a data page holds only two entities (records).
Transaction Identity T, T2 T, T,, T,
Entities Requires 1,3 1,3 1,8 3 6
Transaction Identity T, T, T3 T, T, T6
Entities Required 1,3 1,8,6 3 7,9,3 5,1 10
Case 1: No Page Split and No Directory Modification We execute concurrently the transactions of set 1. T, and T2 wish to modify entities 1 K' = 01) and 3 K' 01). T, is scheduled and with the first two bits of the pseudo key indexes the directory. Suppose this takes T, to the first directory entry (prefixed by 01). It accesses and stores in its working area the values of page status, VB bits, and local depth, locks p[.sub.1], and then gets suspended. T2 begins execution, conflicts with T, (over p[.sub.1]), and so it is blocked. When rescheduled, T, does not verify the directory since the data page locked by T, (p[.sub.1]) cannot be acted upon by any other transaction. T, completes its execution and commits. When T2 is rescheduled, it compares the number of VB bits it stored in its working area before it was suspended, with the local depth of the p[.sub.1]. In this scenario the comparison is successful (since number of VB bits > local depth of p[.sub.1]), so T2 locks p[.sub.1], finds and modifies the desired records, and commits.
Case 2: Page Split and No Directory Modification Consider Figure 6. T3 wants to modify records 1 (K' = 01) and insert record 8 (K' = 00). T4 wants to modify record 3 (K' = 01). T3 is scheduled and gets suspended after reading the VB bits and page status of directory entry prefixed by 01. T4 begins but is suspended after accessing the VB (01) bits, local depth, and the address of page p[.sub.1] before locking it. When T3 is rescheduled it compares successfully VB bits with the local depth of p[.sub.1], locks p[.sub.1], and modifies 1. It begins insertion and finds that record 8 should go to p[.sub.1]. A page split is encountered, which does not lead to a directory expansion since local depth of p[.sub.1] < g = 2); p, splits, and a new page (p'[.sub.1]) is obtained and is locked by T3- Entities 1 and 3 are redistributed between p[.sub.1] and p'[.sub.1], 8 is inserted in p[.sub.1], and the local depths of these pages are updated to 2 as shown in Figure 7. Directory is locked and p', is linked to the correct directory entry; and ce lock is released from the directory. T3 completes its execution and releases p[.sub.1] and p'[.sub.1].
T,, is rescheduled and locks p,. It compares the old VB bits and the new local depth of p[.sub.1] (which is 2). The verification fails, T, unlocks p[.sub.1], invokes search, locks P',, finds and modifies record 3, and commits. Case 3. Page Split, Transaction Blocking or Roll Back, and No Directory Modification
Consider Figure 6 and Set 2. T2 wishes to modify records 1 (K' = 01) and 6 (K' = 11) and to insert record 8 (K' = 00); T3 wishes to modify record 3 K' = 01), and T, wishes to modify records 5 (K' = 11) and 1 (K' = 01). T3 begins, reads VB bits and PS, and gets suspended. T2 begins, and is suspended after locking pl. T, begins, locks P3, and modifies record 5 (K' = 11). It then tries to lock (after successful verification) page pointed to by 01 directory entry but gets blocked since p, is locked and the holder (T2) is not blocked. T2 starts and after modifying records 1, tries to insert record 8 in the page and encounters an overflow. As a result, p[.sub.1] is split, and records 1 and 3 go to page p', and 8 to p[.sub.1] (Figure 7). T, then tries to lock page p3 but finds it locked so it is rolled back since the holder Of P3 (which is T5) is in a blocked state. By rolling back the requestor T2), when the holder is blocked, we make the algorithm deadlock-free. (In such a roll back process, two pages, p[.sub.1] and p', may be merged; however, such a merge is generally not necessary). Suppose T5 resumes, locks p[.sub.1] (if p[.sub.1] and p', have been merged otherwise p'[.sub.1]). It modifies 1 and commits. T3 resumes locks P',, modifies 3, and commits.
A roll back may become expensive if it has to perform several page merges. We did not find the number of roll backs to be high in our experiment. In  it was illustrated that the number of roll backs was always lower than the number of blockings. Normal transactions, on the average, access 15-20 entities. With this number of entities, frequent roll backs are not likely. Case 4. Page Split, Directory Modification, and Transaction Roll Back and Blocking Consider Figure 7 and Set 2. T5 wants to modify records 5 (K' = 11) and 1 (K' = 01); T4 wants to insert records 7 (K' = 01) and 9 (K' = 10), and to modify record 3 (K' = 01). T2 wants to modify records 1, 8 (K' = 00) and 6 (K' 11), and T6 wants to insert record 10. T4 begins, reads VB and PS, and is suspended. T2 begins, reads VB, PS, and the address of pl, and is suspended. T, begins, manages to lock P3, and is suspended. T4 resumes, compares successfully VB bits and local depth, and locks p'[.sub.1]. A page split is encountered when T, tries to insert record 7 (K' = 01). The local depth of p', is equal to g so this split leads to a directory expansion. T4, therefore, applies a ce lock on the directory and is suspended. T,, begins and manages to lock P2. It encounters both a page split and a directory expansion when it tries to insert record 10. It is blocked when it tries to apply a ce lock to the directory. T4 resumes, splits the page p'[.sub.1], expands the directory, and wants to lock page P2 to insert record 9 (K' = 10). It conflicts with T,, (a blocked transaction) and, therefore, is rolled back. At this point, a directory contraction can be done but is not necessary since page splits and further directory expansion have occurred. T4, therefore, deletes record 7 (K = 01) and leaves the system. Such roll back of a transaction does not leave the database inconsistent since other transactions' updates are intact and new transactions can find records correctly. T5 resumes, modifies record 5, locks p'[.sub.1], modifies 1, and commits. T,, resumes, inserts record 10 in the correct page, and commits.
Proof of Correctness
We follow an approach similar to that taken in  to establish the correctness of our algorithm. We establish the following:
(1) All operations are deadlock-free and will terminate correctly in finite time.
(2) The search operation is correct.
(3) The insert/delete/modify operations are correct.
We make the following assumptions:
(1) There exists an upper bound for global depth.
(2) The search operation consists of a sequence of read steps. Each read step involves a directory entry or a data page.
(3) Insert/delete operations consist of a set of read and write steps that are atomic.
(4) In an interleave search and insert, the search completes before or after insert . We justify this assumption with the following example :
Consider the following interleaved schedule:
< I(pr) S(pr) >. Assume that 1(pr) inserts record r
in page p. S(pr) begins after the last step of 1(pr).
There is no deletion operation, and S(pr) fails to
find the record. This interleaved schedule is
equivalent to a serial schedule where S(pr) runs
before 1(pr). This equivalence, however, does not
establish the correctness of the interleaved schedule.
To resolve this problem we have introduced
the above assumption. We also assume that some
steps such as verification, getting directory entry,
page address etc., are atomic and can be supported
by the underlying operating system.
To construct a schedule we define the following notations:
* I(r) = insert record r in a data page. An insert obtains
a new page if the page where r is being inserted
splits, modifies the directory entry, and redistributes
the records in these two pages.
* D(r) = delete record r from a data page.
* S(d) = search directory entry d.
* S(r) = search record r in a data page.
The verification process is common to all the operations. It is an iterative process involving rereading the directory and data pages. We show below that the verification process will terminate in time.
Lemma 1: All Operations Terminate.
Proof- Since no two operations hold locks simultaneously and wait for each other, a circular wait cannot exist. No deadlock is therefore possible. Loop-preserving operations such as verification will terminate as soon as the correct page is found (successful directory level and page level verifications). In the case of a failure, it will terminate in finite time, since there is an upper limit to the global depth and the number of pages pointed to by the largest directory.
Lemma 2: Search Operation is Correct.
Proof: A search operation terminates in finite time. It may either succeed or it may fail. If a search operation succeeds then it must be correct, i.e., if record r exists in the file then the search operation will find it. If the search fails then the record it is looking for must not be in the file. We begin with success.
Search succeeds: Suppose we have three operations on a record r: I(r), S(r), and D(r). A serial schedule would be: < I(r) S(r) D(r) >. For a successful search a correct serializable schedule is the one where S(r) must complete before D(r). The only way search can find r after it has been deleted by D(r) is:
(1) r is inserted into data page p by I(pr).
(2) Page p was split to p and p' and r was migrated to P'.
(3) p is in a transient state and still holding r.
(4) Directory entry corresponding to r is pointing to p'.
(5) Search accesses p and finds r.
A serializable schedule is the one when all the steps of an interleaved operation preserve their order as they appear in the serial schedule. Under this situation the last step e cannot possibly occur since I(pr) has already changed the directory pointer, pointing to page p' that contains r. S will, therefore, access p' and not p. This implies that there is no such insert operation after which S will be looking into a wrong data page and fail to find r. S is therefore correct since it finds the record r which is present in the file.
Search fails: If search fails then r cannot be in the file. It will not be the case that search looks into a wrong page and terminates with failure when the record r is in some other page. We assume the contrary, i.e., suppose search fails after the record r is inserted by I(r) in some page. We show that the steps that lead to a failure cannot exist. The following are the only possible interleaved schedules:
(1) < I(pr) S(d) S(pr) >. Suppose I(pr) has split p into p and p' and moved r to page p'. If S(pr) fails then S will be looking for r in a wrong page, i.e., in p. But S cannot possibly be looking into p since the directory entry would not be pointing to p after the completion of I(pr). So there is no such I(pr) that assigns a wrong page to S(r). Consequently, if I(pr) expands the directory then the directory verification will take S(d) to the correct directory entry and lead S(pr) to the correct page.
(2) < S(d) S(pr) I(pr) >. In this case also, S(r) will look into p before I(pr) relocated r to p'.
(3) < S(d) I(pr) S(pr) >. If S(pr) fails, as we have assumed here, then S will be looking for r in page p. This cannot happen since the verification process will eventually find the correct directory pointer.
This illustrates that there is no such I(r) that will occur before S(r) such that S(r) will fail, Therefore the search operation is correct.
Insert/Deletion Operation is Correct
We establish that an insert operation inserts a record in a correct page and also that two interleaved inserts produce a correct result, i.e., each operation inserts its records in correct page(s).
A serial schedule for two inserts can be given as follows:
< S(d) S(r) V(d) I(p'r) S'(d) S'(r') V'(d) I'(p'r') >
S(r) finds the page where record r should be. Directory verification V(d) is done before I(p'r) splits the page into p and p', redistributes the records in p and p', and inserts r into p'. When the first insert is complete then all locks are released, and the insert from a second transaction begins.
Suppose these two inserts interleave incorrectly and insert their records in the wrong pages. If this is possible then one of the interleaved schedules could be:
< S(d) S(r) S'(d) V(d) S'(r') V'(d) I(pr) I'(pr') >
Under this schedule suppose S and S' find the same directory entry pointing to data page p. Both verifications V(d) and V'(d) will succeed. I(pr) splits page p into p and p', redistributes the records, and inserts r into p'. For I(r'), verification is correct so it will insert the record that may end up in wrong page (p or p'). This interleave schedule, however, is impossible since:
(1) I(r) will be executed under 2PL policy. The page split will be performed under exclusive locks (write, sm and ce if directory is modified). I'(pr) will conflict with I(pr) and consequently will be blocked. When I'(pr') resumes, it will begin with directory verification leading to a page verification.
(2) If the verification on the directory (comparison of VB and g bits of K') succeeds, the page level verification comparison of VB and local depth) will fail or succeed. If it succeeds, then record will go to the right page.
The success of these two verification phases, under 2PL locking, guarantees that any interleaved schedule will be serializable and correct. The same logic applies to the delete operation. This establishes the correctness of the insert and delete operations.
In this section we first describe the assumptions on which we have built our simulation models and then explain the flow of transactions through them. Our assumptions are as follows:
(1) There is one disk and one CPU. (This is to simulate the hardware resource contention.)
(2) The model makes use of the main memory buffers such that any number of required entities may be in the main memory which would not need any I/O.
(3) A roll back operation is given the highest priority.
(4) All transactions in a workload perform both read and write operations.
(5) Lock table and the entire directory always reside in the main memory.
The simulation model is shown in Figure 8. A transaction may be suspended anytime in a multi-programming environment. We do not show the suspension of a transaction in the simulation diagram. The flow of transaction through the model is as follows:
(1) New transactions arrive at Pending Queue (PQ). Transaction arrivals are drawn from a Poisson process.
(2) From PQ a transaction goes to CPQ1. The top transaction from CPQ1 is picked up, and after applying hashing function it indexes the directory. During this process the transaction may get suspended, and the state of the file may change. When it resumes the verification may fail, and it may have to repeat the search of the directory. On a successful verification the transaction requests for a lock on the entity.
(3) On successfully obtaining a lock, the transaction joins the Input-Output Queue (IOQ) and after I/O processing it joins CPQ2.
(4) At the end of the processing of each data entity, a test is carried out to find if the transaction has processed all its required entities. If it has, then it is committed; otherwise it is put at the top of CPQ1.
(5) If a lock is denied then the transaction may be rolled back or blocked. The transaction to be rolled back goes to the top of IOQ, consumes some I/O resource, and then is moved to the top of CPQ2 where it consumes some CPU resource. At the end of a roll back process, the transaction is scheduled for execution when the holder has committed or rolled back. The blocked transaction joins the Blocked Queue (BQ).
(6) A blocked transaction (requestor) is unblocked and moved to the top of CPQ1 when the holder releases the required entity.
(7) The end of the shrinking phase indicates the end of the committing operation.
We have simulated the expansion/contraction of directory and page split by providing a buffer pool of free memory pages. Pages for directory expansion and page split are obtained from this pool. If there are no free pages available, a page fault is generated and serviced in a FIFO manner. This design provides a facility to control the multiprogramming level dynamically. Under this scheme, the fixed directory structure has less memory available for transactions compared to the dynamic directory organization. Under the latter scheme, the available memory is proportional to the current directory size.
Simulation Parameter Values
We support page locking. The database size is selected to generate a reasonable amount of conflicts over directory and data pages under selected input parameters (arrival rates, buffer size). A much larger database size generated very few conflicts, making it inadequate for a meaningful evaluation of the CCM. On the other hand, a smaller size directory generated too many conflicts, making it difficult to explain the interaction among these parameters. A similar value for database size also has been used in other performance works.
We ran our simulator on VAX/VMS785, and the other parameter values are based on a CPU power of 1.5 mips. A time unit may be interpreted to be a millisecond. The values of these parameters have been selected on the basis of the number of the instructions each operation requires. A detailed description of the selection method can be found in . The simulator was driven using the following parameter values.
Database size (in records): 4095. Data page capacity (in records): 8. Maximum g value: 9. Transaction read:write ratio:(2) 50:50. Locking/unlocking time: 0.47. Record reading time: 0.62. Record modification time: 0.70. Record insert/delete time: 0.81. A page split time: 1.83. Page merge time: 1.19. A page fault service time: 9.30. Time to organize one directory entry 18.30.
RESULTS AND DISCUSSION
Arrival Rate, Throughput, and Response Time
Figure 9 shows the relationship between the arrival rate and the throughput. The throughput of both schemes are the same up to an arrival rate of six transactions per second, and beyond this it changes rather significantly. Figure 10 shows the relationship between arrival rate and response time. The significant change in the response time also begins to appear around seven transactions per second. The throughput of dynamic scheme begins to decline after the arrival rate of nine, but the decline is slow compared to the fixed directory scheme, which begins to decline sharply after seven transactions per second. As Figure 10 indicates, the sharp jump in the response times for fixed and dynamic schemes begins after seven and eight transactions per second. The increase begins to slow down after the arrival rates of eight and nine transactions per second.
The reason for this behavior might be associated with the rate of transaction blockings, roll backs, page faults, and directory and page level verifications. The number of transaction roll backs is related to the number of blockings since, under our algorithm, the requestor is rolled back if the holder is in a blocked state (blocked for data item and not for CPU or 1/0). So we look at the relationship between arrival rate and rolling back and blocking rates.
Arrival Rate, Blocking Rate, and Roll Back Rate
Figure 11 shows the relationship between the arrival rate and transaction blocking rate, and Figure 12 shows the relationship between the arrival and roll back rates. Both rates are higher in the case of fixed scheme. Here again we see the transition points to be seven for fixed and eight transactions per second for the dynamic schemes respectively. Figures 11 and 12 agree with our intuition, i.e., a higher blocking rate generates a higher roll back rate. The difference in these two rates between dynamic and fixed schemes is significant. The reason seems to be the size of the memory resident directory. It would be interesting to see the effect on the performance if, in the fixed scheme, the directory resides on the disk, and the desired portion is moved to the memory on demand. This essentially would simulate the dynamic scheme with the addition of 1/0 and may turn out to be a very inefficient since 1/0 cost may exceed the cost of directory expansion, which we found insignificant. We plan to investigate this in our future work.
Arrival Rate, Directory Verification, and Page Split
Figures 13 and 14 show the relationship between the arrival rate, the directory verification, and page split rates respectively. For both the figures, the changes in rates of these two schemes begin to appear after seven transactions per second. In the case of dynamic scheme, the rate of directory verification and page split rate increases significantly after this rate and stabilizes. On the other hand, these rates decrease in the fixed schemes and stabilize. The increase and the higher value of directory verification in dynamic scheme is due to expansion and contraction of the directory during transaction processing. In the fixed scheme the directory is never modified; therefore the verification success rate is high. On the other hand, the directory expansion generates more unsuccessful verifications before achieving success. The higher page split rate in the case of dynamic scheme seems to be due to the lower number of roll backs. A lesser number of roll backs implies greater numbers of successful transactions, and these transactions may split at least one page. The situation is different in the fixed scheme. A large number of transactions are rolled back before they even get to the point in their execution where page splits occur. The majority of those transactions that succeed do not split any page (transaction size is uniformly distributed) during their lifetime. This may also be the reason for the sudden drop in the page split rate in fixed scheme as Figure 14 shows. As indicated by Figure 12, the roll back rate rises sharply after the arrival rate of seven, and at that point the page split rate in the fixed scheme drops down significantly.
Arrival Rate, Page Merge Rate, and Transaction Wait Time
Figures 15 and 16 show the relationship between the arrival rate and page merge rate and average transaction wait time respectively. The merge rate shows a sudden increase in dynamic scheme after the arrival rate of eight. The fixed scheme does not show any sudden change. The sudden change in the dynamic scheme is the effect of directory contraction. As illustrated by Figure 14, the page split rate is significantly higher for dynamic scheme, and in the case of directory contractions the reverse scenario exists as shown in Figure 15. In the fixed scheme, there is no directory contraction; therefore page merge occurs only as a result of delete operations.
The transaction waiting time in the fixed scheme is higher than the dynamic scheme. The increased waiting time is the result of waiting for memory pages. Recall that in the fixed scheme the amount of memory available to transactions is much less compared to the dynamic scheme, at least in the lower transaction arrival rate. The available memory is not sufficient to support higher transaction volume; consequently pages are provided on demand. A transaction waits when the desired page is being acquired. This wait also contributes to transaction blocking leading to higher roll backs.
In this article we presented an algorithm for managing the execution of concurrent transactions on a dynamic data structure extendible hash file. The algorithm supports directory expansion and contraction and page split and merge. The algorithm uses the verification scheme of optimistic concurrency control mechanism and an additional "soft" lock, allowing operations on data pages and directory to proceed concurrently. It is based on a two-phase locking policy and uses transaction roll back and blocking to resolve conflicts among concurrent transactions over directory and data pages. It is deadlock-free since it resolves a conflict by rolling back the requestor, which is capable of precipitating a deadlock. It is non-preemptive since it does not roll back a transaction holder) that is using the CPU resource.
Our algorithm manages to reduce locking overheads and increases concurrency by allowing page modification and directory contraction/expansion to proceed concurrently. In , the directory modifications were supported with the help of exclusive locks, and the verification process was not used.
We used the simulation model to study the behavior of the algorithm under two schemes: dynamic scheme which supported directory modifications and the fixed scheme for which maximum directory size has to be pre-defined. We found that the dynamic scheme has some advantages over fixed directory scheme . It allows a higher level of concurrency. The extra overheads, generated by directory management, is not significant and does not offset the advantages gained by improved concurrency. We therefore believe that our algorithm EHCW may be an efficient scheme for Main
Memory Database Systems.
Acknowledgments. I am grateful to two anonymous referees for their comments, which were invaluable in refining this article. I thank Ling Ling Polican, Jamuna Prasad, and Meichun Hsu for stimulating discussions without which this article would not have been completed.
We present all the procedures functions of our algorithm in a pseudo-Pascal. We have used a set of functions/procedures that are mainly for acquiring additional information. We do not define them and assume that the underlying database system and/or operating system provides them. We also list the variable names and their meaning, which we have used in all the procedures given below.
g: global depth; dbase: base address of directory; dold: old directory entry; ibits: bit string; K: primary key; K': pseudo key in bit string; D: directory; dentry: a directory entry pointer; dnew: new directory entry; vb: verification bits of directory entry; ps: page status field of a directory entry; pageadr address of desired page; Id: local depth of a page; tstatus: transaction status (blocked or running); pagecapacity: maximum number of records in a page; dircontraction, dirfree, pagefree, siblingfound, result, found: boolean
(a) checkstatus (entity): checks if an entity is locked or free. It can also check if a transaction is blocked or free.
(b) checktype (pageadr): A function to check the existing lock mode (read/write) on pageadr.
(a) read 0: reads and returns the values of desired entities.
(b) readlock (pageadr): Applies a read lock on the desired data page.
(c) writelock (pageadr): Applies a write lock on the desired data page.
(d) getnewpage (): Gets a free data page in the memory and returns its address.
(e) put (): Inserts the record in the desired data page.
(f) redistributerecords (): Redistributes records among two desired data pages.
(g) checkdirstatus (): Checks if directory is locked or free.
(h) celock (): Locks directory in ce mode.
(i) getpage (): Gets a page for directory expansion.
(j) addpagetodir (): Adds pages to directory in directory expansion.
(k) linkpagetodir (): Links data pages to the directory.
(l) reorganizepointers (): Reorganizes pointers after directory expansion or contraction.
(m) cerelease (): Releases ce lock from the directory.
(n) block: Blocks the desired transaction.
(o) roll back (): Rolls back the desired transaction.
(p) update (): Modifies record K.
(q) removerecord (): Removes the desired record from the datapage.
(r) copyrecord (): Copies records from one page to another.
(s) dispose (): Disposes/returns an empty page to the system.
procedure search (K, result);
(* Procedure to search the desired record)
read g, dbase); K' := H (K); (* H is
a hashing function *)
ibits := g bits of K'; result :=
dold := dbase + get (D [ ibits ]); (*
get the directory index*)
dnew := dbase + get D[ibits]);
(* Begin directory verification *)
while dnew < > dold do
read (g); ibits := gbits of K'; dold
:= dnew; dnew := get (D [ibits }
(* Page level verification*)
vbits := dnew . vb; pageadr :=
dnew.padr (* get page address *)
if vbits > = Id (pageadr) then
found := findrecord (K); (* search
record in the page )
if not found) then print 'record K
does not exist')
else result := found * end search
else search (K, result)( * start
search again *)
procedure changerecord (K);
(* Modifies the desired record*)
search (K, result);
if result then
free := checkstatus (pageadr); (*
check the pageadr status; locked
or free * )
if not free then
locktype := checktype (pageadr);
if locktype = read then
readlock (pageadr) (* apply a read
tstatus := checkstatus (holder);
(* check status of the holder
of pageadr * )
if tstatus = blocked then rollback
else block requestor)
writelock (pageadr); Verify: if
correct page then proceed else
search; update (K)
procedure insert (K);
search (K, result);
if result then
free := checkstatus (olddatapage); (*
check the oldpage status; locked or
if not free then
tstatus := checkstatus holder);
(*check status of the holder of
if tstatus = blocked then roll back
else block (requestor)
writelock (olddatapage); if correct
page then proceed else search;
if direntry.ps = page capacity then
begin (* page is full and the insert
will require a page split *)
checkdirstatus (smlock, celock,
if celock then
tstatus := checkstatus (holder);
if tstatus = blocked then roll
else block requestor);
else (* no ce lock on directory *)
if localdepth (olddatapage) <
smlock dir) (* No directory
expansion, page split only *)
else (* page split and directory
dexpansion := true; celock
getnewpage (newdatapage); writelock
putr (K, datanewpage, olddatapage);
(* put record into the right
if dexpansion then
begin * directory expansion
for i := 1 to 2g do
getpage (dirnewpage); celock
addpagetodir (datanewpage); g := 2
for i := 0 to 2g - 1 do dir [i].vb
:= gbits (i); (* right most bits *)
smrelease dir); cerelease dir)
else (* page is not full *)
put (K, olddatapage)
procedure deleterecord (K);
search (K, olddatapage, result);
if result then
(writelock olddatapage); if correct
page then proceed else search;
removerecord olddatapage, K);
if ps < = pagecapacity/2) then
begin ( * this page can be merged with
its sibling so find sibling *)
siblingfound := true;
if leftmost localdepth bits of
(direntry - 1) =
leftmost localdepth bits of
direntry then sibling :=
direntry - 1
else if leftmost localdepth bits
of (direntry + 1) =
leftmost localdepth bits of
sibling := direntry + 1
else siblingfound := false;
if siblingfound then
begin (* check if page can be
merged with its sibling *)
if direntry.ps + sibling.ps <=
page capacity then
begin (* pages can be merged so
check status *)
pagefree := checkstatus
if pagefree then checkdirstatus
(smlock, celock, free);
if pagefree and (smlock or
begin (* prepare for merge *)
celock dir); writelock
(* check for directory
dircontraction := true; i := 0;
k := i;
page [k] := dir [i].datapage;
if q = newld[k] then
dircontraction := false
i := i +
while (i < 2[.sup.g]) and
if page[k] < > dir
k := k + 1; page[k]
if q = newld[k] then
dircontraction := false
if dircontraction then
begin contract directory
for j := 0 to2[.sup.g-1] - 1 do
k := 1;
for i := k to g - newld[i] do
m := m + 1
1. Comer, D. The ubiquitous B-Tree. Comput. Surv. 11, 2 june 1979), 121-138.
2. Ellis, C.S. Extendible hashing for concurrent operations and distributed data. In Proceedings of the Second ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (Atlanta, Ga., March 21-23). ACM, New York, 1983, pp. 106-116.
3. Eswaran, K.P., Gray, I.N., Lorie, R.A., and Tralger, I.L. The notions of consistency and predicate locks in database systems. Commun. ACM 19, 11 (Nov. 1976), 624-633.
4. Fagin, R., Nievergelt, J., and Strong, H.R. Extendible hashing: A fast access method for dynamic files. ACM Trans. Database Syst. 4, 3 (Sept. 1979), 315-344.
5. Hsu, M., and Yang, W.-P. Concurrent operations in extendible hashing. In Proceedings of the ]2th International Conference on Very Large Databases. IEEE, New jersey, 1986.
6. Knuth, D.E. Sorting and Searching. Vol. 111, The Art of Computer Programming. Addison-Wesley, Reading, Mass., 1974.
7. Kumar, V. A concurrency control mechanism based on extendible hashing for main memory database systems. In Proceedings of the 17th Annual ACM Computer Science Conference (Louisville, Ky., Feb. 21-23). ACM, New York, 1989, pp. 109-113.
8. Kumar, V., and Hsu, M. A superior deadlock-avoidance two-phase locking algorithm and its performance. Inf Sci. To be published.
9. Larson, P.-A. Dynamic hashing. BIT 18, 2 (1978), 184-201.
10. Litwin, W. Linear hashing: A new tool for files and table addressing. In Proceedings of the Sixth International Conference on Very Large Databases (Montreal, Canada, Oct. 1-3). IEEE, New jersey, 1980, pp.212-223.
CR Categories and Subject Descriptors: D.4.8 [Operating Systems]: Performance-simulation; E.5 [Data]: Files-organizationlstructure; H.2 [Information Systems]: Database Management; H.2.4 [Database Management): Systems-concurrency, transaction processing
General Terms: Performance
Additional Key Words and Phrases: Atomic, blocking, concurrent operations, extendible hashing, global depth, local depth, pseudo key, roll back, verification process
ABOUT THE AUTHOR:
VIJAY KUMAR is an associate professor of computer science at the University of Missouri-Kansas City. His current research interests include object-oriented databases, distributed database systems, telecommunication and main memory database systems, database security, and data structures.
Author's Present Address: Computer Science and Telecommunication, University of Missouri-Kansas City, 5100 Rockhill Road, Kansas City, MO 64110. email@example.com
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
|Printer friendly Cite/link Email Feedback|
|Title Annotation:||comparing effectiveness of concurrent and non-concurrent operation algorithms|
|Publication:||Communications of the ACM|
|Date:||Jun 1, 1990|
|Previous Article:||Fast hashing of variable-length text strings.|
|Next Article:||Abstracts from other ACM publications.|