Applying SD-tree for object-oriented query processing.1 Introduction The advent of internet has made the volume of data going high everyday in all computer-based applications. This has entrusted researchers to design more powerful techniques to generate and manipulate large amounts of data to derive useful information. Indexing plays a vital role in the fast recovery of required data from large databases. Among the many indexing techniques reported the signature file approach is preferred for its efficient evaluation of set-oriented queries and easy handling of insert and update operations. Initially applied on text data [2, 3, 12, 17, 21] it has now been used in other applications like office filing [6], relational and Object-Oriented Databases [14,16,26] and hypertext hypertext, technique for organizing computer databases or documents to facilitate the nonsequential retrieval of information. Related pieces of information are connected by preestablished or user-created links that allow a user to follow associative trails across the [9]. Signatures are hash coded abstractions of the original data. It is a binary pattern of predefined length with fixed number of ls. The attributes' signatures are superimposed su·per·im·pose tr.v. su·per·im·posed, su·per·im·pos·ing, su·per·im·pos·es 1. To lay or place (something) on or over something else. 2. to form object's signature. To resolve a query, the query signature say Sq is generated using the same hash function An algorithm that turns a variable-sized amount of text into a fixed-sized output (hash value). Hash functions are used in creating digital signatures, hash tables and short condensations of text for analysis purposes (see hash buster). and compared with signatures in the signature file for l s sequentially and many non-qualifying objects are immediately rejected. If all the Is of Sq matches with that of the signature in the file it is called a drop. The signature file method guarantees that all qualifying objects will pass through the filtering mechanism; however some non-qualifying objects may also pass the signature test. The drop that actually matches the Sq is called an actual drop and drop that fails the test is called false drop. The next step in the query processing is the false drop resolution. To remove false drops Search results that meet your search criteria but have nothing to do with the information you were trying to find. For example, you want biographical details about Grover Cleveland, but you get information about Cleveland, Ohio. See full-text search. each drop is accessed and checked individually. The number of false drops can be statistically controlled by careful design of the signature extraction method [7] and by using long signatures [3,6]. 1.1 Related work Different approaches have been discussed by researchers to represent Signature file in a way conducive for evaluating queries, such as Sequential Signature File [31], Bit-Slice Signature file [31], Multilevel Signature file [25], Compressed Multi Framed Signature file [23], Parallel Signature file [20], S-Tree and its variants [13,24], Signature Graph [28] and Signature tree [27,29,30]. 1.2 Motivation for the current work The signature tree developed by Chen [30] is having the following drawbacks: * Signatures are inserted considering both 0s and 1s whereas actual weight age is for set bits only. * Insertion path is dictated by the existing tree structure. * To process a query, bits appearing in the tree from root node (mathematics, data) root node - In a tree, a node with no parents, but which typically has daughters. are compared with query signature pattern for 0s and I s and not by its set bits. * For a 0-bit in the query both left and right sub tree is followed leading to multiple traversals. These observations laid the foundation for the current work. We study a new indexing technique for OODBSs using the dynamic balancing of B+ tree called Signature Declustering (SD)-tree in which the positions of 1s in the signatures are distributed over a set of leaf nodes. Using this for a given query signature all the matching signatures can be retrieved cumulatively in a single node. The rest of the paper is organized as follows. In section 2 we discuss briefly the different approaches used to represent the signature file. In section 3 the structure of the proposed SD-tree is shown. Section 4 is devoted to the algorithms for insert, search and delete operations. Section 5 proposes a sample data set and queries to validate the new structure. Section 6 reports the results of the experiments conducted with the analytical comparison of SD-tree with that of Signature tree [30]. Finally section 7 concludes the work with further outlook on the work. 2 A summary of signature file techniques 2.1 Signature files A Signature is a bit string formed from a given value. Compared to other index structures, signature file is more efficient in handling new insertions and queries on parts of words. Other advantages include its simple implementation and the ability to support a growing file. But it introduces information loss which can be minimized by carefully selecting the signature extraction method. Techniques for signature extraction such as Word Signature (WS) [2,3,4,6], Superimposed Coding (SC) [1,2,3,4,5,6,10], Multilevel Superimposed Coding [12], Run Length Encoding See RLE. (RL) [3,4,6], Bit-block Compression (BC) [3,4], Variable Bit-block Compression (VBC) [4,6] have been reported. The encoding See encode. scheme sets a constant number say m, of 1s in the range [1..F], where F is the length of the signature. The resulting binary pattern with m number of 1s and (Fm) number of 0s is called a word signature. The signature of a text block or object can be obtained by superimposing (logical OR operation) all its constituent signatures (i.e) word signatures for text block and attributes' signatures for object. The set of all signatures form a signature file. An example of Superimposed Coding and a sample query evaluation is given below. Information 0010 0100 Retrieval 0100 0001 Block Signature 0110 0101 Sample queries Matching query Keyword = Information 0010 0100 Query descriptor 0010 0100 Block signature matches (Actual Drop) False Match query Keyword = Coding 0010 0001 Query descriptor 0010 0001 Block signature matches (False Drop) but keyword does not Non-matching query Keyword = Information 0010 0100 Keyword = Science 0000 0110 Query descriptor 0010 0110 Block signature does not match 2.2 Applications of signatures Signatures are applied mainly in database access methods useful for text retrieval such as Full text scanning, Inversion inversion /in·ver·sion/ (in-ver´zhun) 1. a turning inward, inside out, or other reversal of the normal relation of a part. 2. a term used by Freud for homosexuality. 3. , Clustering, Multi attribute retrieval methods like hashing Creating hash totals or hash tables. See hash total and hash table. hashing - hash coding and signature files. Such applications are discussed in [3,5,21]. Here the documents are stored sequentially in the "Text File". Signatures which are abstractions of the documents are stored in the "Signature File". The latter serves as a filter on retrieval. It helps in discarding a large number of non-qualifying documents. Signatures have been applied in areas rich in text documents like telephone directory [1], office systems [3,4,6], Optical and Magnetic disk access [8], Data base Management system and Library automation. Other applications include Access method for documents Indexing method for large text file [17, 18]. Access method for formatted data [l]. To speed up searching in editor [7]. To compress a vocabulary [7]. For a spelling checking program [7]. In differential file [7]. 2.3 Physical representations of signature files This section discusses briefly the various techniques used to represent signature files. We follow the lead of [30] for figures in this section. 2.3.1 Sequential Signature File (SSF SSF Scalable Simulation Framework SSF Single Stock Futures SSF Service Switching Function SSF Small Form Factor SSF Svenska Simförbundet (Swedish Swimming Association) SSF Space Station Freedom SSF Society of St. ) The signatures are sequentially stored in a file called SSF as in Fig. 1. In single level signature methods every signature must be accessed and tested. Since signatures are abstractions of original data with smaller size, the method is faster than sequential scan See progressive scan. of objects themselves. This method is easy to implement and requires low storage space and low update cost. The disadvantage is that more the number of objects exist, the more is the time spent on scanning signature file [8,31]. Therefore it is generally slow in retrieval. To support faster access multilevel signature methods are suggested. [FIGURE 1 OMITTED] 2.3.2 Bit-Sliced Signature File (BSSF BSSF Bromeliad Society of South Florida BSSF Bone-Shaped Short Fiber ) BSSF stores signatures in a column-wise manner as illustrated in Fig. 2. Thus F bit-slice files one for each bit position of the set signatures are used. In retrieval only a part of the F bit-slice files have to be scanned and hence the search cost is lower than SSF. However update cost is higher. This is because a new signature insertion requires about F disk accesses one for each Bit-slice file [8,31]. [FIGURE 2 OMITTED] 2.3.3 Compressed Bit-Sliced Signature File (CBSSF) By choosing a proper hashing function for signature extraction the number of 1s is forced to be one. Here, the signature length should be increased to maintain the false drop probability at minimum. This creates a sparse matrix In the mathematical subfield of numerical analysis a sparse matrix is a matrix populated primarily with zeros. Conceptually, sparsity corresponds to systems which are loosely coupled. which is easy to compress [8, 23]. A simple way to compress is to replace each 1 with its corresponding physical address. In Fig. 3 the hash table A lookup table that is designed to efficiently store non-contiguous keys (account numbers, part numbers, etc.) that may have wide gaps in their alphabetic and numeric sequences. Hash tables are created by using a hashing function (algorithm) to hash the keys into hash buckets. has a list of pointers pointing to the heads of linked list [8]. For example assume that the word Text has its first bit set to 1 and it appears at the 50th byte of text file then searching the first bucket list, we find the position of the word Text. Although this approach gives some space saving, the number of false drops will definitely be increased due to sparse signature files. [FIGURE 3 OMITTED] 2.3.4 S-Tree S-Tree [24] is a B+ tree like structure [11,22] with leaf nodes containing a set of signatures with their Object Identifiers (OIDs). The internal nodes are formed by superimposing the lower level nodes as shown in Fig. 4. For example to retrieve a query signature Sq = 11000000, we search S-ree top down. From root node v1 we compare and move to v6. In the next level both v7 and v8 match and finally we end up with signatures o11, o12 in v7 and o13 in v8. The advantage is simple tree searching way of obtaining signatures rather than searching the whole signature file. [FIGURE 4 OMITTED] The disadvantage is that due to superimposing, internal nodes in the upper level tend to have more weight which ultimately decreases selectivity selectivity /se·lec·tiv·i·ty/ (se-lek-tiv´i-te) in pharmacology, the degree to which a dose of a drug produces the desired effect in relation to adverse effects. selectivity 1. . The S-Tree has been further improved in [13, 14]. In [13] a number of new split methods namely Linear split, Quadratic quadratic, mathematical expression of the second degree in one or more unknowns (see polynomial). The general quadratic in one unknown has the form ax2+bx+c, where a, b, and c are constants and x is the variable. split, Cubic split and hierarchical clustering for S-tree is proposed to improve query response time. In [14] a new hybrid scheme combining linear hashing, S-tree and parametric weighted filter is used to evaluate subset-superset queries. 2.3.5 Multilevel Signature file The structure is similar to S-Tree. However a signature at non-leaf node is formed by superimposed coding from all text blocks indexed by the sub-tree of which the signature is the root. Fig. 5 shows multilevel signature file for the set of signature values depicted in 4 Fig 4. Though this method improves selectivity in an n internal node, it requires more space. An improved e method for multilevel signature file is discussed in [25]. 2.3.6 Signature Graph The signature file is organized as a trie like structure [28,30]. However, the path visited in the graph to find a signature that matches a given query signature corresponds to a signature identifier which is not a continuous piece of bits, differentiating the signature graph from trie. [FIGURE 5 OMITTED] Though signatures are represented compactly, the search path length is not same for all queries. In other words Adv. 1. in other words - otherwise stated; "in other words, we are broke" put differently the graph is not balanced. In worst case it degrades to a signature file. Fig. 6. shows the signature file and the corresponding signature graph. [FIGURE 6 OMITTED] 2.3.7 Signature Tree The signature tree is a binary tree A data structure in which each node contains one parent and no more than two children. Binary Tree _title> binary tree - (btree) A tree in which each node has at most two successors or child nodes. like structure with nodes representing the bit positions and left and right sub tree followed for binary values The following table shows the decimal value of a binary number when all bits in the binary number are 1. Just as the largest number in a group of decimal digits is all 9's, the largest number in a group of binary digits is all 1's. 0 and 1 respectively. Each signature is identified from the root by checking the bit positions dictated by the nodes. Nevertheless for a query signature the tree is searched top to bottom according to according to prep. 1. As stated or indicated by; on the authority of: according to historians. 2. In keeping with: according to instructions. 3. the bit positions dictated by the nodes rather than the Is in query signature. Also, for a match with bit 1, searching follows the right sub tree and for 0 at a node both left and right sub trees are followed. That is for a balanced signature tree more than one path is traversed. Fig. 7. indicates a signature file and its corresponding signature tree. The thick lines in Fig 7(b) indicates the signature identifier that corresponds to $3. In the following section we discuss a quite different method called SD-tree which considers only the positions of 1 s in a given signature and decluster them over a set of leaf nodes so that query response time is improved. [FIGURE 7 OMITTED] 3 The Structure of SD-tree In this section we describe the structure of SD tree. There are three types of nodes: * Internal nodes * Leaf nodes * Signature nodes The internal nodes and leaf nodes are somewhat similar to the internal nodes and leaf nodes of B+ trees respectively. The internal nodes form the upper tree and leaf nodes at last but one level. The signature nodes are at the bottom level of the SD-tree. We will now explain the structure of the nodes in detail. To make discussion simple, we assume the tree order as 3 for a signature file with 8 block signatures of length 12. 3.1 Structure of Internal node An internal node of SD-tree is illustrated in Fig. 8. Like B+ tree internal node pointers and keys alternate each other. For a tree of order 3 the internal node has two keys K1 and K2 and three pointers P1, P2, P3. These pointers are tree pointers pointing to the nodes at the lower level. While searching, the left tree pointer is followed for values less than or equal to the node value, else right pointer is followed for values greater than the node value as in B+ tree. [FIGURE 8 OMITTED] 3.2 Structure of a leaf node Noun 1. leaf node - (botany) the small swelling that is the part of a plant stem from which one or more leaves emerge node phytology, botany - the branch of biology that studies plants The leaf nodes appear in the last but one level of the SD tree. Like B+ tree all the key values appear in ascending ascending /as·cend·ing/ (ah-send´ing) having an upward course. ascending progressing to higher levels, usually used in reference to the nervous system. order of their values in the leaf nodes and are connected to promote sequential search A search for data that compares each item in a list or each record in a file, one after the other. Contrast with direct search and indexed search. . But unlike B+ tree in SD-tree each value is followed by a signature node instead of data pointer. This is depicted in Fig 2. Pointers P1 and P2 point to the corresponding signature nodes for K1, K2; P3 is the sequential pointer to next leaf node and P4 is the backward pointer from next leaf node. [FIGURE 9 OMITTED] 3.3 Structure of signature node The structure of a signature node is shown in Fig. 10. The signature node for Ki has [2.sup.i-1] binary combinations denoting the possible prefixes. When a signature Su with 1 in the ith position is to be inserted the intermediate prefix The beginning or to add to the beginning. To prefix a header onto a packet means to place the header characters in front of the packet. "To prefix" at the beginning is the opposite of "to append" characters at the end. See prepend. 1. formed (explained in section 4) is compared with the binary combinations in the signature node at Ki and u is inserted in the list. 3.4 Overall structure of SD-tree Consider the partially filled SD-tree shown in Fig 11(a) for discussion. The tree has been constructed for signature length F = 12. It is obvious from Fig. 11 (a) tree of order 3 has height 2. Consider the signature file in Fig 11 (b). To insert signature S1 with first occurrence of 1 at position 2, access the leaf node with value 2, follow its signature node and write the signature value as 1 (for S1) with the prefix 0 (for bit 1). In the same way $2 is inserted in signature node 1 with no prefix and signature node 3 with prefix 10 (for bits 1 and 2). S4 will be inserted at signature node 1 with no prefix and signature node 2 with prefix 1. While inserting a signature there are two possible options to move to next 1s position in the leaf node. One is via the sequential pointer between leaf nodes and the other is the top-down traversal of tree from root. To ensure optimal search path the threshold (Th) is fixed at h+1 (h being the tree height plus one more level for accessing signature nodes). As long as the number of sequential pointers traversed in leaf nodes is within the specified Th value, follow the sequential pointers. This is calculated by finding the difference (d) between two consecutive 1s in a signature divided by the number of entries per leaf node. This is given by the condition d / p-1 [less than or equal to] Th , where p is the order of the tree. The division here is integer integer: see number; number theory division. When the above condition is not true, a new tree access is initiated from root. Similarly to promote optimal query processing the query signature value Sq is taken in decimal form. The occurrence of the last one (in the least significant position) is found by performing D Mod 2 operation. The remaining binary prefix In computing, binary prefixes can be used to quantify large numbers where powers of two are more useful than powers of ten (such as computer memory sizes). Each successive prefix is multiplied by 1024 (210) rather than the 1000 (103 is formed in the process of decimal to-binary conversion of D. The procedures for tree maintenance are explained in the following section. [FIGURE 11 OMITTED] To minimize the number of false drops the database size is selected according to [4, 27, 30] as F [ln.sub.2] = mD ... (1) where F is the signature length, m the number of set bits and D the average data block size. 4 Tree Search and Updates This section lists the algorithms for signature insert, delete and search operations on SD-tree. A global flag F is set to 0 indicating the search path from the root of the tree by default. In the procedure after the first 1's insertion, depending on the d value F may be set to 1. 4.1 Insertion The algorithm for signature insertion is outlined in this section. The call to procedure New(node) returns a new link in the signature node.
Insert ([S.sub.u])
Input : The signature to insert [S.sub.u];
1. Let [i.sub.1], [i.sub.2], ..... [i.sub.n] be the positions
of 1 in [S.sub.u];
F[left arrow] 0; Th = h+1 ; //for tree of order p
2. Move [i.sub.2] to [i.sub.n] to queue Q. B = NULL;
3. If([i.sub.1] = 1) then begin
Access leaf node [i.sub.1]; //from root
New(node);
insert u;
B = strcat ( B, ' 1 ');//to denote bit 1 position
end
else begin
for k = 1 to [i.sub.1] - 1 do
B = strcat ( B, '0');
Access leaf node [i.sub.i]; //from root
New(node); write (B);
insert u ;
B = strcat ( B, ' 1 ');
end
f = [i.sub.1];//store the current bit position
F [right arrow] 1 ;//enable sequential search in leaf
nodes
4. While Q not empty do
begin
read x from Q;
if(x = f+1) then
begin
Access leaf node x; //via leaf node
pointers
If (not(B)) then New(node);//create
node
Write(B);
Write u @ prefix B;
end
else begin
d=x-f,
If(d/(p-1)) > Th then F [right arrow] 0;
for k = f+1 to x-1 do
B = strcat ( B, '0');
Access leaf node x;
If (not(B)) then New(node);
Write(B);
Write u @ prefix B
end
B = strcat (B, '1 ');
f = x;
end. //until queue is empty
4.2 Searching The following algorithm outlines the steps to search for signatures matching a given query signature Sq. In the procedure F [left arrow (character) left arrow - The graphic which the 1963 version of ASCII had in place of the underscore character, ASCII 95. ] 0 always and the algorithm lands up directly in the signature node corresponding to last 1 from root. The match here is the exact match and the optimal signature length selected in equation (1) minimizes the false drops. Search([S.sub.q]) Input : The (query) signature to search as decimal D. Output : The list of signatures matching the given signature. 1 Repeat i = D mod 2; D = D div 2; until i = 1 2 Let n be the corresponding bit position of i. 3 Let B = toBinary(D); //convert D to binary 4 Access leaf node n //from root node 5 Search for B. 6 If Found() then read and output the list of signatures. 4.3 Deletion The algorithm to delete a signature from SD-tree is described below. Delete ([S.sub.u]) Input : [S.sub.u], the signature to delete. 1. Let [i.sub.1], [i.sub.2],.....[i.sub.n] be the positions of 1 in [S.sub.u]. 2. For each [i.sub.k] (l [less than or equal to] k less than or equal to] n) form prefix B as in Insert ([S.sub.u). 3. Access the leaf node and follow the signature node; 4. Access prefix B and search for u. 5. If present, delete it. 6. Repeat steps (2) through (5) for all [i.sub.k]s. 5 A sample validation model This section discusses the evaluation of sample Object-Oriented queries on a hypothetical object base. Fig 12 shows the class diagrams in UML (Unified Modeling Language) An object-oriented analysis and design language from the Object Management Group (OMG). Many design methodologies for describing object-oriented systems were developed in the late 1980s. notation [15,19]. The classes and their relationships are listed in Table 1. The sample hashing outputs of attributes values are listed in Table 2. The attributes' hash coded values are superimposed to produce object's signature. The set of all object signatures of a class correspond to the signature file. The signature files created for various attribute combinations of objects are listed in Table 3. [TABLE 3 OMITTED] 6 Experimental results To validate the proposed structure we implement SD-tree in Java and for every test run the tree is constructed statically before signature insertion. The parameters considered in the experiments' data sets are Signature length (F), Signature weight (m) and signature weight distribution (swd). The experiments were carried out in a standalone system with Intel Pentium IV See Pentium 4. processor. The main memory size is 512 MB and the hard disk capacity is 80 GB. 6.1 Signature tree Versus SD-tree In this section the parameters which are generally considered in the analysis of indexing structures like time and space complexities are reported. We compare the results of Y. Chen's Signature tree [30] with that of the SD-tree. The observed results are listed in Table 5. 6.2 Time Complexity Like in other signature applications we use the response time as the performance measure [23]. The time complexity of the insert algorithm initially is the sum of time taken to construct the B+ tree of order p and the time taken for inserting. Here B+ tree is constructed with values 1,2, ....., F where F is the length of the signature for a given dataset. Hence compared to the use of B+ tree as index structure for large datasets the value F is small which reduces the time taken for SD-tree construction considerably. In algorithm 4.1 the time complexity for insertion is bounded by O(nm) where n is the number of signatures in the file and m is the number of 1s in the given signature as against the O(nF) in the signature tree insertion[30], n being the number of signatures in the file and F is the full length of signature including O's and 1's. Since deletion follows similar steps as insertion the time complexity is same for both. Another desirable characteristic of SD-tree is that for higher F values, by varying p, the value of h, height of the tree can be kept small to promote faster search. It is observed that the In Table 5, n--Number of signatures in signature file F--Length of signature m--Number of set bits p--Order of SD-tree [lambda]--Number of path traversed in query searching a--Average no. of signatures / signature node k--log 2 n f--Average no. of prefix values / signature node Similarly search time is the sum of time taken to access the leaf node (T1) and signature node search time (Tsi). This is given by Ts = T1 + Tsi Here, T1 is constant for all leaf nodes for a dynamic balanced structure like SD--tree and Tsi is directly dependent on the i value and the number of signatures inserted in the signature node. In the worst case the search time is bounded by O(T1 + [2.sub.i-1]). To analyze the query response time the signature weight distribution was fixed as 100% , 70%, 50% and 30% for signature lengths 10 and 30. The values are plotted in Fig. 13. through Fig. 16. The weight of the signature was biased in upper byte(U), lower byte(L) or uniformly distributed(M) and time values noted. For 100% swd the insertion and search time is a constant of the signature weight bias. This is depicted in Fig. 13. In the same way the swd was fixed at 70, 50 and 30 and the observed values are plotted in Fig. 14, Fig. 15 and Fig. 16 respectively. The query response time between two consecutive queries is minimized by following the backward pointers in leaf nodes when the following condition is true. That is, [d.sub.i,j]/p-1 [less than or equal to] Th where [d.sub.i,j] is the difference in Sqi's last 1's position and Sqj's first 1's position. All the graphs show that the time taken for signature insertion grows linearly with the values of swd, F and the weight bias. Insertion time The term insertion time is used to describe the length of time which is required to rearrange a subcritical mass of fissile material into a critical mass. This is one of the three main requirements in a nuclear weapon design to create a working fission atomic bomb. increases in upper nodes due to the complexity of the circuit in creating prefixes. SD-tree maintenance and space overhead SD-tree maintenance is quite simple that the tree is not subject to extensive node split or merge. This is because the insertions and deletions do not affect the node values or the height of the tree. Operations are reflected only in the signature node. In the experiments the binary prefix pattern nodes are created dynamically. The space consumption for insertion of a signature number at a signature node depends on the binary prefix length (1), the pointer size (p) and the space for writing the signature number(n). The prefix is created anew a·new adv. 1. Once more; again. 2. In a new and different way, form, or manner. [Middle English : a, of (from Old English of; see of) + new each time if it does not exist. Hence, the space complexity to insert with prefix for a signature of weight (w) is given by O[w((1+2p)+(n+p))]. Similarly the space complexity of a signature for which prefix already exists is bounded by O[w(n+p)]. Fig. 17 shows the space overhead of SD-tree. The tree is created statically. Signature nodes are created dynamically and are of fixed size. It is clear from the graph that the space consumption increases linearly with F and w. It is obvious that for all combinations of values the query search time for SD-tree is lesser than signature insertion time. For swd of 70%, 50% and 30% the signature weight was biased in lower byte, upper byte or uniformly distributed and values noted. All the outputs clearly indicate that the time taken for signature insertion and query response is slightly higher in upper levels. [FIGURE 13 OMITTED] [FIGURE 14 OMITTED] [FIGURE 15 OMITTED] [FIGURE 16 OMITTED] [FIGURE 17 OMITTED] 7 Conclusion and research directions In this paper we presented a novel way to represent signatures in a B+ tree like structure called SD-tree and analyzed the performance for query response time. By varying the signature length and distribution of Is in the signature the query response time was noted and results plotted. It is clear from the graphs that considerable search time is saved. The space overhead in SD-tree may be higher due to the presence of binary prefixes in higher order signature nodes, but the flexibility provided by the SD-tree outweighs all besides simple maintenance and faster query retrieval time. The work is proposed to extend in the following directions. The synthetic data sets are to be replaced with, run and verified on a real Object Oriented Data Base system. Another direction is when the signature weight is more than 50%, use 0s so that number of signature nodes accessed for insertion and search is optimal. Also the structure can be modified to support point and range queries in Object Oriented Data Base system. Received: April 14, 2008 References [1] Charles S. Roberts Charles S. Roberts is a wargame designer. He is renowned as "The Father of Board Wargaming", having created the first modern wargame (a boardgame) in 1952, and the first wargaming company in 1954. , (1979) Partial-Match Retrieval via the Method of Superimposed Codes, Proc. of IEEE (Institute of Electrical and Electronics Engineers, New York, www.ieee.org) A membership organization that includes engineers, scientists and students in electronics and allied fields. , Vol. 67, No. 12, pp 1624 1642. [2] Chris Faloutsos, Stavros Christodoulakis, (1984) Signature Files." An Access Method Jor Documents and Its' Analytical Performance Evaluation Performance evaluation The assessment of a manager's results, which involves, first, determining whether the money manager added value by outperforming the established benchmark (performance measurement) and, second, determining how the money manager achieved the calculated return , ACM (Association for Computing Machinery, New York, www.acm.org) A membership organization founded in 1947 dedicated to advancing the arts and sciences of information processing. In addition to awards and publications, ACM also maintains special interest groups (SIGs) in the computer field. Trans on Office Information Systems, Vol. 2, No. 4, pp 267-288. [3] Christos Faloutsos, (1985) Access Methods for Text", ACM Computing surveys, Vol. 17, No. 1, pp 49-74. [4] Chris Faloutsos, (1985) Signature files: Design and performance comparison of some signature extraction methods" Proc. of ACM SIGMOD SIGMOD Special Interest Group On Management of Data , pp 63-82. [5] Christos Faloutsos, Stavros Christodoulakis, (1985) Design of a Signature File Method that Accounts for Non-Uniform Occurrence and Query Frequencies, Proc. of VLDB (Very Large DataBase) An extremely large database. What constitutes a VLDB is debatable, but databases of 100GB or more are generally considered the starting point. , pp 165-170. [6] Christos Faloutsos, Stavros Christodoulakis, (1987) Description and Performance Analysis of Signature File Methods for Office Filing, ACM Trans on Office Information Systems, Vol. 5, No. 3, pp 237-257. [7] Christos Faloutsos, Stavros Christodoulakis, (1987) Optimal Signature Extraction and Information Loss, ACM Trans. On Database Systems, Vol. 12, No. 3, pp 395-428. [8] Christos Faloutsos, Raphael Chan, (1988) Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison, Proc. of VLDB, pp 280-293. [9] Chris Faloutsos, Raymond Lee, Catherine Plaisant, Ben Shneiderman Ben Shneiderman (born August 21, 1947) is an American computer scientist. He provided fundamental research in the field of human–computer interaction, such as with his eight rules of design. , (1990) [10] Incorporating String Search in a Hypertext System Noun 1. hypertext system - a database management system that allows strings of text (`objects') to be processed as a complex network of nodes that are linked together in an arbitrary way : User Interface and Signature File Design Issues', Hypermedia hypermedia: see hypertext. The use of hyperlinks, regular text, graphics, audio and video to provide an interactive, multimedia presentation. All the various elements are linked, enabling the user to move from one to another. , Vol. 2, No. 3, pp 183-200. [11] D. Dervos, Y. Manolopoulous, P. Linardis, (1998) Comparison of Signature File Methods with Superimposed Coding, J. Information Processing information processing: see data processing. information processing Acquisition, recording, organization, retrieval, display, and dissemination of information. Today the term usually refers to computer-based operations. , 6, pp 101-106. [12] Douglas Comer, (1979) The Ubiquitos B-Tree, Computing Surveys, Vol. 11, No. 2, pp 121-137. [13] Dik Lun Lee, Young Man Kim, Gaurav Patel, (1995) Efficient Signature File Methods" for Text Retrieval, IEEE TKDE TKDE Transactions on Knowledge and Data Engineering , Vol. 7, No. 3, pp 423 435. [14] Eleni Tousidou, Alex Nanopoulos, Yannis Manolopoulos, (2000) Improved Methods for Signature-Tree Construction, The Computer Journal, Vol. 43, No. 4, pp 301-314. [15] Eleni Yousidou, Panayiotis Bozanis, Yannis Manolopoulos, (2002) Signature-based structures for objects' with set-valued attributes, Information Systems, Vol. 27, No. 2, pp 93-121. [16] Grady Booch Grady Booch (born February 271955) is a software designer, a software methodologist and a design pattern enthusiast. He is chief scientist of Rational Software (now a part of IBM) and a series editor for Benjamin/Cummings. , James Rambaugh, Ivar Jacobson Ivar Hjalmar Jacobson (born in Ystad, Sweden, on September 2, 1939) is a Swedish computer scientist. He got his Master of Electrical Engineering at Chalmers Institute of Technology in Göteborg in 1962 and a Ph.D. , (2003) The Unified Modeling Language User Guide, Pearson Education Pearson Education is an international publisher of textbooks and other educational material, such as multimedia learning tools. Pearson Education is part of Pearson PLC. It is headquartered in Upper Saddle River, New Jersey. Pte Ltd PTE LTD Private Limited . [17] Hwan-Seung Yong, Sukho Lee, Hyoung-Joo Kim, (1994) Applying Signatures for Forward Traversal Quet3, Processing in Object-Oriented Databases, Proc. 10th Intl. conf. Data Engg, pp 518-525. [18] Justin Zobel, Alistair Moffat, Kotagiri Ramamohanarao, (1988) Inverted inverted reverse in position, direction or order. inverted L block a pattern of local filtration anesthesia commonly used in laparotomy in the ox. Files" Versus Signature Files for Text Indexing, ACM Trans. On Database systems, Vol. 23, No. 4, pp 453-490. [19] A. Kent, R. Sacks-Davis, K. Ramamohanarao, (1990) A Signature File Scheme Based on Multiple Organizations for Indexing Very Large Text Databases, J. Am. Soc. Information Science, 1990, Vol. 41, No. 7, pp 508-534. [20] Martin Fowler
Martin Fowler , Kendall Scott, (2003) UML Distilled, A brief guide to the standard Object Modeling Language, II edition, Pearson Education Pte Ltd. [21] Paolo Ciaccia, Paolo tiberio, Pavel Zezula, (1996) Declustering of Key-Based Partitioned Signature Files, ACM Trans. On Database Systems, Vol. 21, No. 3, pp 295-338. [22] Per-Ake Larson, (1984) A Method for Speeding up Text Retrieval, ACM SIGMIS, Database Winter, Vol. 15, No. 2, pp 19-23. [23] Rudolf Bayer Rudolf Bayer has been Professor (emeritus) of Informatics at the Technical University of Munich since 1972. He is famous for inventing two data sorting structures: the B-tree with Edward M. McCreight, and later the UB-tree with Volker Markl. , Karl Unterauer, (1977) Prefix B-Trees, ACM Trans. On Database Systems, Vol. 2, No. 1, pp 11-26. [24] 23. Seyit Kocberber, Fazli Can, (1999) Compressed Multi-Framed Signature Files: An Index Structure for Fast Information Retrieval information retrieval Recovery of information, especially in a database stored in a computer. Two main approaches are matching words in the query against the database index (keyword searching) and traversing the database using hypertext or hypermedia links. , Proc. ACM Symp. Applied Computing, pp 221-226. [25] 24. Uwe Deppisch, (1986) S-Tree: A Dynamic Balanced Signature Index for Office Retrieval, Proc. ACM SIGIR SIGIR Special Interest Group on Information Retrieval (Association for Computing Machinery) SIGIR Special Inspector General for Iraq Reconstruction conf, pp 77-87. [26] Walter W. Chang, Hans J. Schek, (1989) A Signature Access Method for the Starburst Database System, Proc. of VLDB, 145-153. [27] Wang-chien Lee, Dik L. Lee, (1992) Signature File Methods for Indexing Object-Oriented Database Systems, Proc. 2nd Intl. Comp. Sc. Conf, pp 616 622. [28] Yangjun Chen, (2002) Signature Files and Signature Trees, Information Processing Letters, 82, pp 213 -221. [29] Yangjun Chen, Yibin Chen, (2004) Signature File Hierarchies and Signature Graphs: a New Index Method for Object-Oriented Databases, Proc. of ACM Symp. on Applied Computing, pp 724-728. [30] Yangjun Chen, (2005) On the Signature Trees and Balanced Signature Trees, Proc. of ICDE ICDE International Conference on Data Engineering ICDE International Council for Open and Distance Education ICDE International Council for Distance Education , pp 742-753. [31] Yangjun Chen, Yibin Chen, (2006) On the Signature Tree Construction and Analysis, IEEE TKDE, Vol. 18, No. 9, pp 1207-1224. [32] Yoshiharu Ishiwaka, Hiroyuki Kitagawa, Nobuo Ohbo, (1993) Evaluation of Signature Files as Set Access Facilities in OODBs, Proc. of ACM SIGMOD, pp 247-256. I. Elizabeth Shanthi Dept of Computer Science Avinashilingam University History Avinashilingam University For Women was originally a part of University of Madras and in June 1987 this institution got its separate identity. It is now the largest institution in the country for imparting Home Science Education at all levels. for Women, Coimbatore, India. E-mail: shanthianto@yahoo.com R. Nadarajan Dept of Mathematics and Computer Applications PSG College of Technology PSG College of Technology, also called PSG Tech, is an certified institute located in Coimbatore in the state of Tamil Nadu, India. It was founded in 1951 by the Managing Trustee Sri G.R. Govindarajulu and the founder Principal Dr. G.R. Damodaran. Under the leadership of Dr. , Coimbatore, India. E-mail: nadarajan_psg PSG, n polysomnograph; polygraph performed during sleep. Physiological variables such as pulse, blood pressure, and respiration are monitored and charted. @yahoo.co.in
Table 1: Class relationships
S. No Class 1 Class 2 Relationship
1. University Dept Composition
2. University Student Aggregation
3. Student Programme Association
4. Dept Instructor Association
5. Student Male Generalization
6. Student Female Generalization
7. Programme Subject Association
Table 2: Sample attributes
Dept-name
Mathematics--1010 0000
Computerscience--0100 1000
Physics--0001 0010
Pgm-name
M.E--0001 0100
M.Sc (S.E)--1000 1000
M.Sc (Mat)--0100 1000
M.sc (Phy)--0011 0000
Inst-name
John--1000 0010
Adams--1000 1000
James--0000 1001
Janes--0110 0000
Stud-name
David--0100 0010
Elena--1001 0000
Maria--0100 0001
Peter--0010 0001
Grace--0000 1100
Antony--0000 0011
Male
Sex--0010000
Female
Sex-00001001
Table 4: Sample queries
1. List of students Pgm-name = Student 1000 1000
doing a given programme M.Se(S.E)
2. Instructors handling M.Sc(S.E) in Instructor 0100 0001
a given subject subjects
3. Dept(s) offering a 1. Comp. applns in Programme 0010 1000
given subject subjects
2. Pgm-name = Dept
00010100,00110000
3. All instructors of Dept-name = Instructor 0100 1000
a given dept Comp.sc
4. Female students Pgm-name = Female 1000 1000
attending a given M.Se(S.E)
programme
1. List of students Pgm-name = 1100 1010 (David)
doing a given programme M.Se(S.E) 1001 1000 (Elena)
1100 1001 (Maria)
2. Instructors handling M.Sc(S.E) in l110 1011 (John)
a given subject subjects 0100 1001 (James)
3. Dept(s) offering a 1. Comp. applns in 1011 1101 (M.E)
given subject subjects O111 1000 (M.ScPhy)
2. Pgm-name = 1101 1100 (Comp.sc)
00010100,00110000 0011 0010
(Phy)
3. All instructors of Dept-name = 1110 1011 (John)
a given dept Comp.sc 0100 1001 (James)
4. Female students Pgm-name = 1001 1001 (Elena)
attending a given M.Se(S.E) 1100 1001 (Maria)
programme
Figure 10: A typical signature node entry.
B1 Signatures having prefix B1
B2 Signatures having prefix B2
... ...
Bn Signatures having prefix Bn
Table 5: Signature tree Vs SD-tree
Parameter Signature tree SD-tree Inference
Time O(nF) O(nm) m < F;
complexity Faster
insertion
Tree height O([log.sub.2] n) O([log.sub.p] P>2;
(F/(p-1)) Shorter tree
Search cost O([lambda]. O([log.sub.p] F<n;
[log.sub.2] n) F+a) Cost < Sig.
tree
Space [nlog.sub.2] F + 2 O(F(a+f))
complexity [summation] [2.sup.i] < Sig. tree
(i+1) when k>F
i=0 to k
Figure 12: Sample Object schema.
Class : Subject
Sub-name
1. Software engg--0100 0001
2. Comp. Apples-- 0010 1000
3. Comp. Engg--0100 0100
4. Applied Maths--1000 0001
5. Calculus--0010 0100
6. Nuclear physics--0101 0000
Class : Male
Stud-name Pgm-name Sex Object signature
1. David M.Sc (S.E) Male 1101 1010
2. Peter M.E Male 1011 0101
3. Antony M.E Male 1001 0111
Class : Female
Stud-name Pgm-name Sex Obj signature
l. Elena M.Sc (S.E) Female 1001 1001
2. Maria M.Sc (S.E) Female 1100 1001
3. Grace M.Sc (Phy) Female 0011 1101
Class : Dept
Dept-name Programs Obj signature
1. Computer science M.E,M.Sc (S.E) 1101 1100
2. Mathematics M.Sc (Mat) 1110 1000
3. Physics M.Sc(Phy) 0011 0010
Class : Programme
Pgm-name Subjects Obj. signature
1. M.E Comp.apptns, 1011 1101
Applied Maths
2. M.Sc (S.E) Software engg, 1100 1101
Comp. engg
3. M. Sc(Phy) Comp. apples, 0111 1000
Nuclear physics
Class : Instructor
Inst-
name Dept-name Subjects Obj signature
l. John Comp.sc Software 1110 1011
engg,
Comp.
applns
2. Adams Mathematics Applied 1010 1101
Mathematic
s, calculus
3. James Comp.sc Software 0100 1001
engg
4. Janes Physics Nuclear 0111 0010
physics
Class : Student
Stud-name Pgm-name Obj signature
1. David M.Sc(S.E) 1100 1010
2. Peter M.E 0011 0101
3. Antony M.E 0001 0111
4. Elena M.Sc(S.E) 1001 1000
5. Maria M.Sc(S.E) 1100 1001
6. Grace M.Sc(Phy) 0011 1100
|
|
||||||||||||||||||||||

Printer friendly
Cite/link
Email
Feedback
Reader Opinion