
Automated Boolean Matrix data representation scheme through AVL tree for mining association rules.

Introduction

Mining association rules is an appropriate and well-researched data mining technique for discovering interesting relationships between variables in large databases. It was first introduced by (Agrawal, et al., 1993) and has received widespread attention since the publication of that research and the introduction of the Apriori algorithm. Work in this area has included generating association rules from frequent itemsets (Agrawal, et al., 1993; Mannila, et al., 1994; Agrawal, et al., 1996). The initial research on this method was mainly driven by the analysis of market basket data. For example, from the transaction data of a supermarket, the rule {milk, egg} => {bread} may be generated, which indicates that customers who purchase milk and eggs tend to buy bread as well. Such a relationship describes consumer purchasing behavior and can help managers make better decisions in product promotion, shelf design, and customer relationship management. Association rule mining has been applied in various areas, including risk analysis in commercial institutions, the medical sector, basket data analysis, cross-marketing, catalog design, clustering, and classification.

In computer science and data mining, Apriori (Agrawal, et al., 1993; Agrawal, et al., 1996) is a classic algorithm for learning association rules (Han, et al., 2001). Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of website visits). In the first step, the frequent itemsets are generated, i.e., those itemsets that satisfy the minimum support criterion (Mannila, et al., 1994). In the second and final step, rules are generated by evaluating the confidence measure.
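
To make these two steps concrete, the following is a minimal sketch of the level-wise frequent-itemset loop in Java (the language of our implementation, though written here in modern Java). It is illustrative only: the class and method names are ours, and the classic subset-pruning step is omitted for brevity.

import java.util.*;

// Minimal level-wise (Apriori-style) frequent-itemset mining sketch.
// Items are integers; a transaction is a Set<Integer>.
public class MiniApriori {
    public static Map<Set<Integer>, Integer> frequent(List<Set<Integer>> db, int minSupCount) {
        Map<Set<Integer>, Integer> result = new HashMap<>();
        // Level 1: count single items and keep the frequent ones
        Map<Set<Integer>, Integer> current = new HashMap<>();
        for (Set<Integer> t : db)
            for (int i : t)
                current.merge(new TreeSet<>(Set.of(i)), 1, Integer::sum);
        current.values().removeIf(c -> c < minSupCount);
        while (!current.isEmpty()) {
            result.putAll(current);
            int k = current.keySet().iterator().next().size();
            // Candidate generation: join pairs of frequent k-itemsets into (k+1)-itemsets
            Set<Set<Integer>> candidates = new HashSet<>();
            List<Set<Integer>> keys = new ArrayList<>(current.keySet());
            for (int a = 0; a < keys.size(); a++)
                for (int b = a + 1; b < keys.size(); b++) {
                    Set<Integer> union = new TreeSet<>(keys.get(a));
                    union.addAll(keys.get(b));
                    if (union.size() == k + 1) candidates.add(union);
                }
            // Support counting: one database pass per level
            Map<Set<Integer>, Integer> next = new HashMap<>();
            for (Set<Integer> t : db)
                for (Set<Integer> c : candidates)
                    if (t.containsAll(c)) next.merge(c, 1, Integer::sum);
            next.values().removeIf(c -> c < minSupCount);
            current = next;
        }
        return result;
    }
}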

In this study, we propose a data representation scheme to improve the existing Apriori algorithm for mining association rules. The improvement comes from a representation scheme based on an AVL-Tree (a balanced tree), which provides reduced running time and more condensed rules than the Apriori method. In addition, our method is able to get the last frequent itemsets and generate rules from the data representation and the Boolean Matrix. This paper is organized into seven sections: section 2 reviews related work, section 3 proposes an automated Boolean matrix data representation using an AVL-Tree, section 4 demonstrates the benefits of the data representation, section 5 presents the algorithms, section 6 discusses the experimental results, and section 7 concludes the proposed representation scheme.

Related work:

Business-related data grow rapidly, making data mining a reality and an essential field in our life, and increasing interest in database management systems. In data mining, Apriori (Agrawal, et al., 1993; Agrawal, et al., 1996) and FP-growth (frequent pattern growth) (Han, et al., 2001) are the main techniques in Association Rule Mining (ARM), which is used to find sequential patterns of items in a database.

Frequent pattern mining is seemingly one of the most important approaches in data mining and is commonly illustrated by the market basket analysis task of Association Rule Mining (Yen, et al., 1996). A pattern is a set of items, subsequences, or substructures that occurs frequently in a data set.

An association rule X ⇒ Y has support equal to the percentage of transactions in the database that contain both X and Y; the formal definition of support(X) is based on the work of (Agrawal and Srikant, 1996):

Let I = {i1, i2, ..., im} be a set of m literals called items, and let the database D = {t1, t2, ..., tn} be a set of n transactions, each consisting of a set of items from I. A transaction t ∈ D is said to contain an itemset X if X ⊆ t. Hence, the support of itemset X is defined as in equation (1) below (Chunjie, et al., 2009):

support(X) = |{t ∈ D : X ⊆ t}| / |D|    (1)

Frequent pattern mining with the Apriori algorithm has been used in many data mining applications, such as basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis.

Each transaction t ∈ D is an itemset such that t ⊆ I. A unique identifier, called a TID, is associated with each transaction. We say that a transaction t supports X, a set of items in I, if X ⊆ t. We assume that the items in a transaction or an itemset are sorted in lexicographic order.

An itemset X has support s if s% of the transactions support X. The support of X is denoted Supp(X). An association rule is an implication of the form

X ⇒ Y, where X and Y are subsets of I and X ∩ Y = ∅.

We say that the rule X ⇒ Y holds in the database D with confidence C if

confidence(X ⇒ Y) = P(Y|X) = n(X ∪ Y) / n(X) ≥ C    (2)

(where n(A) is the number of occurrences of the set of items A in the set of transactions D, and A occurs in a transaction t if and only if A ⊆ t). We also say that the rule X ⇒ Y has support S if

support(X ⇒ Y) = P(X ∪ Y) = n(X ∪ Y) / N ≥ S    (3)

where N is the number of transactions in D. Here, the support is a measure of the frequency of a rule, whereas the confidence is a measure of the strength of the relationship between sets of items.
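
As a concrete reading of equations (2) and (3), the following sketch computes the support and confidence of the rule 1 ∧ 2 ⇒ 5 over the transactions of Table 2 (the names and structure are illustrative, not the paper's code):

import java.util.*;

// Support and confidence of a rule X => Y, following equations (2) and (3).
public class RuleMeasures {
    static long count(List<Set<Integer>> db, Set<Integer> itemset) {
        return db.stream().filter(t -> t.containsAll(itemset)).count(); // n(itemset)
    }
    public static void main(String[] args) {
        List<Set<Integer>> db = List.of(              // the nine transactions of Table 2
            Set.of(1, 2, 5), Set.of(2, 4), Set.of(2, 3),
            Set.of(1, 2, 4), Set.of(1, 3), Set.of(2, 3),
            Set.of(1, 3), Set.of(1, 2, 3, 5), Set.of(1, 2, 3));
        Set<Integer> x = Set.of(1, 2), y = Set.of(5);
        Set<Integer> xy = new HashSet<>(x);
        xy.addAll(y);                                 // X union Y
        double support = (double) count(db, xy) / db.size();       // eq. (3): 2/9 = 0.22
        double confidence = (double) count(db, xy) / count(db, x); // eq. (2): 2/4 = 0.50
        System.out.printf("support=%.2f confidence=%.2f%n", support, confidence);
    }
}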

On the other hand, (Han, et al., 2006) presented a new set-based approach for obtaining frequent itemsets. The best feature of this one-scan algorithm is that it requires only one scan of the database, and the authors compared it with Apriori. In addition, (Gautan, et al., 2010) proposed the MLBM algorithm, which scans the transaction database only once and adopts a Boolean vector to discover frequent itemsets. Other authors (Zhu, et al., 2010) present a combination of the cloud-model-based generalization method with the Apriori algorithm.

(Coenen, et al., 2004) presented how to compute partial support counts in one database pass and how to store them in an enumeration tree (P-Tree). In addition, (Yu, et al., 2008) proposed a novel algorithm called the Reduced Apriori Algorithm with Tag (RAAT), which removes one redundant pruning operation, and compared it with the Apriori algorithm. Furthermore, (He, et al., 2004) proposed a frequent pattern discovery method for outlier detection.

The AVL tree, proposed by G.M. Adelson-Velskii and E.M. Landis (Sedgewick, 1983) and named after them, is a self-balancing tree: the heights of any node's two sub-trees differ by at most one. Using an AVL tree is better than simple data structures such as vectors or arrays, since its operations run in O(log n) time in both the average and worst cases (Gilberg, et al., 2005).

In this study, there are two main phases: first, a new transactional data representation scheme in Boolean matrix form is proposed to help us generate the n-frequent itemsets; second, an improvement to the Apriori algorithm for obtaining association rules, owing to its efficiency in detecting the last frequent itemsets.

Automated Boolean Matrix with Association Rules (ABMAR) involves several steps built around the common Apriori algorithm, as follows:

i. Compute the unduplicated, sorted n items and m transactions during the loading process using the AVL tree.

ii. Save all n unduplicated items in itemVector.

iii. Get the values of the parameters number_of_items and number_of_rows automatically, then initialize the settings for minSupport and minConfident.

iv. Use a regular expression to replace items by their itemVector index (see the sketch after this list).

v. This replacement reduces the size of the Boolean Matrix.

vi. The columns of the Boolean Matrix hold the discretized data, while the number of rows equals the number of transactions.

vii. Apply BinApriori to the Boolean Matrix, then generate the n-candidates and get the m-frequent itemsets.

viii. BinApriori completes all frequent itemsets and saves them with their minSupport values in the AVL tree.

ix. Capture the last frequent itemset and set the confidence value.

x. Obtain all possible subsets with their weights by set theory.

xi. Generate all possible association rules depending on the weight and confidence values.
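
A minimal sketch of the replacement in step iv, assuming items are numeric tokens in each transaction line and that itemVector is the sorted, duplicate-free array of Table 3 (all names here are illustrative):

import java.util.*;
import java.util.regex.*;

// Step iv (illustrative): replace each item token in a transaction line by
// its 1-based position in the sorted, duplicate-free item vector.
public class ItemReplacement {
    public static String discretize(String line, List<Integer> itemVector) {
        Matcher m = Pattern.compile("\\d+").matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // itemVector is sorted, so a binary search locates the item
            int idx = Collections.binarySearch(itemVector, Integer.valueOf(m.group()));
            m.appendReplacement(out, Integer.toString(idx + 1)); // 1-based index
        }
        m.appendTail(out);
        return out.toString();
    }
    public static void main(String[] args) {
        List<Integer> itemVector = List.of(7, 8, 9, 10, 55);    // from Table 3
        System.out.println(discretize("7, 8, 55", itemVector)); // prints "1, 2, 5"
    }
}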

In general, the conceptual framework involves two main phases. First, an initial phase is executed by loading the data from the transaction table and scanning the database once. The main phase (I), called the data representation scheme, represents the data from the transaction database as a Boolean Matrix automatically. In addition, the n-frequent itemsets are generated by the Apriori algorithm using the Boolean Matrix from the previous stage. The implementation was carried out on the Java JDK 1.6 platform using object-oriented programming concepts. Table 1 and Table 2 show an example of the first stage of phase (I). Suppose we have the synthetic dataset 'Sparse', a dataset of multi-scattered records in which each record contains several data items.

Data representation starts after loading the data, which are saved in an AVL-tree (a balanced binary tree). The AVL-tree has several benefits: it keeps the data balanced, it yields sorted data through an inorder traversal, and it ignores duplicate data on insertion. The inorder traversal of the AVL-tree therefore gives sorted, unduplicated data that are saved in a vector (a sorted, unduplicated array). This array is used in data discretization, which is a part of data reduction for a given continuous range of values, especially for numerical data. Replacing the values of the array by their index values is the discretization process.
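
A condensed sketch of this loading step, using a generic textbook AVL insertion rather than the paper's code: duplicates are ignored on insert, and an inorder traversal emits the sorted, duplicate-free item vector.

import java.util.*;

// Generic AVL insertion with duplicate rejection and inorder traversal.
class AvlNode {
    int key, height = 1;
    AvlNode left, right;
    AvlNode(int key) { this.key = key; }
}
public class ItemTree {
    static int h(AvlNode n) { return n == null ? 0 : n.height; }
    static AvlNode rotateRight(AvlNode y) {
        AvlNode x = y.left; y.left = x.right; x.right = y;
        y.height = 1 + Math.max(h(y.left), h(y.right));
        x.height = 1 + Math.max(h(x.left), h(x.right));
        return x;
    }
    static AvlNode rotateLeft(AvlNode x) {
        AvlNode y = x.right; x.right = y.left; y.left = x;
        x.height = 1 + Math.max(h(x.left), h(x.right));
        y.height = 1 + Math.max(h(y.left), h(y.right));
        return y;
    }
    static AvlNode insert(AvlNode n, int key) {
        if (n == null) return new AvlNode(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n;                                // duplicate item: ignored
        n.height = 1 + Math.max(h(n.left), h(n.right));
        int balance = h(n.left) - h(n.right);         // rebalance if needed
        if (balance > 1 && key < n.left.key) return rotateRight(n);
        if (balance < -1 && key > n.right.key) return rotateLeft(n);
        if (balance > 1) { n.left = rotateLeft(n.left); return rotateRight(n); }
        if (balance < -1) { n.right = rotateRight(n.right); return rotateLeft(n); }
        return n;
    }
    static void inorder(AvlNode n, List<Integer> out) { // yields sorted unique items
        if (n == null) return;
        inorder(n.left, out); out.add(n.key); inorder(n.right, out);
    }
    public static void main(String[] args) {
        AvlNode root = null;
        for (int item : new int[]{7, 8, 55, 8, 10, 8, 9, 7}) root = insert(root, item);
        List<Integer> itemVector = new ArrayList<>();
        inorder(root, itemVector);
        System.out.println(itemVector); // [7, 8, 9, 10, 55] as in Table 3
    }
}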

The first step is to set the minimum support number (S) to 0.22 and the required minimum confidence (C) to 0.7. Table 1 contains the transaction items, and Table 2 contains the discretized data, which is a part of the data reduction for a given continuous range of values, especially for numerical data. The discretization is done by replacing each item by its index in the array of Table 3, which is generated from the AVL-tree and gives the unduplicated, sorted frequent patterns.

The Boolean Matrix in Table 4 is generated automatically by stage (I) of phase (I) in the previous section, by transferring the discretized table to a Boolean Matrix. In addition, the number of items in the whole dataset and the number of rows in the transaction table are computed; in this case, we have 5 items and 9 transactions (rows). In this approach, the number of items equals the number of columns, and the number of transactions equals the number of rows of the Boolean Matrix.

The data representation, which relies on the AVL-tree for sorting frequent patterns and on the Boolean (Binary) Matrix, is applied during Association Rule Mining.

Here ct counts all 1's in each column, and r is the number of transactions (rows). In Table 4, the 10th row of the Boolean Matrix gives the count of 1's in each column; for column one the count is 6, and dividing 6 by the 9 transactions (rows) gives 0.66 according to the following equation:

computed_weight = (count of 1's in the column) / (number of transactions (rows)) ≥ S    (4)
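
A small illustrative check of equation (4) against the Boolean Matrix of Table 4 (the variable names are ours; note that the printed 6/9 rounds to 0.67, while the text truncates it to 0.66):

// Column supports ct/r over the Boolean Matrix of Table 4, per equation (4).
public class ColumnWeights {
    public static void main(String[] args) {
        int[][] matrix = {
            {1,1,0,0,1}, {0,1,0,1,0}, {0,1,1,0,0}, {1,1,0,1,0}, {1,0,1,0,0},
            {0,1,1,0,0}, {1,0,1,0,0}, {1,1,1,0,1}, {1,1,1,0,0}};
        double s = 0.22;                                  // minimum support threshold S
        for (int c = 0; c < matrix[0].length; c++) {
            int ct = 0;
            for (int[] row : matrix) ct += row[c];        // count 1's in column c
            double weight = (double) ct / matrix.length;  // ct / r
            System.out.printf("item %d: ct=%d weight=%.2f frequent=%b%n",
                    c + 1, ct, weight, weight >= s);
        }
    }
}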

The last step of phase (I) is the generation of rules, which is based on the subset equivalence theory and can be represented as a set of specific rules. In this example, the last frequent itemset is {1, 2, 5}. Therefore, six rules are generated: Rule 1: 1 ∧ 2 ⇒ 5; Rule 2: 1 ∧ 5 ⇒ 2; Rule 3: 2 ∧ 5 ⇒ 1; Rule 4: 1 ⇒ 2 ∧ 5; Rule 5: 2 ⇒ 1 ∧ 5; and Rule 6: 5 ⇒ 1 ∧ 2.
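
The subset enumeration behind these six rules can be sketched as follows (illustrative only; the weight bookkeeping of step x is omitted): every non-empty proper subset of the last frequent itemset becomes a rule antecedent, with the remaining items as the consequent.

import java.util.*;

// Enumerate all rules X => Y with X union Y = F and X, Y non-empty,
// for the last frequent itemset F = {1, 2, 5}.
public class RuleEnumeration {
    public static void main(String[] args) {
        List<Integer> f = List.of(1, 2, 5);
        int n = f.size();
        for (int mask = 1; mask < (1 << n) - 1; mask++) { // non-empty proper subsets
            List<Integer> lhs = new ArrayList<>(), rhs = new ArrayList<>();
            for (int i = 0; i < n; i++)
                (((mask >> i) & 1) == 1 ? lhs : rhs).add(f.get(i));
            System.out.println(lhs + " => " + rhs);       // prints the six rules
        }
    }
}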

Benefits of data representation:

1. Transforming the data from the source data file into an AVL tree, a balanced data structure, yields non-duplicated, sorted items, maintains the actual elements of the data, and increases processing efficiency.

2. Reduces computational processing time.

3. Supports multiple representations of the stored data, such as a tree, a hash table, or a Binary Matrix.

4. Uses the AVL tree and the Automated Binary Matrix for Association Rule Mining.

5. Captures frequent patterns by applying the Apriori algorithm.

Figures 3 and 4 above show how the AVL-Tree is created from the transaction database. The first tree inserts 1, 2, and 5 and rotates its nodes to keep the data balanced. During this process, duplicate data are ignored on insertion. For this reason, we read the data and use the AVL-tree to store a balanced, unduplicated copy of it; during this process the items and transactions are counted, and the inorder function of the AVL-tree yields the sorted, unduplicated data. The sorted, unduplicated data are saved in a vector that is used in the discretization phase: each value in the vector is replaced by its index, running from 1 to the number of n unduplicated items. The benefit of this operation is to reduce the space occupied by the discretized data in the Boolean matrix; the Apriori algorithm is then applied to this matrix to catch the n-frequent itemsets.

Algorithms:

In this phase, a new transactional data representation scheme in automated Boolean matrix form is proposed to help us generate the n-frequent itemsets and obtain the association rules. During this representation, the data are read from the AVL-tree to get the unduplicated items, and the numbers of items and transactions are returned automatically; the Boolean matrix is then generated, with a 1 where the item is present in the original dataset and a 0 otherwise. The Apriori algorithm is then applied to this matrix to obtain the n-frequent itemsets.

Figures 1, 2, 3 and 4 show the data representation of the Boolean Matrix from the source data file; the Apriori algorithm is then applied to the Boolean Matrix to get the n-frequent itemsets and the association rules, respectively.

Fig. 5: ALGORITHM I (DRBM): Data representation to Boolean Matrix
from the source data file.

Input: unduplicated sorted items obtained from AVL tree

Output: Boolean Matrix

Start
/* generate the Boolean matrix automatically during the replacement
process by regular expression */

num_of_items = fbobj.returnNumOfItems();
num_of_rows = fbobj.returnNumOfRows();
itemsVector = fbobj.returnNonduplicateSortedItems();
wholeFile = fbobj.returnDataFile();
dataReplacement = fbobj.patternMatcher(itemsVector, wholeFile);
callSaveReplacementData();
int booleanMatrix[][] = new int[num_of_rows][num_of_items];

/* the matrix columns are the data items that appear in the replacement
file, and the rows of the file correspond to the rows of the matrix */

while (!EOFile)
 {
    int y = Scanner.readItem();
    if (newline)
        matrixRow++;                          // advance to the next transaction
    else
        booleanMatrix[matrixRow][y - 1] = 1;  // mark item y present in this row
 }
return booleanMatrix;                         // return after the whole file is read
End

Fig. 6: ALGORITHM II: Apriori algorithm applied to the Boolean Matrix.

Input: number of items, number of transaction, minSupport,
minConfident

Output: n-frequent itemsets, association rules
Start
/* generate all possible candidates of the n-frequent itemsets from the
Boolean Matrix that is produced by the data representation scheme */

readDataToAVL_tree(sourceDataFile);
numberOfItems = automatedReturnNumOfItems();
numberOfTrans = automatedReturnNumOfTrans();
dataInitialization(sourceDataFile, numberOfItems, numberOfTrans);
while (possibleCandidates is not empty)
  {
    noItemset = noItemset + 1;
    generateAllPossibleCandidates(noItemset);
    getFrequentItemsets();
    b = captureFrequentItemsets();
    if (b == true) then
      {
        constructCompletedFrequent();
        saveFrequentWithMinSup();
      }
  }
End


Experimental Results & Discussion:

Five datasets were used: two obtained from the FIMI Repository and three from the UCI Repository. The two synthetic datasets were chosen for the experimental testing based on the number of transactions, average length, and type (Sparse); they are the FIMI Repository's T10I4D100K and T40I10D100K datasets.

We ran our experiments on a PC with the following specifications: Intel(R) Core(TM) Duo CPU at 2.20 GHz with 2 GB RAM, running 32-bit Windows XP. We implemented our code in the Java programming language using object-oriented programming concepts.

We used the synthetic dataset 'Sparse', which consists of multi-scattered records characterized by normal sparsity. Each record contains several attributes (i.e., columns in a database scheme), which can be viewed as a data vector. Sparsity relates to records of dissimilar length in the multi-dimensional space defined by the attributes. Sparsity is analytically well understood and is related to the 'long-tail' phenomenon: individual transactions tend to contain irregularly sold items (Leskovec et al., 2007).

The developed system returned the number of items and the number of transactions automatically, without the user having to specify them (e.g., T40I10D100K has 942 items and 100,000 rows).

Finally, our experiments computed the n-frequent itemsets with their threshold minSupport. Table 8 shows the n-frequent itemsets of the T10I4D100K dataset with their threshold minSupport values.

The generation of association rules starts by finding all subsets of the last frequent itemset of a dataset. For example, choosing {354, 494, 501, 696, 732}, with threshold minSupport 0.00507 and confidence 70%, resulted in the rejection of 6 rules and the selection of 18 rules, as shown in Table 9.

Three datasets (anneal, which is synthetic, and two others, which are non-synthetic) obtained from the UCI Repository of machine learning databases were chosen for the next experimental testing in the research.

The composition of the matrix depends highly on the discretized data, which constitute the columns. The numbers of items and transactions were computed automatically using the AVL-tree, with the support and confidence numbers set up during the initial stage. The computational time of the data transformation from the datasets to the AVL-tree ranged from 0.109 to 1.812 seconds, providing several benefits such as balanced, sorted storage of the data.

Summarizing, data mining employs several techniques to extract association rules, such as Apriori and FP-Growth. Mining association rules is an appropriate tool and one of the well-researched approaches of data mining used to discover interesting relationships between items in databases.

Our objective was to improve efficiency and generate association rules through a new data representation built with an AVL-Tree from the transaction data, and to generate the n-frequent itemsets. The benefit of the AVL-Tree is that it keeps the data balanced. Thus, the proposed representation scheme transfers the dataset to an Automated Boolean Matrix (ABM) and then builds the association rules.

Conclusion:

Business-related data grow rapidly, making data mining a reality and an essential field in our life, in which searching for meaningful information in huge databases has become a fundamental issue. This explains why mining association rules is a widespread technique in data mining.

This study has demonstrated that Apriori with the proposed data representation scheme reduces computational time, improves the process of finding the n-frequent itemsets, and generates association rules faster than the original Apriori. In addition, the database is scanned only once before being automatically transformed into the Boolean Matrix. Finally, Apriori is applied to get the frequent itemsets, and their rules depend on the minSupport and confidence values, which control the number of association rules.

References

Sedgewick, R., 1983. Algorithms. Addison-Wesley, ISBN 0-201-06672-6, page 199, chapter 15: Balanced Trees.

Agrawal, R., T. Imielinski and A. Swami, 1993. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 207-216, Washington D.C.

Agrawal, R. and R. Srikant, 1994. Fast algorithms for mining association rules in large databases. In Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile.

Mannila, H., H. Toivonen and A. Verkamo, 1994. Efficient algorithms for discovering association rules. AAAI Workshop on Knowledge Discovery in Databases (SIGKDD), Seattle, pages 181-192.

Agrawal, R. and R. Srikant, 1995. Mining Sequential Patterns. Proc. International Conf. Data Engineering, pp: 3-14.

Agrawal, R., M. Mehta, J. Shafer, R. Srikant, A. Arning and T. Bollinger, 1996. "The Quest Data Mining System", Proc. of KDD'96, USA, pp: 244-249.

Yen, S.J. and A.L.P. Chen, 1996. An Efficient Approach to Discovering Knowledge from Large Databases. In Proc. of the IEEE/ACM International Conference on Parallel and Distributed Information Systems, pp: 8-18.

Han, J. and M. Kamber, 2001. Data Mining Concepts and Techniques: Morgan Kaufman.

Han, J., J. Pei, Y. Yin and R. Mao, 2004. Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery, (8): 53-87.

He, Z., J.Z. Huang, X. Xu and S. Deng, 2004. A Frequent Pattern Discovery Method for Outlier Detection. In Lecture Notes in Computer Science, (3129/2004): 726-732. Springer Berlin/Heidelberg.

Coenen, F., G. Goulbourne and P. Leng, 2004. Tree structures for mining association rules. Data Mining and Knowledge Discovery, 8(1): 25-51.

Gilberg, R.F. and B.A. Forouzan, 2005. Data Structures: A Pseudocode Approach with C, 2nd Edition, Thomson Course Technology.

Han, J. and M. Kamber, 2006. Data Mining Concepts and Techniques, Second Edition. Elsevier Inc., San Francisco.

Kotsiantis, S. and D. Kanellopoulos, 2006. Association Rules Mining: A Recent Overview. GESTS International Transactions on Computer Science and Engineering, 32(1): 71-82.

Leskovec, J., L. Adamic and B. Huberman, 2007. The Dynamics of Viral Marketing. ACM Transactions on the Web (TWEB), 1: 1.

Yu, W., X. Wang, F. Wang, E. Wang and B. Chen, 2008. The research of improved Apriori algorithm for mining association rules. In the 11th IEEE International Conference on Communication Technology Proceedings, Sch. of Inf. & Eng., Northeast Dianli Univ., Jilin, pp: 513-516.

Chunjie, B. and W. Xiaoping, 2009. An Algorithm for Mining Association Rules Based on Sets Operation, Second International Workshop on Computer Science and Engineering, China.

Gautan, P. and K.R. Pardasani, 2010. A Fast Algorithm for Mining Multilevel Association Rule Based on Boolean Matrix. IJCSE.

Zhu, Y.,Y. Wang, S. Wang and X. Guo, 2010. Mining Association Rules Based on Cloud Model and Application in Credit Card Marketing, Asia-Pacific Conference on Wearable Computing System, IEEE, China.

FIMI Frequent Itemset Mining Dataset Repository, retrieved from the World Wide Web: http://fimi.cs.helsinki.fi/data/

UCI Repository for Machine Learning Databases, retrieved from the World Wide Web: http://www.ics.uci.edu or http://archive.ics.uci.edu/ml/

Ghassan Saleh Al-Dharhani, Zulaiha Ali Othman, Azuraliza A. Bakar,

Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia.

Ghassan Saleh Al-Dharhani, Zulaiha Ali Othman, Azuraliza A. Bakar: Automated Boolean Matrix Data Representation Scheme through AVL tree for Mining Association Rules

Corresponding Author: Ghassan Saleh Al-Dharhani, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor Darul Ehsan, Malaysia

Tel.: +60 12-2306594; E-mail: ghassanxyz@gmail.com

Table 1: Transaction Items

TID     List of Items    TID     List of Items

T100    7, 8, 55         T105    8, 9
T101    8, 10            T106    7, 9
T102    8, 9             T107    7, 8, 9, 55
T103    7, 8, 10         T108    7, 8, 9
T104    7, 9

Table 2: Discretized Transaction Items

TID    List of Items    TID    List of Items

T1     1, 2, 5          T6     2, 3
T2     2, 4             T7     1, 3
T3     2, 3             T8     1, 2 ,3, 5
T4     1, 2, 4          T9     1, 2, 3
T5     1, 3

Table 3: Sorted Frequent Pattern Array

1    2    3    4     5
7    8    9    10    55

Table 4: Automated Boolean Matrix

        1       2       3       4       5

1       1       1       0       0       1
2       0       1       0       1       0
3       0       1       1       0       0
4       1       1       0       1       0
5       1       0       1       0       0
6       0       1       1       0       0
7       1       0       1       0       0
8       1       1       1       0       1
9       1       1       1       0       0
Ct      6       7       6       2       2
ct/r    0.66    0.77    0.66    0.22    0.22

Table 5: The datasets

Data set         T10I4D100K    T40I10D100K

No. of Items     870           942
Avg. Length      10            40
No. of Trans.    100,000       100,000
Type             Sparse        Sparse
Size             3.93 MB       14.8 MB

Table 6: The meaning of the synthetic dataset parameters

T    Average length of the transaction
I    Average size of frequent itemsets
D    Number of transactions
K    Thousand

Table 7: The comparison of execution time (seconds) between Apriori
with Boolean Matrix and normal Apriori with 10000 transactions and
868 items

threshold minSup (%)   Apriori     Data Repres.    Apriori Bool.
                                                   Matrix

1                      8630.668    111.822         153.899
1.5                    4868.884    171.429         62.525
2                      2111.979    180.102         33.072
2.5                    1062.788    110.464         15.382
3                      446.751     171.083         8.412
3.5                    255.013     170.305         6.131
4                      180.398     163.706         5.101

Table 8: n-frequent itemsets with their threshold minSupport of
T10I4D100K dataset

n-frequent itemset     threshold minSupport>=0.005

32,192,245,298,446     0.00732
184,250,393,773,846    0.00693
192,474,573,806,828    0.00559
302,354,494,501,696    0.00504
302,354,494,501,732    0.00503
302,354,494,696,732    0.00509
302,354,501,696,732    0.00509
302,494,501,696,732    0.00503
354,494,501,696,732    0.00507

Table 9: The subsets of the frequent itemset {354, 494, 501, 696, 732} with its
threshold minSupport of 0.00507.

Mining Association Rules           minSup s    confidence c    Rejected/Selected

354 --> [494, 501, 696, 732]       2.047       24.768%         Rejected
494 --> [354, 501, 696, 732]       1.589       31.907%         Rejected
501 --> [354, 494, 696, 732]       2.164       23.429%         Rejected
696 --> [354, 494, 501, 732]       2.237       22.664%         Rejected
732 --> [354, 494, 501, 696]       0.752       67.42%          Rejected
354, 494 --> [501, 696, 732]       0.61        83.115%         Selected
354, 501 --> [494, 696, 732]       0.617       82.172%         Selected
354, 696 --> [494, 501, 732]       0.826       61.38%          Rejected
354, 732 --> [494, 501, 696]       0.603       84.08%          Selected
494, 501 --> [354, 696, 732]       0.601       84.359%         Selected
494, 696 --> [354, 501, 732]       0.604       83.94%          Selected
494, 732 --> [354, 501, 696]       0.592       85.642%         Selected
501, 696 --> [354, 494, 732]       0.619       81.906%         Selected
501, 732 --> [354, 494, 696]       0.6         84.5%           Selected
696, 732 --> [354, 494, 501]       0.607       83.526%         Selected
354, 494, 501 --> [696, 732]       0.56        90.536%         Selected
354, 494, 696 --> [501, 732]       0.567       89.418%         Selected
354, 494, 732 --> [501, 696]       0.561       90.374%         Selected
494, 501, 696 --> [354, 732]       0.559       90.698%         Selected
494, 501, 732 --> [354, 696]       0.558       90.86%          Selected
501, 696, 732 --> [354, 494]       0.57        88.947%         Selected
354, 494, 501, 696 --> [732]       0.533       95.122%         Selected
354, 494, 501, 732 --> [696]       0.531       95.48%          Selected
494, 501, 696, 732 --> [354]       0.531       95.48%          Selected

Table 10: Execution times (seconds) at varying minSup for the UCI Repository Datasets.

Data set Name       anneal     chess     waveform

Non-Dup-items       71         58        101
No. of Trans.       898        28056     5000
No. of Attributes   Avg (13)   7         22
synthetic           YES        NO        NO
minSup 0.2          8.861s     16.942s   1327.568s
minSup 0.4          1.154s     8.595s    1210.750s
minSup 0.6          0.281s     3.822s    708.2756s
minSup 0.8          0.141s     3.222s    120.1424s
minSup 1.0          0.109s     2.917s    34.4845s