
Research on computer-aided business process management based on data mining and the Apriori algorithm

1. Introduction

Business process management (BPM) developed from traditional workflow technology. It is a general term for a set of services and tools that support explicit process management, and it can be regarded as an extension of the traditional workflow management (WFM) system. WFM is operation-centred: it focuses on designing the sequence of operations, that is, on what must be done rather than on what can be done, and this limits how far operations can be improved (Aalst, 2004; Bhattacharya, 2007). BPM instead emphasizes process diagnosis, a human-centred view and goal-driven process design, in order to improve the adaptability of the system. BPM grew out of related business process change fields such as business process reengineering, business process improvement (BPI) and enterprise application integration (EAI) (Kim, 2007; Lee, 2009). Its core idea is to provide a unified environment for modeling, executing and monitoring all kinds of business across enterprises (Kumaran, 2007; Abreu, 2015). The main functions of BPM include process analysis, process definition and redefinition, resource allocation, process management, evaluation of process quality and efficiency, and process optimization. In an increasingly competitive business environment, the effective application of BPM helps enterprises improve both their use of information technology and their competitiveness. Traditional business process models focus on control-flow modeling, and data design plays only an auxiliary role. In recent years, data-centric business processes have become a new trend in BPM, and Artifact-centric business process modeling is a typical representative.
The Artifact concept was first proposed by Nigam and Caswell of IBM, who defined an Artifact as an identifiable, self-describing information entity. Gerede subsequently gave a formal definition of the Artifact and carried out a static analysis of Artifact-based business process modeling (Nadia, 2005; Rosa, 2011). An Artifact is a data entity with behavioral characteristics that plays a leading role during business process execution, and it passes through different lifecycle stages as the process executes. Because an Artifact's lifecycle captures both the core business data and the execution state of the services in the process, it supports more comprehensive business process management, promotes cooperation among people from different fields and improves the efficiency of enterprise process management. To achieve better process automation, Artifact-centric business processes use services in a service-oriented architecture (SOA) as their technical means of physical realization (Yan, 2011). Web services are a mature SOA technology whose purpose is to make application services on different platforms interoperable. Web service composition is, in turn, the main way to build service-oriented, loosely coupled, integrated application systems; as an important value-added function, it provides the basis for reuse and automation. BPEL, an XML-based business process execution language, emerged as the SOA service composition standard and is now widely used to implement Artifact-centric business process reengineering and process automation. The Artifact ties together the business data and the execution of services: the business process is completed through Web service operations on Artifacts, which reflects the semantics of the whole business process.


2. Business process and data mining theory

2.1. Business process model

As a logical description model, the Artifact-centric business process model describes a real business process using a set of basic elements: service elements (Service), warehouse elements (Repository), transmission pipeline elements (Connector) and Artifact types. An Artifact is an identifiable, self-describing data unit and the key data entity of the business process: it records operational data throughout process execution and marks the progress and status of the process. A service element abstractly describes the behavior of a business activity in the real process; during execution it updates the attribute values of an Artifact or creates a new Artifact. A warehouse element describes the temporary or permanent storage of Artifacts, and Artifacts are transferred between services in a request/response manner. Transmission pipelines connect service elements and warehouse elements and reflect the flow of Artifacts through the business process, which completes the Artifact-based business process model. On December 13, 2007, the Ministry of Commerce released its "Opinions on Promoting the Standardized Development of E-Commerce", which aims to promote the healthy development of online trading, strengthen the standardization of online transactions and reduce transaction risk.

Definition 1: An Artifact model A is a 2-tuple $(U, \tau)$, where:

1. U is a finite set of attributes containing a special identifier attribute a, with $a \in U$;

2. $\tau: U \rightarrow D$ is a total mapping function, where D is a finite set of domains that contains at least one identifier domain.

Definition 2: An Artifact type $C_A$ is a 2-tuple $(A, L)$, where:

1. A is an Artifact model, a tuple $(U, \tau)$ in which (a) U is a finite set of attributes containing a special attribute a, called the identifier attribute, with $a \in U$; and (b) $\tau: U \rightarrow D$ is a total mapping function, where D is a finite set of domains that contains at least one identifier domain.

2. L is the lifecycle model of the Artifact, a 2-tuple $(P_n, \phi)$.

The customer's ordered menu GC is the key Artifact class of the restaurant process, and the entire business process is organized around the GC Artifact class.


Services create and update the attribute values of Artifacts, warehouses store Artifacts and eventually archive them completely, and transmission pipelines establish the transfer relationships among services, Artifact types and warehouses.

2.2. Data mining

Data mining, also known as knowledge discovery in databases (KDD), is the non-trivial process of extracting valid, novel, potentially useful and ultimately understandable patterns from large amounts of data. Clustering analysis and association rule mining are two classical data mining methods; this paper draws on related algorithms to cluster Artifact-centric business process models and to mine service composition patterns. Association rule mining finds relationships between itemsets in massive data. Support and confidence, defined in Eqs. (1) and (2), are its two key measures, and only rules with both high support and high confidence are considered valuable.

$$\mathrm{support}(X \Rightarrow Y) = P(X \cup Y) \quad (1)$$

$$\mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X) \quad (2)$$
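As a small illustration of Eqs. (1) and (2), the sketch below treats each transaction as a Python set and reads $P(X \cup Y)$ in the usual association-rule sense, namely the fraction of transactions that contain every item of both X and Y. The transactions and service names are hypothetical examples, not data from this study.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Eq. (1): fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> float:
    """Eq. (2): P(Y | X) = support(X and Y together) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


# Hypothetical transactions: services invoked in four process instances.
transactions = [
    {"order", "cook", "pay"},
    {"order", "cook"},
    {"order", "pay"},
    {"order", "cook", "pay"},
]

print(support({"order", "cook"}, transactions))     # 0.75
print(confidence({"cook"}, {"pay"}, transactions))  # about 0.667
```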


Association rules can provide a great deal of valuable information. When mining association rules, a minimum support threshold (minsup) and a minimum confidence threshold (minconf) are usually specified in advance, and the goal of mining is to obtain the strong association rules that satisfy both thresholds simultaneously. The Apriori algorithm is one of the most influential association rule mining algorithms and a classical data mining algorithm. It first finds all frequent itemsets in the database, that is, the itemsets whose support is not lower than the user-specified threshold; it then uses these frequent itemsets to construct the rules that satisfy the user's minimum confidence.
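The second step, turning frequent itemsets into strong rules, can be sketched as follows. For a frequent itemset F, every non-empty proper subset X yields a candidate rule X ⇒ F - X whose support equals the support of F, so only the confidence still has to be checked against minconf. The function name `strong_rules` and the sample data are illustrative assumptions, not the paper's implementation.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain the whole itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def strong_rules(frequent: FrozenSet[str], transactions: List[Set[str]],
                 minconf: float) -> List[Rule]:
    """Hypothetical helper: rules X => F - X from one frequent itemset F with confidence >= minconf."""
    rules: List[Rule] = []
    sup_f = support(frequent, transactions)
    for size in range(1, len(frequent)):              # non-empty proper subsets only
        for lhs in combinations(sorted(frequent), size):
            x = frozenset(lhs)
            conf = sup_f / support(x, transactions)   # P(F | X) = support(F) / support(X)
            if conf >= minconf:
                rules.append((x, frequent - x, conf))
    return rules


transactions = [{"order", "cook", "pay"}, {"order", "cook"},
                {"order", "pay"}, {"order", "cook", "pay"}]
for lhs, rhs, conf in strong_rules(frozenset({"order", "cook", "pay"}), transactions, 0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 3))
```

With minconf = 0.7 only the rule {cook, pay} ⇒ {order} survives, because every other split of the itemset has confidence 2/3 or lower on this toy data.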
Apriori algorithm:

Input: transaction database D, minimum support threshold minsup;

Output: L, the set of frequent itemsets in D.

Apriori(D, minsup)
    L_1 = find_frequent_1_itemsets(D);
    for (k = 2; L_{k-1} ≠ ∅; k++) {
        C_k = apriori_gen(L_{k-1}, minsup);   // candidate k-itemsets
        for each transaction t ∈ D {           // scan D to count candidates
            C_t = subset(C_k, t);              // candidates contained in t
            for each candidate c ∈ C_t
                c.count++;
        }
        L_k = {c ∈ C_k | c.count ≥ minsup};
    }
    return L = ∪_k L_k;
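The listing above can be turned into a runnable sketch. Following the pseudocode, `minsup` is an absolute support count (c.count ≥ minsup). Here `apriori_gen` uses a simple join (the union of two frequent (k-1)-itemsets, kept when it has size k) followed by the standard Apriori pruning of candidates that have an infrequent (k-1)-subset; this variant is chosen for brevity and is not necessarily the exact join used in the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori_gen(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Candidate k-itemsets: join L_{k-1} with itself, then prune by the Apriori property."""
    candidates: Set[Itemset] = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions: List[Set[str]], minsup: int) -> Dict[Itemset, int]:
    """Return every itemset whose support count is at least `minsup`."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:                               # find_frequent_1_itemsets(D)
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {c: n for c, n in counts.items() if n >= minsup}   # L_1
    frequent = dict(level)
    k = 2
    while level:                                         # loop while L_{k-1} is non-empty
        candidates = apriori_gen(set(level), k)          # C_k
        counts = {c: 0 for c in candidates}
        for t in transactions:                           # one scan of D per level
            for c in candidates:
                if c <= t:                               # c belongs to subset(C_k, t)
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= minsup}   # L_k
        frequent.update(level)
        k += 1
    return frequent                                      # L = union of all L_k


transactions = [{"order", "cook", "pay"}, {"order", "cook"},
                {"order", "pay"}, {"order", "cook", "pay"}]
result = apriori(transactions, 2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data, minsup = 2 keeps all seven non-empty itemsets, while minsup = 3 drops {cook, pay} and, through pruning, the 3-itemset as well.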



Clustering is the process of partitioning a set of physical or abstract objects into groups of similar objects. A cluster is a collection of data objects in which all objects are similar to one another, while objects in different clusters are dissimilar. Clustering is widely used in many fields, with data mining and pattern recognition among its core applications; cluster analysis can also serve as a preprocessing step for other operations.

3. Computer-aided process model

Business process models are valuable enterprise assets that provide important data resources for business process reengineering, process optimization and related tasks, so managing and using a process model repository effectively is of great practical significance. To reduce redundancy among process models and make searching for process models more efficient for users, business process model similarity has become a hot research topic. Computing the similarity or distance between two process models is also an important problem for process model operations, process integration and model analysis. An Artifact-centric business process model is composed of service elements, warehouse elements, transmission pipeline elements and Artifact types according to certain business rules. To guarantee the reachability, existence, uniqueness and persistence of the Artifact lifecycle, and thus the correctness of the process model, the design of an Artifact-centric business process model should follow some basic rules: (1) no transmission pipeline may connect a service directly to another service, and likewise no transmission pipeline may connect a warehouse directly to another warehouse; (2) a service element may not have two read transmission pipelines that carry the same Artifact type, nor two write transmission pipelines that carry the same Artifact type; (3) a transmission pipeline entering a service element is a read pipeline, and a transmission pipeline leaving a service element is a write pipeline; (4) every Artifact type has a complete lifecycle, that is, a path along transmission pipelines through services from its creating service to its archiving service; (5) the Artifact-centric business process model must not contain isolated elements or structures.
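Rules (1) and (5) are purely structural and easy to check mechanically. The sketch below assumes a hypothetical representation in which each element name maps to the kind `service` or `repository` (a warehouse element) and each transmission pipeline is a (source, target) pair. Artifact types and rules (2) to (4) are omitted to keep it short, and for rule (5) it detects isolated elements only, not isolated multi-element structures.

```python
from typing import Dict, List, Tuple


def check_structure(kinds: Dict[str, str],
                    pipelines: List[Tuple[str, str]]) -> List[str]:
    """Report violations of design rules (1) and (5).

    kinds: hypothetical map from element name to "service" or "repository".
    pipelines: transmission pipelines as (source, target) element names.
    """
    problems: List[str] = []
    connected = set()
    for src, dst in pipelines:
        connected.update((src, dst))
        # Rule (1): a pipeline must join a service and a repository, never two of the same kind.
        if kinds[src] == kinds[dst]:
            problems.append(f"rule 1: {kinds[src]} {src} connected directly to {dst}")
    # Rule (5): every element must be attached to at least one pipeline.
    for name in kinds:
        if name not in connected:
            problems.append(f"rule 5: isolated element {name}")
    return problems


kinds = {"Order": "service", "Cook": "service",
         "MenuRepo": "repository", "Audit": "service"}
pipelines = [("Order", "MenuRepo"), ("MenuRepo", "Cook"), ("Order", "Cook")]
for p in check_structure(kinds, pipelines):
    print(p)
```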

In an Artifact-centric business process, Artifacts are the data objects of the process, and operations on Artifacts constitute the basic business process. Let $A_i = \{a_1, a_2, \ldots, a_n\}$ and $A_j = \{a_1, \ldots, a_m\}$ be the key Artifact attribute sets of two Artifact-centric business processes. The similarity of $A_i$ and $A_j$ is computed as:

$$\mathrm{Sim}(A_i, A_j) = \frac{|A_i \cap A_j|}{|A_i \cup A_j|} \quad (3)$$
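Eq. (3) is the Jaccard coefficient of the two attribute sets, and a direct sketch follows. Treating two empty attribute sets as identical (similarity 1) is an assumption, since the formula is undefined in that case; the attribute names are hypothetical.

```python
def attribute_similarity(a_i: set, a_j: set) -> float:
    """Eq. (3): |A_i ∩ A_j| / |A_i ∪ A_j| (Jaccard coefficient)."""
    if not a_i and not a_j:
        return 1.0  # assumption: two empty attribute sets count as identical
    return len(a_i & a_j) / len(a_i | a_j)


# Hypothetical key attributes of two restaurant-ordering Artifacts.
a_i = {"order_id", "table", "dishes", "total"}
a_j = {"order_id", "dishes", "customer"}
print(attribute_similarity(a_i, a_j))  # 0.4
```

Two shared attributes out of five distinct ones give 2/5 = 0.4.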


In the bipartite graph of an Artifact-centric business process, a service node is characterized by its inputs, outputs, execution preconditions and execution effects. The similarity of services $S_1$ and $S_2$ is computed as:

$$\mathrm{Sim}(S_1, S_2) = \omega_1 \, \mathrm{Sim}(in_1, in_2) + \omega_2 \, \mathrm{Sim}(out_1, out_2) + \omega_3 \, \mathrm{Sim}(pre_1, pre_2) + \omega_4 \, \mathrm{Sim}(e_1, e_2)$$
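A possible sketch of the weighted service similarity follows. It assumes each of the four components (inputs, outputs, preconditions, effects) is a set of labels compared with the Jaccard coefficient of Eq. (3), and it assumes equal weights ω1 = ω2 = ω3 = ω4 = 0.25 that sum to 1. The paper does not fix these choices here, so the `Service` class, the set representation and the weights are all illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set similarity as in Eq. (3); two empty sets count as identical (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass(frozen=True)
class Service:
    """Hypothetical service node: inputs, outputs, preconditions and effects."""
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    pre: FrozenSet[str]
    effects: FrozenSet[str]


def service_similarity(s1: Service, s2: Service,
                       w: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Sim(S1, S2) = w1*Sim(in) + w2*Sim(out) + w3*Sim(pre) + w4*Sim(e)."""
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    parts = (jaccard(s1.inputs, s2.inputs), jaccard(s1.outputs, s2.outputs),
             jaccard(s1.pre, s2.pre), jaccard(s1.effects, s2.effects))
    return sum(wi * p for wi, p in zip(w, parts))


take_order = Service(frozenset({"menu"}), frozenset({"order"}),
                     frozenset({"seated"}), frozenset({"ordered"}))
take_order_and_bill = Service(frozenset({"menu"}), frozenset({"order", "bill"}),
                              frozenset({"seated"}), frozenset({"billed"}))
print(service_similarity(take_order, take_order_and_bill))  # 0.625
```

Here the component similarities are 1, 0.5, 1 and 0, so the weighted sum is 0.25 × 2.5 = 0.625.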

A warehouse node is described by the warehouse name and the Artifact types the warehouse operates on. The similarity of warehouses $R_1$ and $R_2$ is computed as:

$$\mathrm{Sim}(R_1, R_2) = \omega_1 \, \mathrm{Sim}(n_1, n_2) + \omega_2 \, \mathrm{Sim}(A_1, A_2)$$
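The warehouse formula needs a similarity for names as well as for Artifact-type sets. The sketch below assumes a character-level string ratio from Python's standard `difflib.SequenceMatcher` for names and the Jaccard coefficient of Eq. (3) for Artifact types, with equal weights ω1 = ω2 = 0.5; all three choices are assumptions rather than the paper's settings.

```python
from difflib import SequenceMatcher
from typing import FrozenSet


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set similarity as in Eq. (3); two empty sets count as identical (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def warehouse_similarity(name1: str, types1: FrozenSet[str],
                         name2: str, types2: FrozenSet[str],
                         w1: float = 0.5, w2: float = 0.5) -> float:
    """Sim(R1, R2) = w1*Sim(n1, n2) + w2*Sim(A1, A2), with assumed name and set measures."""
    name_sim = SequenceMatcher(None, name1.lower(), name2.lower()).ratio()
    return w1 * name_sim + w2 * jaccard(types1, types2)


print(warehouse_similarity("OrderStore", frozenset({"GC"}),
                           "OrderStore", frozenset({"GC", "Bill"})))  # 0.75
```

Identical names give a ratio of 1.0, and the Artifact-type sets share one of two types, so the result is 0.5 × 1.0 + 0.5 × 0.5 = 0.75.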

The similarity of the two bipartite graphs is then computed as:



4. Experimental analysis

In recent years, more and more enterprises have come to treat their accumulated process models as valuable business assets that provide data resources and knowledge for improving competitive advantage. Making full use of these resources and this knowledge, to discover new process models or to find deficiencies in existing ones, is very valuable for enterprises. The main purpose of process model clustering is to group the existing process models. Process model clustering can be applied in many process analysis areas, such as process improvement, process mining and process model analysis, and its results provide a basis for process optimization, process discovery and process reengineering.
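The paper does not name a specific clustering algorithm at this point. As one simple possibility, the sketch below performs single-link threshold clustering: two process models fall into the same cluster when a chain of pairwise similarities at or above a threshold connects them. It uses the attribute-set similarity of Eq. (3) as the measure; the model names, attribute sets and the threshold 0.5 are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two attribute sets, as in Eq. (3)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_models(models: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Single-link threshold clustering of process models (connected components)."""
    names = list(models)
    unvisited = set(names)
    clusters: List[List[str]] = []
    for start in names:
        if start not in unvisited:
            continue
        unvisited.remove(start)
        stack, members = [start], []
        while stack:                       # depth-first walk over "similar enough" links
            current = stack.pop()
            members.append(current)
            for other in names:
                if other in unvisited and jaccard(models[current], models[other]) >= threshold:
                    unvisited.remove(other)
                    stack.append(other)
        clusters.append(sorted(members))
    return clusters


models = {
    "restaurant_a": {"GC", "Bill", "Dish"},
    "restaurant_b": {"GC", "Bill"},
    "hotel_a": {"Room", "Booking"},
    "hotel_b": {"Room", "Booking", "Bill"},
}
print(cluster_models(models, 0.5))
# [['restaurant_a', 'restaurant_b'], ['hotel_a', 'hotel_b']]
```

Lowering the threshold merges clusters: at 0.1 the shared "Bill" attribute links all four models into one group.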

On the other hand, to automate processes, service-oriented architecture is increasingly applied in business process management. In general, a single service can hardly provide the business functionality of an entire process, so service composition is needed to meet business requirements, and mining service composition patterns from a large number of process models is therefore a significant task. Service composition pattern mining is a bottom-up approach that analyzes the service dependencies present in process logs to discover Web service composition patterns. Mining the service composition patterns in business processes gives Web service providers a basis for developing Web services that meet market demand. The feasibility of the Arti algorithm is verified using the restaurant business process model as an example. The experiment selects 30 process models from the business process model repository as the matching model library, 15 of which are similar to the restaurant process; the smallest process model has 4 nodes and the largest has 20. The experimental parameters are listed in Table 1.
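Service composition pattern mining is described here only at a high level. A minimal bottom-up sketch, assuming each process log trace is an ordered list of invoked service names, counts directly-follows service pairs and keeps those that appear in at least a `minsup` fraction of traces. This directly-follows counting is a simple stand-in chosen for illustration, not the Arti algorithm or the paper's mining method, and the trace data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def frequent_dependencies(traces: List[List[str]], minsup: float) -> List[Tuple[Pair, float]]:
    """Directly-follows service pairs present in at least `minsup` of the traces."""
    counts: Counter = Counter()
    for trace in traces:
        counts.update(set(zip(trace, trace[1:])))   # count each pair at most once per trace
    n = len(traces)
    frequent = [(pair, c / n) for pair, c in counts.items() if c / n >= minsup]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


# Hypothetical restaurant process log: one list of service names per process instance.
log = [
    ["Order", "Cook", "Serve", "Pay"],
    ["Order", "Cook", "Pay"],
    ["Order", "Serve", "Pay"],
]
for pair, sup in frequent_dependencies(log, 0.5):
    print(pair, round(sup, 2))
```

Only Order→Cook and Serve→Pay occur in two of the three traces, so they are the dependencies reported at minsup = 0.5.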

Figure 5 shows the execution time of business process matching when the number of models is 10, 15, 20, 25 and 30. Figure 6 shows the algorithm execution time when the total number of models is 15 and the number of model nodes is 4, 8, 12, 16 and 20. As Figures 5 and 6 show, when the number of models and the number of nodes are relatively small, the Arti algorithm shows no great advantage in execution time. However, once the number of models and the number of nodes per process model reach a certain size, the Arti algorithm is significantly more efficient than the heuristic greedy algorithm.



Recall and precision are the main performance indicators for evaluating the quality of business process similarity search and matching algorithms. Figure 7 shows the recall obtained on the restaurant business process test when the total number of models is 10, 15, 20, 25 and 30. The experimental results show that, because the Arti algorithm takes node parameter matching into account and groups nodes of different types into the same set in advance during node mapping, it can return all nodes in the model library that satisfy the conditions and therefore achieves higher recall. The average recall of the purely greedy heuristic algorithm was 41.3%, while that of the Arti algorithm was 61.4%.


Figure 8 shows the precision obtained on the restaurant business process test when the total number of models is 10, 15, 20, 25 and 30. The greedy heuristic algorithm uses only keyword matching during node mapping and does not reflect the function or semantics of the nodes, whereas the Arti algorithm compares nodes by functional similarity, which further improves precision. In this experiment, the average precision of the greedy heuristic algorithm was 54%, while that of the Arti algorithm was 80.5%.
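For reference, precision and recall for a single query can be computed from the set of models the algorithm returns and the set of models that are actually relevant; averages like those reported above would then be means over the test queries. The model identifiers below are hypothetical and do not reproduce the paper's data.

```python
from statistics import mean
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / |retrieved|, recall = hits / |relevant| (0.0 when a set is empty)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query results: (models returned, models that are truly similar).
queries = [
    ({"m1", "m2", "m3", "m4"}, {"m1", "m2", "m5"}),
    ({"m1", "m5"}, {"m1", "m5", "m6", "m7"}),
]
scores = [precision_recall(ret, rel) for ret, rel in queries]
print("mean precision:", mean(p for p, _ in scores))
print("mean recall:", mean(r for _, r in scores))
```

The first query returns four models of which two are relevant (precision 0.5, recall 2/3); the second returns two relevant models out of four relevant ones (precision 1.0, recall 0.5).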


5. Conclusion

Process model mining is an important means of making effective use of existing process model knowledge to support process reengineering, process optimization and similar tasks. This paper studies several key problems of Artifact-based business process models in depth. Similar-process retrieval is the basis for clustering process models, and process model clustering is in turn the preprocessing step for process reengineering, process optimization and related operations. We propose a bipartite graph model for describing Artifact-centric business processes, together with a similarity matching algorithm; they solve the problem of computing the similarity of Artifact-based business process models, which is an important indicator for process model clustering, process model mining and related tasks. The experimental results show that the algorithm outperforms the existing algorithm in query time and execution efficiency and achieves higher recall and precision.

Submission: 20/04/2016

Acceptance: 22/06/2016


References

Aalst, W. (2004). Business process management: A personal view. Business Process Management Journal, 10(2), 248-253.

Abreu, A., Rocha, A., Cota, M. P., & Carvalho, J. V. (2015). Caderneta Eletronica no Processo Ensino-Aprendizagem: Visao de Professores e Pais de alunos do ensino Basico e Secundario. RISTI - Revista Iberica de Sistemas e Tecnologias de Informacao, (16), 108-128.

Bhattacharya, K., & Caswell, N. (2007). Artifact-centered operational modeling: Lessons from customer engagements. IBM Systems Journal, 46(4), 703-721.

Kim, W., & Lim, K. (2007). An approach to service-oriented architecture using Web service and BPM in the Telecom-OSS domain. Internet Research, 17(1), 99-107.

Kumaran, S. (2007). Using model-driven transformational approach and service-oriented architecture for service delivery management. IBM Systems Journal, 46(3), 513-529.

Lee, G. (2009). Business process management standards: A survey. Business Process Management Journal, 15(5), 744-791.

Nadia, B. (2005). An overview of continuous improvement: From the past to the present. Management Decision, 43(5), 761-771.

Rosa, M., & Reijers, H. (2011). Apromore: An advanced process model repository. Expert Systems with Applications, 38(6), 7029-7040.

Yan, Z., & Dijkman, R. (2011). Business process model repositories: Framework and survey. Information and Software Technology, 54(4), 380-395.

Jianhu Gong

Computer Science and Engineering Department, Guangdong Peizheng College, Guangzhou 510830, China
Table 1. Parameter settings in the experiment

Parameter      Value
wsubn          0.6
wskipe         0.1
wskipn         0.1
wsube          0.1
λ (string)     [3.sub.0.5]
λ (node)       [3.sub.0.5]
λ (model)      [3.sub.0.5]
COPYRIGHT 2016 AISTI (Iberian Association for Information Systems and Technologies)

Article Details
Author: Gong, Jianhu
Publication: RISTI (Revista Iberica de Sistemas e Tecnologias de Informacao)
Date: Aug 1, 2016
