Explorations of computer database intrusion detection technology targeting external forced entry.

1. Introduction

In an information society, countries, businesses and individuals alike store large volumes of important information on computer systems, and the security of these systems has become a matter of national sovereignty, trade secrets and personal privacy. With the popularity of database management systems (DBMS) (Zuech R., Khoshgoftaar T M., Wald R., 2015), important information is stored in an increasingly centralized manner, so a single information leak can have disastrous consequences. Database intrusion detection is a widely used technology and management tool. It is a proactive security technology that provides real-time protection against internal attacks, external attacks and incorrect operations: when it detects an intrusion it promptly raises an alarm and notifies the system administrator so that countermeasures can be taken, it intercepts and responds to intrusions before the system is endangered, and it adjusts security policies and protective measures according to security events. It also improves the effectiveness of real-time response and subsequent recovery, thus providing a basis for regular security assessment and analysis and raising the overall level of network security. Intrusion detection (Patel A., Taghavi M., Bakhtiyari K., et al., 2013) is a dynamic security protection technology. Unlike traditional static database security technology, it acts as a protective layer on top of the database management system and thereby strengthens the security of the database system. Intrusion detection research can be traced back to the work of James Anderson (Liao H J., Lin C H R., Lin Y C., et al., 2013) in 1980. He introduced the concept of intrusion detection and, in his paper (Modi C., Patel D., Borisaniya B., et al., 2013), proposed using audit trails to monitor intrusion threats. The significance of this idea was not understood at the time.
In 1988, the Morris Internet Worm (Hanguang L., Yu N., 2012) caused the Internet to stop working for five days. This event created an urgent need for computer security and led to the research and development of many intrusion detection systems (IDS). As the detection environment changed, a number of research institutions conducted fruitful research on distributed intrusion detection. Typical systems include NADIR (Network Anomaly Detection and Intrusion Reporter) (Ashoor A S., Gore S., 2011) from 1991 and DIDS (Distributed Intrusion Detection System). These systems proposed collecting and consolidating audit information from multiple hosts in order to detect coordinated attacks against the hosts. In 1994, the COAST Laboratory at Purdue University in the U.S. designed the AAFID (Autonomous Agents for Intrusion Detection) (Shanmugavadivu R., Nagarajan N., 2011) prototype. The prototype performs detection using autonomous entities (agents) (Uddin M., Rahman A A., Uddin N., et al., 2013), which improves the scalability, serviceability, efficiency and fault tolerance of the IDS. In 2000, the Institute of Software of the Chinese Academy of Sciences proposed an agent-based distributed intrusion detection system model, an open system model with good scalability in which both detection agents and new intrusion detection modes are easy to add. Although intrusion detection research based on data mining has produced numerous theoretical results and some systems have been applied in practice to a certain degree, this field still has many problems to be solved; for example, data mining in intrusion detection places little emphasis on discovering new knowledge (Abreu, A., Rocha, A., Cota, M. P., & Carvalho, J. V., 2015).

2. Database Intrusion Detection Technology

An Intrusion Detection System (IDS) is a combination of software and hardware for intrusion detection (see Figure 1). The main functions of the IDS are to:

1. Monitor and analyze user and system activities, and search for unauthorized operations by both illegitimate and legitimate users;

2. Check the correctness of the system configuration and detect security vulnerabilities, and prompt the administrator to fix them;

3. Make a statistical analysis of abnormal user activities and identify patterns of intrusion;

4. Check the consistency and correctness of system programs and data;

5. Respond in real time to detected intrusions;

6. Manage the audit trail of the operating system.

[FIGURE 1 OMITTED]

2.1. Common Methods of Intrusion Detection

1. Misuse detection

Misuse detection (Liao S H., Chu P H., Hsiao P Y., 2012), also known as knowledge-based detection, encodes known attack methods as defined intrusion patterns and checks whether these patterns occur. Specific signs of intrusion can be described by analyzing the characteristics, conditions, ordering and relations between the events of the intrusion process. The advantages of this method are high detection accuracy, since judgments are made against a specific feature library, and test results with a clear reference, which makes it convenient for system administrators to take appropriate measures. The main drawbacks are heavy dependence on a specific system, poor portability and a heavy maintenance workload. Techniques used for misuse detection include expert systems, feature analysis, model-based reasoning, state transition analysis, and Petri nets.

2. Anomaly detection

Anomaly detection (Lu M., Qian Z., Hong-mei Y A N., 2015), also known as behaviour-based intrusion detection, detects intrusions according to how far user behavior or resource usage departs from normal, rather than by looking for a specific behavior; it is currently the main research direction for IDS. Anomaly detection rests on the hypothesis that program execution and user behavior are closely correlated in terms of system characteristics. It has two key steps: the first is to establish a normal usage profile, and the second is to compare this profile with the current system or user behavior to determine the degree of deviation from the normal mode.
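
The two key steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that the profile is the relative frequency of each operation in a user's history and that deviation is measured by total variation distance; the function names, the sample data and the threshold are hypothetical and are not taken from the system described in this paper.

```python
from collections import Counter

def build_profile(operations):
    """Step 1: normal usage profile as relative operation frequencies."""
    counts = Counter(operations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}

def deviation(profile, current_operations):
    """Step 2: total variation distance between the profile and the
    current behaviour (0.0 = identical mix, 1.0 = no overlap)."""
    current = build_profile(current_operations)
    ops = set(profile) | set(current)
    return 0.5 * sum(abs(profile.get(op, 0.0) - current.get(op, 0.0)) for op in ops)

# Hypothetical history: the user mostly selects and sometimes updates.
normal_profile = build_profile(["Select"] * 8 + ["Update"] * 2)
score_usual = deviation(normal_profile, ["Select"] * 4 + ["Update"])
score_odd = deviation(normal_profile, ["DropTable", "DropTable", "Delete"])

THRESHOLD = 0.5  # assumed cut-off, would need tuning on real audit data
is_anomalous = score_odd > THRESHOLD
```

A session with the usual mix of operations scores close to zero, while a session made up of operations the user has never issued scores close to one.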

2.2. Database Intrusion Detection

1. Relational database intrusion detection system

Yip Chung, Michael Gertz and Karl Levitt presented the Detection of Misuse in Database Systems (DEMIDS) (Chen R M., Hsieh K T., 2012), which is tailored to detect misuse behavior, especially insider abuse by legitimate users. Among the relations in a database schema, the authors suggested detecting abnormalities by determining the relationship between attributes used in a query through primary and foreign key functional dependencies. To this end, the paper proposes the following concepts:

Working scopes: the behavior outline of users, which includes tables, attributes and attribute values that users frequently operate; these attributes are often cited simultaneously in a SQL statement. Working scopes are described with frequent itemsets.

Frequent itemsets: sets of attributes and their values, $\{F_1 = f_1, \ldots, F_m = f_m\}$, whose support is greater than the minimum support level and whose distance measure is smaller than the maximum distance; each is represented by [sup, appe], where sup is the support and appe is the affinity.

Support: the support of the itemset $\{F_1 = f_1, \ldots, F_m = f_m\}$ is the probability that it appears in the audit records.

Affinity degree: the affinity of the itemset $\{F_1 = f_1, \ldots, F_m = f_m\}$ is the distance measure between its attributes.

In its algorithm for mining users' frequent itemsets, DEMIDS uses the concept of distance to measure how closely the attributes of a frequent itemset are related. If two attributes belong to the same relation or are linked by a series of foreign keys, they are considered similar.
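
One way to picture such a distance is as the number of foreign-key hops between the tables that hold two attributes. The sketch below is only an assumption about how this could be computed (a breadth-first search over a schema graph); the schema, the names and the hop-count metric are hypothetical and are not the exact DEMIDS definition.

```python
from collections import deque

# Hypothetical schema: attribute -> owning table, plus foreign-key links
# between tables (treated as undirected for the purpose of distance).
ATTRIBUTE_TABLE = {
    "student.id": "student", "student.name": "student",
    "enrol.student_id": "enrol", "enrol.course_id": "enrol",
    "course.title": "course", "payroll.salary": "payroll",
}
FOREIGN_KEYS = {
    "student": {"enrol"},
    "enrol": {"student", "course"},
    "course": {"enrol"},
    "payroll": set(),
}

def table_distance(a, b):
    """Number of foreign-key hops between two tables, or None if unlinked."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        table, dist = queue.popleft()
        for neighbour in FOREIGN_KEYS.get(table, ()):
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def attribute_distance(x, y):
    """Distance between two attributes: 0 inside one relation, otherwise hops."""
    return table_distance(ATTRIBUTE_TABLE[x], ATTRIBUTE_TABLE[y])
```

Attributes of the same relation get distance 0, attributes joined through a chain of foreign keys get a small positive distance, and attributes of unrelated tables have no finite distance, so an itemset that mixes them would not qualify as a working scope.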

2. Fingerprint technology

Fingerprint technology (Paulauskas N., Garsva E., 2015) is particularly suitable for typical client/server database applications, because these applications often do not allow users to write their own queries but offer several standard query formats through the interface. Fingerprints are easy to generate and have a relatively low false alarm probability. To further improve the accuracy of fingerprint technology, two Boolean variables $F_{begin}$ and $F_{end}$ are used to mark the execution position of each fingerprint in the transaction, as shown in Fig. 2. Suppose a transaction contains two SQL statements $r_1$ and $r_2$, and either executes them in the order $r_1 \rightarrow r_2$ or executes only $r_1$ and not $r_2$, which means that $r_2$ can execute only after $r_1$ has executed. Then the fingerprint formed by $r_1$ has $F_{begin} = 1$, $F_{end} = 1$, and the fingerprint formed by $r_2$ has $F_{begin} = 0$, $F_{end} = 1$.

[FIGURE 2 OMITTED]
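
The begin and end flags described above can be sketched as follows. This is a simplified illustration under stated assumptions: a fingerprint is taken to be a SQL statement with its literals replaced by `?`, the two example statements and the `account` table are hypothetical, and only the first and last positions of a transaction are checked.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Fingerprint:
    template: str   # SQL text with literals replaced by '?'
    f_begin: bool   # may this statement start a transaction?
    f_end: bool     # may this statement end a transaction?

def normalize(sql):
    """Reduce a statement to its template by masking literals."""
    sql = re.sub(r"'[^']*'", "?", sql)    # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)    # numeric literals
    return " ".join(sql.lower().split())

# r1 may begin and end a transaction; r2 may only end one (after r1).
R1 = Fingerprint(normalize("SELECT balance FROM account WHERE id = 0"), True, True)
R2 = Fingerprint(normalize("UPDATE account SET balance = 0 WHERE id = 0"), False, True)
KNOWN = {fp.template: fp for fp in (R1, R2)}

def transaction_is_legal(statements):
    """Accept a transaction only if every statement matches a known
    fingerprint and the first/last statements carry the right flags."""
    fps = [KNOWN.get(normalize(s)) for s in statements]
    if not fps or any(fp is None for fp in fps):
        return False
    return fps[0].f_begin and fps[-1].f_end
```

With these flags a transaction that consists of `r_2` alone is rejected, because its first statement has `f_begin` set to false.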

3. Database Intrusion Detection Based on Data Mining

3.1. Correlation Analysis Method

Correlation analysis is the most frequently studied and used data mining method. It mainly finds relations among a group of objects in the database. Correlation analysis can be divided into two categories: association rules and sequential patterns. Association rule analysis examines a set of records and derives the implicative relationships between items in a given collection of items and some sets of records. Like association rules, sequential patterns aim to uncover relations between data, but they analyze the causality between data: a sequential analysis algorithm obtains the relationships between database records within a time window. Such algorithms can find event sequences that appear frequently and regularly in audit data, and these frequent sequential patterns help select valid statistical features when constructing the intrusion detection model. Both association rules and sequential patterns are commonly used in intrusion detection systems for networks and operating systems.

Let $I = \{i_1, i_2, \ldots, i_m\}$ be the set of all items, which are the fields in the database, and let $D$ be the set of all transactions, namely the transaction database. Each transaction $T$ is an itemset with $T \subseteq I$. For an itemset $A$, the transaction $T$ contains $A$ if and only if $A \subseteq T$.

Definition 4.1 An association rule is an implication of the form $A \rightarrow B$, where $A \subset I$, $B \subset I$ and $A \cap B = \emptyset$.

Definition 4.2 The rule $A \rightarrow B$ holds in the transaction set $D$ with support $S$, where $S$ is the percentage of transactions in $D$ that contain $A \cup B$, i.e. the probability $P(A \cup B)$:

$$S(A \rightarrow B) = P(A \cup B) = \frac{|A \cup B|}{|D|} \quad (1)$$

where $|A \cup B|$ is the number of transactions in $D$ that contain every item of $A \cup B$ and $|D|$ is the number of transactions in the database $D$.

Definition 4.3 The rule $A \rightarrow B$ has confidence $C$ in $D$, where $C$ is the percentage of the transactions in $D$ containing $A$ that also contain $B$, i.e. the conditional probability $P(B \mid A)$:

$$C(A \rightarrow B) = P(B \mid A) = \frac{|A \cup B|}{|A|} \quad (2)$$

where $|A|$ is the number of transactions in the database that contain the itemset $A$.
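
Equations (1) and (2) translate directly into code. The following sketch uses a small, made-up transaction database; the function names and the data are illustrative only.

```python
def support(itemset, transactions):
    """Equation (1): fraction of transactions containing every item of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Equation (2): support of A union B divided by support of A."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

# Toy transaction database D (hypothetical audit records as itemsets).
D = [frozenset(t) for t in (
    {"u1", "User", "AddUser"},
    {"u1", "User", "AddUser"},
    {"u1", "User", "DelUser"},
    {"u1", "Select"},
    {"u2", "Select"},
)]

s = support({"u1", "User", "AddUser"}, D)        # 2 of 5 transactions
c = confidence({"u1", "User"}, {"AddUser"}, D)   # 2 of the 3 containing A
```

Here the rule `u1 ∧ User → AddUser` has support 2/5 and confidence 2/3, because three transactions contain the antecedent and two of them also contain the consequent.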

Definition 4.4 Thresholds. To find useful association rules in the transaction database, users need to specify two threshold values: the minimum support threshold (min_sup) and the minimum confidence threshold (min_conf). Rules that meet both the minimum support and the minimum confidence are called association rules.

Definition 4.5 A collection of items is called an itemset. An itemset that contains k items is called a k-itemset. If an itemset meets the minimum support, it is called a frequent itemset.
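
To make the role of the two thresholds concrete, the sketch below enumerates every itemset of a tiny transaction database by brute force and keeps only the rules that meet both min_sup and min_conf. It is deliberately naive (exponential in the number of items) and serves only to illustrate Definitions 4.4 and 4.5; the data and function name are hypothetical.

```python
from itertools import combinations

def mine_rules(transactions, min_sup, min_conf):
    """Return (antecedent, consequent, support, confidence) for every rule
    whose itemset is frequent and whose confidence reaches min_conf."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            s = sup(whole)
            if s < min_sup:          # not a frequent itemset
                continue
            for r in range(1, size):
                for ante in combinations(combo, r):
                    a = frozenset(ante)
                    conf = s / sup(a)
                    if conf >= min_conf:
                        rules.append((a, whole - a, s, conf))
    return rules

D = [frozenset(t) for t in ({"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"})]
rules = mine_rules(D, min_sup=0.4, min_conf=0.8)
```

In this toy database only `{a, b}` is a frequent 2-itemset; the rule `b → a` has confidence 1.0 and is kept, while `a → b` has confidence 0.75 and is discarded.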

3.2. The Apriori Algorithm

The data mining methods commonly used in database intrusion detection are association rule mining and sequential pattern mining, and generating frequent itemsets is a key step of both tasks. In recent years, a great deal of in-depth research has been carried out on mining algorithms for frequent itemsets. Among the many algorithms, the Apriori algorithm proposed by Agrawal is the best known.

The Apriori algorithm uses a level-wise iterative method to search for frequent itemsets. Each iteration uses the frequent k-itemsets to generate the candidate (k + 1)-itemsets. The specific description is shown below:
BEGIN
  L_1 = find_frequent_1-itemsets(D);
  for (k = 2; L_{k-1} ≠ ∅; k++) {
    C_k = apriori_gen(L_{k-1});           // generate candidate k-itemsets
    for each transaction t ∈ D {
      C_t = subset(C_k, t);               // candidates contained in t
      for each candidate c ∈ C_t
        c.count++;
    }
    L_k = {c ∈ C_k | c.count ≥ min_sup};
  }
  return L = ∪_k L_k;
END

Subroutine
has_infrequent_subset(c, L_{k-1})

BEGIN
  for each (k-1)-subset s of c
    if s ∉ L_{k-1}
      return TRUE;
  return FALSE;
END


The Apriori algorithm generates frequent itemsets in the following three steps:

1. Connection step

Use property 1 to generate the set of candidate k-itemsets, denoted $C_k$, by joining the set of frequent (k-1)-itemsets $L_{k-1}$ with itself.

Assume $p, q \in L_{k-1}$, with the items of each itemset kept in a fixed order. If

$$(p[1] = q[1]) \wedge (p[2] = q[2]) \wedge \cdots \wedge (p[k-2] = q[k-2]) \wedge (p[k-1] < q[k-1]) \quad (3)$$

then the join of p and q is p[1], p[2], ..., p[k-1], q[k-1].

2. Pruning step

The set $C_k$ generated by the connection step is a superset of $L_k$. According to property 2 of the Apriori algorithm, if any (k-1)-subset of a candidate k-itemset is not in $L_{k-1}$, the candidate cannot be frequent and can be deleted from $C_k$.

3. Scan the database

Scan the database and count the occurrences of each candidate in $C_k$. If a record contains the candidate, the support count of the candidate is increased by one. Finally, the support of each candidate is compared with the minimum support prescribed by the user to determine whether the candidate is a frequent itemset.
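
The three steps can be put together in a compact sketch. It is a plain illustration of the standard Apriori procedure rather than the implementation used in this paper; itemsets are represented as sorted tuples, and `min_count` is an absolute support count, matching the comparison `c.count ≥ min_sup` in the pseudocode.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {c for c, n in counts.items() if n >= min_count}
    frequent = {c: counts[c] for c in level}

    k = 2
    while level:
        prev = sorted(level)
        candidates = set()
        for i, p in enumerate(prev):
            for q in prev[i + 1:]:
                # Connection step: equal (k-2)-prefix; sorting gives p[-1] < q[-1].
                if p[:-1] == q[:-1]:
                    c = p + (q[-1],)
                    # Pruning step: every (k-1)-subset must be frequent.
                    if all(s in level for s in combinations(c, k - 1)):
                        candidates.add(c)
        # Scan the database and count each surviving candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in level})
        k += 1
    return frequent

# Toy database with integer item IDs.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(D, min_count=2)
```

On this toy database the only frequent 3-itemset is `(2, 3, 5)`; the candidate `(1, 2)` is counted once and therefore never reaches the next level.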

3.3. The Improved Apriori Algorithm

A Trie is a rooted tree whose edges point downward from the root. The Trie structure was first introduced by de la Briandais and Fredkin for the efficient storage and retrieval of dictionaries. The depth of the Trie root is 0; a node at depth d points to nodes at depth d + 1, and each such pointer is called a Trie edge, with each edge representing a letter. If node u points to node v, then u is the parent of v and v is the child of u.

The Trie structure is suited not only to storing English words but also to storing any finite set of sequences. If each Trie edge stores one item of a sequence instead of a letter, each path of the Trie stands for a sequence.

When mining frequent itemsets, the items of an ordered itemset can be treated as letters. Thus, a candidate k-itemset $c = \{i_1, i_2, \ldots, i_k\}$ in $C_k$ can be represented by the word $i_1 i_2 \ldots i_k$ whose letters are its items.

Figure 3 is a Trie that stores the candidate sets {C,D}, {A,E,G}, {A,E,L}, {A,E,M}, {K,M,N}. The node numbers are labels that are used in the implementation of the algorithm.

[FIGURE 3 OMITTED]
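
A minimal sketch of a candidate Trie is given below, using the five candidate sets stored in Figure 3. The class and method names are hypothetical, and the counting routine is one straightforward way to use such a structure for support counting, not necessarily the one adopted in this paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}          # edge label (item) -> child node
        self.is_candidate = False   # True if the path from the root is a candidate
        self.count = 0              # support count of that candidate

class CandidateTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, itemset):
        """Store a candidate as the path spelled by its sorted items."""
        node = self.root
        for item in sorted(itemset):
            node = node.children.setdefault(item, TrieNode())
        node.is_candidate = True

    def count_transaction(self, transaction):
        """Increment every candidate that is a subset of the transaction."""
        self._walk(self.root, sorted(set(transaction)), 0)

    def _walk(self, node, items, start):
        if node.is_candidate:
            node.count += 1
        for i in range(start, len(items)):
            child = node.children.get(items[i])
            if child is not None:
                self._walk(child, items, i + 1)

    def counts(self):
        """Return {candidate tuple: support count}."""
        result = {}
        def collect(node, prefix):
            if node.is_candidate:
                result[prefix] = node.count
            for item, child in node.children.items():
                collect(child, prefix + (item,))
        collect(self.root, ())
        return result

trie = CandidateTrie()
for cand in ({"C", "D"}, {"A", "E", "G"}, {"A", "E", "L"}, {"A", "E", "M"}, {"K", "M", "N"}):
    trie.insert(cand)
for t in ({"A", "E", "G", "M"}, {"C", "D"}, {"A", "E", "L"}, {"K", "M"}):
    trie.count_transaction(t)
support_counts = trie.counts()
```

Because the candidates {A,E,G}, {A,E,L} and {A,E,M} share the prefix A, E, a transaction is matched against that prefix once rather than three times, which is the source of the saving over a flat candidate list.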

4. Design and Realization of Intrusion Detection System

In this paper, misuse detection and anomaly detection are combined to design and implement a new self-adaptive intrusion detection system. The Trie-based Apriori algorithm described in Section 3.3 is applied to the database intrusion detection system to improve the efficiency of generating rule bases.

4.1. Anomaly Intrusion Detection Module

1. Data preprocessing

First, the audit records are grouped by user ID. Before association rule mining, the records are divided by UserID into disjoint logical blocks, and each block is considered separately. This allows a high degree of parallelism: the records of each user can be assigned to a separate processor, which generates the frequent itemsets and the corresponding rules, and all rules are merged at the end. Because the data of different users do not interfere with one another when frequent itemsets are generated, the number of candidate sets to be analyzed is reduced, which improves the efficiency of the algorithm in practice and speeds up the processing of massive data.
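
The partitioning step might look like the sketch below. The record layout `(user_id, operation, resource)` and the function names are assumptions made for illustration, and the per-block miner is reduced to counting frequent single items; in a real deployment each block could be handed to a separate worker process.

```python
from collections import Counter, defaultdict

def partition_by_user(records):
    """Split audit records into disjoint blocks, one per user ID."""
    blocks = defaultdict(list)
    for user_id, operation, resource in records:
        blocks[user_id].append(frozenset({operation, resource}))
    return dict(blocks)

def frequent_items(block, min_sup):
    """Stand-in for the per-block miner: items whose support reaches min_sup."""
    counts = Counter(item for transaction in block for item in transaction)
    return {item for item, n in counts.items() if n / len(block) >= min_sup}

# Hypothetical audit records.
records = [
    ("rui", "Select", "Xuanke"),
    ("rui", "Select", "Xuanke"),
    ("rui", "Update", "User"),
    ("li", "Insert", "Xuanke"),
]
blocks = partition_by_user(records)
# Blocks are independent, so this loop could be replaced by a parallel map.
per_user = {user: frequent_items(block, min_sup=0.6) for user, block in blocks.items()}
```

Support is computed relative to each user's own block, so a pattern that is frequent for one user is not diluted by the records of other users.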

In our system implementation, two enumeration types are defined: ActStatus gives the operating status ID set, and UserActivity gives the operation ID set, namely:

enum ActStatus { Fail = 0, Suc, Disallow };

enum UserActivity { Login = 3, Select, Update, Insert, Delete,
                    DropTable, AlterTable, CreateDB, DropDB, CreateTable,
                    PrivManage, AddUser, UpdateUser, DelUser, CreateRule, MapId };

Therefore, 0~2 is the operating status ID set; 3~20 is the operation ID set, of which 3~18 are assigned above and two IDs (19 and 20) are reserved for DBMS extension; and the resource ID set starts from 21 and extends up to the largest positive integer that the machine can represent.
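
The ID layout can be mirrored in a short sketch that encodes one audit record as an itemset of integers. The enumerations follow the definitions above; the `ResourceRegistry` class and the `encode` function are hypothetical helpers introduced only for this illustration.

```python
from enum import IntEnum

class ActStatus(IntEnum):
    Fail = 0
    Suc = 1
    Disallow = 2

# Operation IDs start at 3, as in the enumeration above (3..18 assigned).
UserActivity = IntEnum(
    "UserActivity",
    "Login Select Update Insert Delete DropTable AlterTable CreateDB DropDB "
    "CreateTable PrivManage AddUser UpdateUser DelUser CreateRule MapId",
    start=3,
)

FIRST_RESOURCE_ID = 21  # 19 and 20 are reserved for DBMS extension

class ResourceRegistry:
    """Hands out resource IDs 21, 22, ... in order of first appearance."""
    def __init__(self):
        self._ids = {}

    def id_for(self, name):
        return self._ids.setdefault(name, FIRST_RESOURCE_ID + len(self._ids))

def encode(status, activity, resource, registry):
    """Turn one audit record into an itemset of integer IDs."""
    return frozenset({
        int(ActStatus[status]),
        int(UserActivity[activity]),
        registry.id_for(resource),
    })

registry = ResourceRegistry()
record = encode("Suc", "Select", "Xuanke", registry)
```

Because the three ID ranges do not overlap, a status, an operation and a resource can share one itemset without ambiguity.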

2. Produce association rules

The Apriori algorithm is used to mine the preprocessed data, and the association rules produced are stored in a table named AssociationRule.

According to the Apriori property, all non-empty subsets of a frequent itemset must also be frequent. A frequent itemset and its subsets therefore describe the same user working scope in a relational database, so redundancy can be reduced by using only the largest (maximal) frequent itemsets to represent user behavior patterns.
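
Keeping only the largest frequent itemsets amounts to discarding every itemset that is a proper subset of another frequent itemset. A minimal sketch, with a hypothetical function name and toy data:

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets if not any(s < other for other in sets)]

frequent = [{"A"}, {"B"}, {"A", "B"}, {"C"}]
maximal = maximal_itemsets(frequent)
```

Here `{A}` and `{B}` are dropped because `{A, B}` already covers them, while `{C}` is kept because no larger frequent itemset contains it.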

Set min_sup = 0.2 and min_conf = 0.8. The Trie-based Apriori algorithm put forward in Section 3.3 is used to mine the preprocessed audit data, generating the rules listed in Table 1.

4.2. Detection Results

1. Misuse detection results

Misuse detection relies on the construction of a misuse detection rule base. If the misuse rules are not well constructed, the intrusion detection system cannot detect intrusions effectively. In this system, misuse detection detects attack attempts well, but since the rule base contains no rules tailored to impersonation attacks or attacks by legitimate users, it cannot detect these two kinds of attack. The results are shown in Table 2.
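
A misuse rule of the kind reported in Table 2 (for example, login failing three times within a minute) can be sketched as a sliding-window check over the audit stream. The event layout, the function name and the individual timestamps are hypothetical; only the rule itself is taken from the table.

```python
from collections import deque
from datetime import datetime, timedelta

def detect_repeated_failures(events, max_failures=3, window=timedelta(minutes=1)):
    """events: (timestamp, user, status) tuples in time order.
    Returns (user, timestamp) for each burst of failures inside the window."""
    recent = {}   # user -> timestamps of recent failures
    alerts = []
    for ts, user, status in events:
        if status != "Fail":
            continue
        queue = recent.setdefault(user, deque())
        queue.append(ts)
        while ts - queue[0] > window:   # drop failures older than the window
            queue.popleft()
        if len(queue) >= max_failures:
            alerts.append((user, ts))
            queue.clear()               # start a fresh window after alerting
    return alerts

day = datetime(2014, 5, 25)
events = [
    (day.replace(hour=9, minute=0), "li", "Fail"),
    (day.replace(hour=9, minute=2), "li", "Fail"),
    (day.replace(hour=9, minute=4), "li", "Fail"),
    (day.replace(hour=9, minute=15, second=10), "rui", "Fail"),
    (day.replace(hour=9, minute=15, second=40), "rui", "Fail"),
    (day.replace(hour=9, minute=16, second=3), "rui", "Fail"),
]
alerts = detect_repeated_failures(events)
```

Three failures spread over four minutes do not trigger the rule, whereas three failures within 53 seconds do.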

2. Anomaly detection results based on data mining

Anomaly detection based on data mining mainly looks for behavior that departs from normal behavior. For attacks by legitimate users, anomaly detection therefore has a high detection rate of 80%~90%, with a correct detection rate of about 90%; for impersonation attacks, the detection rate is about 70% and the correct detection rate is 90%. For password guessing attacks (a large number of login failures in a short time), however, the detection rate for attack attempts is low: because data mining based anomaly detection models the normal operating behavior of users, only successful login records are analyzed during detection.

5. Conclusions

Because of the complexity of database structures, database intrusion detection faces more research difficulties than intrusion detection for networks and operating systems. The research in this paper implements some features of a database intrusion detection system but is far from complete. The main conclusions are as follows. A composite intrusion detection engine combines the advantages of misuse detection and anomaly detection and thus improves the intrusion detection rate. However, since misuse detection results depend on the rule base, a rule base covering previously known attacks should be developed according to the actual situation; only then can intrusions be detected quickly and accurately. For anomaly detection based on data mining, association rules can be used to build the anomaly detection module of a database intrusion detection system, and the Trie-based Apriori algorithm proposed in this paper can be adopted to generate rules of normal user behavior quickly.

Recebido/Submission: 11/9/2015

Aceitacao/Acceptance: 28/11/2015

References

Abreu, A., Rocha, A., Cota, M. P., & Carvalho, J. V. (2015). Caderneta Eletronica no Processo Ensino-Aprendizagem: Visao de Professores e Pais de alunos do ensino Basico e Secundario. RISTI--Revista Iberica de Sistemas e Tecnologias de Informacao, 2015(16), 108-128.

Ashoor A S., Gore S. (2011). Importance of Intrusion Detection system (IDS). International Journal of Scientific and Engineering Research, 2(1), 1-4.

Chen R M., Hsieh K T. (2012). Effective allied network security system based on designed scheme with conditional legitimate probability against distributed network attacks and intrusions. International Journal of Communication Systems, 25(5), 672-688.

Hanguang L., Yu N. (2012). Intrusion detection technology research based on apriori algorithm. Physics Procedia, 24, 1615-1620.

Liao H J., Lin C H R., Lin Y C., et al. (2013). Intrusion detection system, A comprehensive review. Journal of Network and Computer Applications, 36(1), 16-24.

Liao S H., Chu P H., Hsiao P Y. (2012). Data mining techniques and applications-A decade review from 2000 to 2011. Expert Systems with Applications, 39(12), 11303-11311.

Lu M., Qian Z., Hong-mei Y A N. (2015). Development of anomaly detection technology module based on ARM6410. Journal of Xi'an University of Science and Technology, 2, 021.

Modi C., Patel D., Borisaniya B., et al. (2013). A survey of intrusion detection techniques in cloud. Journal of Network and Computer Applications, 36(1), 42-57.

Patel A., Taghavi M., Bakhtiyari K., et al. (2013). An intrusion detection and prevention system in cloud computing, A systematic review. Journal of Network and Computer Applications, 36(1), 25-41.

Paulauskas N., Garsva E. (2015). Computer system attack classification. Elektronika ir Elektrotechnika, 66(2), 84-87.

Shanmugavadivu R., Nagarajan N. (2011). Network intrusion detection system using fuzzy logic. Indian Journal of Computer Science and Engineering (IJCSE), 2(1), 101-111.

Uddin M., Rahman A A., Uddin N., et al. (2013). Signature-based Multi-Layer Distributed Intrusion Detection System using Mobile Agents. IJ Network Security, 15(2), 97-105.

Zuech R., Khoshgoftaar T M., Wald R. (2015). Intrusion detection and Big Heterogeneous Data, a Survey. Journal of Big Data, 2(1), 1-41.

Wei Xiang-He (1) *, Zhang Hong (2)

* weixhwx@sina.com

(1) Nanjing University of SCI.&TECH., 210094, Nanjing, Jiang Su, China

(2) Huaiyin Normal University of SCI.&TECH., 223300, Huaian, Jiang Su,China

DOI: 10.17013/risti.17B.370-379
Table 1--Association Rules

Rule                           Confidence level   Support

Rui ∧ User → AddUser           0.94               0.30
Rui ∧ User → DeleteUser        0.83               0.24
Rui ∧ Xuanke → CreateTable     1                  0.32
...                            ...                ...

Table 2--Misuse Detection Results

Id    Resources         Analysis                                    Intrusion time       Detection time

001   DBMS              Login failed three times in a minute        2014-5-25 09:16:03   2014-5-25 09:20:00
002   office material   Operation failed ten times in two minutes   2013-4-21 09:37:18   2013-4-21 09:40:12
...   ...               ...                                         ...                  ...
COPYRIGHT 2016 AISTI (Iberian Association for Information Systems and Technologies)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2016 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Xiang-He, Wei; Hong, Zhang
Publication:RISTI (Revista Iberica de Sistemas e Tecnologias de Informacao)
Date:Mar 30, 2016
Words:3701