
Intrusion detection by backpropagation neural networks with sample-query and attribute-query.

Abstract: The growing number of network intrusions has put companies and organizations at a much greater risk of loss. In this paper, we propose a new learning methodology for developing a novel intrusion detection system (IDS) based on backpropagation neural networks (BPN) with sample-query and attribute-query. We test the proposed method on a benchmark intrusion dataset to verify its feasibility and effectiveness. Results show that choosing good attributes and samples affects not only the classification performance but also the overall execution efficiency. The proposed method significantly reduces the required training time while still achieving good training results. It provides a powerful tool to help supervisors analyze, model and understand the complex attack behavior of electronic crime.

Keywords: intrusion detection system (IDS), backpropagation neural networks (BPN), Query-based learning.

I. Introduction

The expansion of the electronic environment has brought a corresponding growth in electronic crime, in which the computer is used either as a tool to commit the crime or as its target [1]. In recent years, numerous computers have been compromised because their owners did not take the necessary precautions against network attacks. The failure to secure their systems puts many companies and organizations at a much greater risk of loss. A single attack can cost millions of dollars in potential revenue, and that is only the beginning: the damage includes not only loss of intellectual property and liability for compromised customer data (including the time and money spent to recover from the attack), but also loss of customer confidence and market advantage. The security of computers and networks must therefore be strengthened to protect critical infrastructure from these threats. With the rise of electronic crime, designing safeguards for the information infrastructure, such as intrusion detection systems (IDS) that prevent and detect incidents, has become increasingly challenging. Figure 1 illustrates an intrusion detection system together with external and internal network intrusion attacks.


The intrusion detector learning task is to build a predictive model (i.e., a classifier) capable of distinguishing between intrusions (attacks) and normal connections. Recently, an increasing amount of research has applied neural networks to intrusion detection. An artificial neural network consists of a collection of highly interconnected processing elements. Given a set of inputs and a set of desired outputs, the transformation from input to output is determined by the weights associated with the interconnections among the processing elements. By modifying these weights, the network adapts to produce the desired outputs. Their ability to learn by example, with a high tolerance for imperfect data, makes neural networks flexible and powerful for IDS. However, inducing a model from a large dataset takes a long time. This paper introduces a novel query-based methodology for learning intrusion detection with neural networks using sample-query and attribute-query. Our method first applies information theory to select good attributes. Then, we use the query-based methodology [2] to include only a subset of the samples in learning. Choosing good attributes and samples affects not only the classification performance but also the overall execution efficiency. We evaluate our method on a benchmark intrusion dataset to verify its feasibility and effectiveness. Results show that the proposed method not only reduces the training time but also improves the training results, so that probable attack behavior in IDS can be predicted accurately.
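As a concrete illustration of how the weights on the interconnections determine the input-output transformation, the following minimal Python sketch implements a backpropagation network with one hidden layer of sigmoid units and a single training update. The class name TinyBPN, the layer sizes, the learning rate and the toy OR data mentioned below are illustrative assumptions, not the network configuration used in this paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPN:
    """Backpropagation network: one hidden layer of sigmoid units, one sigmoid output."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # w1[j][i]: weight from input i to hidden unit j; each row's last entry is a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # w2[j]: weight from hidden unit j to the output; the last entry is the output bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        # zip stops at the shorter sequence, so the bias entry is added separately.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in self.w1]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.w2[-1])
        return h, out

    def train_step(self, x: Sequence[float], target: float, lr: float = 0.5) -> float:
        """Apply one backpropagation update for squared error; return the error before it."""
        h, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights as they were before this update.
        delta_h = [delta_out * self.w2[j] * hj * (1.0 - hj) for j, hj in enumerate(h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.w2[-1] -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_h[j] * xi
            row[-1] -= lr * delta_h[j]
        return 0.5 * (out - target) ** 2
```

Calling train_step repeatedly over a labeled set (for example, the four OR patterns) lowers the total squared error. Every sample costs one forward and one backward pass per epoch, which is why training on hundreds of thousands of intrusion records becomes slow.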

II. Related Works

As network intrusions are constantly changing, a flexible IDS is needed to analyze the enormous amount of network traffic in a less structured manner than traditional rule-based systems. In [3] [4], neural networks were proposed as alternatives to the statistical analysis component of anomaly detection systems: they determine what is normal and flag an event for further inspection when it is abnormal or anomalous. In [5] [6], neural networks were applied to build keyword-count-based misuse detection systems, where the data presented to the systems consisted of attack-specific keyword counts in network traffic. In [7], neural networks were used to analyze program behavior profiles for both anomaly detection and misuse detection, in order to identify normal system behavior. In [8], a neural network detection system was developed that classified packet-level network data according to 9 packet characteristics. In [9], a statistical neural network classifier for anomaly detection was developed that can identify UDP flood attacks. Among the neural network classifiers compared, the backpropagation neural network (BPN) has been shown to be more efficient for developing IDS. However, inducing models from large datasets takes a long time. In our experiments on an IDS dataset with nearly 500,000 samples, BPN never reached a feasible result within the pre-specified criteria, and training was very time-consuming. Therefore, this paper focuses on combining data reduction and classification through a query-based learning methodology. By identifying the most important components of the training data, we can reduce processing time, communication overhead and storage requirements when mining network intrusions.

III. Proposed Method

A learning machine consists of a learning protocol, which specifies how information is obtained, and a deduction procedure, which learns the correct concept [10]. In a learning protocol, the input information can be examples that exemplify the concept to be learned, or oracles that, when presented with data, tell whether or not the data exemplify the concept. Therefore, a system can be trained not only on the samples at hand but also on extra samples produced by an oracle. When queried at a point y, the oracle responds with a(y); the pair (y, a(y)) is called a queried sample.

A. Sample-Query

According to [2] [11] [12], samples from the decision boundary produce the best training results, so we want to find points y such that a(y) = 0.5. Conventional approaches assume that the oracle knows the input-output pattern for every input or output point. Given a randomly selected boundary point P, its conjugate data pair (points $P^+$ and $P^-$) can be extracted along the reverse boundary. Note that P, $P^+$ and $P^-$ are arbitrary input-output patterns. However, experts or simulators may be unavailable, or the oracle may be very expensive to consult for the correct output. To resolve this drawback, we divide the samples into a training set and a query set. An oracle is then designed that follows the self-regulation rule [12] to select samples (environment-focus) that are close to the conjugate data pair (self-focus). This lets the system interact with its environment and train itself on queried samples. As [13] reported, a system can use particular samples in a data set to learn almost everything the full data set teaches. In this paper, the oracle is designed to acquire appropriate samples for further training, so learning performance is improved by labeling only the data expected to be informative. In the proposed method, we first examine the non-trained samples to detect whether they are misclassified. Because the network output also indicates the distance from the sample to the boundary, we can store these misclassified samples in a priority queue (min-heap) keyed on that distance. The stored points closest to the class boundary are then picked as extra training samples.
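The selection step can be sketched with Python's standard heapq module, which provides a binary min-heap. The sketch assumes, for illustration, that the trained network is exposed as a function predict returning a value in [0, 1], that 0.5 marks the class boundary, and that |output - 0.5| serves as the distance to the boundary; the function select_queries and its parameter k (how many samples to query per round) are hypothetical names.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def select_queries(
    predict: Callable[[Sequence[float]], float],
    pool: List[Tuple[Sequence[float], int]],
    k: int,
) -> List[int]:
    """Return indices of up to k misclassified pool samples closest to the 0.5 boundary."""
    heap: List[Tuple[float, int]] = []  # min-heap keyed on distance to the boundary
    for idx, (x, label) in enumerate(pool):
        out = predict(x)
        predicted = 1 if out >= 0.5 else 0
        if predicted != label:  # only mistakes are candidates for querying
            heapq.heappush(heap, (abs(out - 0.5), idx))
    # Pop in order of increasing distance, i.e. nearest to the boundary first.
    return [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]
```

For example, with predict = lambda x: x[0] and the pool [([0.9], 1), ([0.6], 0), ([0.48], 1), ([0.1], 1), ([0.3], 0)], select_queries(predict, pool, 2) returns [2, 1]: three samples are misclassified, and the two whose outputs lie nearest 0.5 are chosen.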

B. Attribute-Query

The learning accuracy may be degraded by noise and by a large number of attributes. In some applications, the increase in accuracy obtained from additional data does not justify the expense, so the economically reasonable decision is often to use only a subset of the available training samples and to discard attributes that are unlikely to be useful. For large training datasets, choosing good attributes and samples affects not only the classification performance but also the overall execution efficiency. In this paper, we use information gain to select subsets of attributes for training. Given a sample set S, the information gain $g(A_i)$ measures how much information the value of attribute $A_i$ provides about the class.

$$g(A_i) = I(p,n) - e_i \tag{1}$$

where $I(p,n)$ is the total information required to classify a random example of S, and $e_i$ is the expected amount of information still needed to classify a random example after S has been partitioned by the values of $A_i$. Given p and n as the numbers of positive and negative examples in S, $I(p,n)$ is calculated as follows.

$$I(p,n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} \tag{2}$$
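Equation (2) is the binary entropy of the class distribution in S. A direct Python transcription follows; it adopts the usual convention that 0 · log2 0 = 0, so a set containing only one class carries zero information. The function name info is our own.

```python
import math


def info(p: int, n: int) -> float:
    """I(p, n) from Eq. (2): bits needed to classify a random example of S."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:  # convention: 0 * log2(0) = 0, and avoids dividing when total == 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result
```

For example, info(5, 5) is 1.0 bit, info(4, 0) is 0.0, and info(9, 5) is about 0.940 bits.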

In our experiments, we compute the information gain of each attribute and keep a fixed number of the highest-scoring attributes for training. The objective is to produce concise and highly informative training sets.
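The attribute-query step can be sketched as follows. The paper does not spell out $e_i$, so this sketch assumes the standard ID3 definition: if attribute $A_i$ splits S into subsets $S_j$ with $p_j$ positive and $n_j$ negative examples, then $e_i = \sum_j \frac{p_j+n_j}{p+n} I(p_j, n_j)$. It also assumes discrete (or already discretized) attribute values; the helper names information_gain and top_k_attributes are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def info(p: int, n: int) -> float:
    """I(p, n) from Eq. (2), with 0 * log2(0) taken as 0."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c > 0)


def information_gain(rows: List[Sequence[Hashable]], labels: List[int], attr: int) -> float:
    """g(A_attr) = I(p, n) - e_attr from Eq. (1), using the ID3 definition of e_attr."""
    p = sum(1 for y in labels if y == 1)
    n = len(labels) - p
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # value -> [p_j, n_j]
    for row, y in zip(rows, labels):
        counts[row[attr]][0 if y == 1 else 1] += 1
    expected = sum((pj + nj) / (p + n) * info(pj, nj) for pj, nj in counts.values())
    return info(p, n) - expected


def top_k_attributes(rows: List[Sequence[Hashable]], labels: List[int], k: int) -> List[int]:
    """Indices of the k attributes with the highest gain (ties broken by lower index)."""
    gains = [(information_gain(rows, labels, a), a) for a in range(len(rows[0]))]
    gains.sort(key=lambda item: (-item[0], item[1]))
    return [a for _, a in gains[:k]]
```

Under these assumptions, keeping 11 of the 41 KDD attributes would correspond to top_k_attributes(rows, labels, 11), after the continuous KDD attributes have been discretized in some way the paper does not specify.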

C. Training with Sample-Query and Attribute-Query

A step-by-step description of the proposed algorithm follows. Learning stops when either the number of iterations exceeds the iteration threshold or the root-mean-squared error (RMSE) falls below the error threshold.

Step1: Initialize all weights in the neural network randomly. Set the iteration threshold N and the error threshold RMSE.

Step2: Let the dataset be $S = \{a_i \in \mathbb{R}^n\}$, where n is the number of selected attributes. Draw the initial training samples $SS \subset S$ by stratified random sampling.

Step3: Train the neural network on SS. If (the error E < RMSE) or (the iteration number > N), then exit.

Step4: Examine the non-trained samples (S - SS).

Step5: Add to SS some of the misclassified samples closest to the class boundary. Go to Step 3.
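The loop in Steps 1-5 can be sketched in Python as below. To stay short and dependency-free, the sketch makes several assumptions that are not from the paper: a single sigmoid unit trained by gradient descent stands in for the full BPN, the network is retrained from freshly initialized weights in each round, N is interpreted as a limit on query rounds (max_rounds) rather than training epochs, and a hypothetical number k of misclassified samples is added per round. All function names are illustrative.

```python
import heapq
import math
import random
from typing import List, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], int]  # (attribute vector, class label 0 or 1)


def predict(w: List[float], x: Sequence[float]) -> float:
    """Output of a single sigmoid unit; w[-1] is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Sample], epochs: int = 200, lr: float = 0.5, seed: int = 0) -> List[float]:
    """Stand-in for BPN training: stochastic gradient descent on one sigmoid unit."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim + 1)]  # Step 1: random initial weights
    for _ in range(epochs):
        for x, y in data:
            grad = predict(w, x) - y  # cross-entropy gradient w.r.t. the pre-activation
            for j in range(dim):
                w[j] -= lr * grad * x[j]
            w[-1] -= lr * grad
    return w


def rmse(w: List[float], data: List[Sample]) -> float:
    return math.sqrt(sum((predict(w, x) - y) ** 2 for x, y in data) / len(data))


def query_based_training(
    S: List[Sample], initial: Sequence[int], k: int = 2,
    max_rounds: int = 10, rmse_threshold: float = 0.01,
) -> Tuple[List[float], Set[int]]:
    """Steps 2-5: grow SS with misclassified samples closest to the 0.5 boundary."""
    selected: Set[int] = set(initial)  # Step 2: indices of SS within S
    rounds = 0
    while True:
        ss = [S[i] for i in sorted(selected)]
        w = train(ss)  # Step 3: train on SS
        rounds += 1
        if rmse(w, ss) < rmse_threshold or rounds >= max_rounds:
            return w, selected
        heap: List[Tuple[float, int]] = []  # Step 4: mistakes in S - SS, keyed on |out - 0.5|
        for i, (x, y) in enumerate(S):
            if i in selected:
                continue
            out = predict(w, x)
            if (out >= 0.5) != (y == 1):
                heapq.heappush(heap, (abs(out - 0.5), i))
        if not heap:  # no misclassified samples left to query
            return w, selected
        for _ in range(min(k, len(heap))):  # Step 5: add the closest ones, back to Step 3
            selected.add(heapq.heappop(heap)[1])
```

For instance, on the one-dimensional toy set S = [([i / 10], 1 if i > 0 else 0) for i in range(-10, 11)], starting from the two extreme samples (indices 0 and 20), the loop only ever adds samples that the current model misclassifies near the boundary, so the final model is trained on a small subset of S.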

Note that the goal of learning intrusion detection is not to obtain an exact representation of the training data but to extract a model of the attack function. The ability to generalize, that is, to make good predictions on unseen attacks, is therefore much more important. As in the real world, a passive learner simply learns from the samples that go by, whereas an active learner explores unknown portions of its environment to learn extra information. With this generalization ability, the proposed method is well suited to learning network intrusions. In addition, network log data are usually highly redundant, so selecting concise subsets of the training data can reduce the training time.

IV. Experimental Results

In this paper, we used the dataset from the KDD intrusion detection contest to evaluate the performance of the proposed approach. Figure 2 shows the attack breakdown of the DARPA data in the training set. This dataset is a version of the DARPA intrusion detection evaluation dataset prepared and managed by MIT Lincoln Laboratory. Researchers set up an environment to acquire 9 weeks of raw TCP dump data for a local-area network (LAN) simulating a typical US Air Force LAN. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks. The result was a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment; its objective was to survey and evaluate research in intrusion detection. These intrusions fall into four main categories: Denial of Service (DoS), Probe, Remote to User (R2L), and User to Root (U2R).

In the KDD dataset, the training set contains 494,021 samples and the test set contains 311,029 samples. Nearly 80% of the training samples are DoS attacks, and about 20% are normal connections. The other attack types, U2R (0.011%), R2L (0.228%) and Probe (0.831%), are very rare. It is important to note that the test data do not come from the same probability distribution as the training data: they include 17 specific attack types that are not in the training set, which makes the dataset more realistic.

Network intrusion detection is a two-class classification problem. Its effectiveness can be defined as the ability to make correct class predictions. Each single prediction has four possible outcomes, summarized in the confusion matrix for the two-class case shown in Table I. True positives and true negatives are correct classifications. A false positive occurs when the system classifies an action as anomalous (a possible intrusion) when it is actually legitimate. Although this type of error may not be completely eliminated, a good system should minimize its occurrence to provide useful information to users. A false negative occurs when an actual intrusive action takes place but the system lets it pass as non-intrusive behavior; in other words, the malicious activity is neither detected nor reported. This is the more serious error. In a real-world system, the consequence of failing to detect abnormal network behavior (a false negative) differs from that of flagging normal behavior as abnormal (a false positive). The two kinds of errors generally have different costs, and likewise the two types of correct classification have different benefits.
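Since the two error types carry different costs, detectors can be compared by a cost-weighted error rate rather than plain accuracy. The sketch below uses the cell names a, b, c and d from the confusion matrix in Table I; the cost values (a missed intrusion costing ten times a false alarm) are purely hypothetical, as the paper does not assign numerical costs.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    a: int  # true negatives: normal connections classified as normal
    b: int  # false positives: false alarms on normal connections
    c: int  # false negatives: missed intrusions
    d: int  # true positives: detected intrusions


def average_cost(m: Confusion, cost_fp: float = 1.0, cost_fn: float = 10.0) -> float:
    """Mean misclassification cost per sample; the default costs are hypothetical."""
    total = m.a + m.b + m.c + m.d
    return (m.b * cost_fp + m.c * cost_fn) / total
```

With these costs, Confusion(90, 5, 1, 4) and Confusion(93, 2, 4, 1) both make six errors out of 100, yet the first costs 0.15 per sample and the second 0.42, because the second misses more intrusions.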

In this paper, we compare our proposed method with the original BPN. The convergence criteria used for training are a root mean squared error (RMSE) less than or equal to 0.01 or a maximum of 10,000 iterations. The training phase is diagrammed in Figure 3. In the KDD dataset, each sample has 41 attributes, some of which may be more useful than others in distinguishing normal connections from attacks. We use information gain to select 11 attributes for training and discard the rest. For reference, the results of the bagged boosting method, which uses all samples and attributes in classification, are taken directly from the KDD intrusion detection contest.


Our experiments show that the training time of the proposed method is 1,447 seconds, whereas BPN training takes more than 21,746 seconds. Because BPN does not converge to a feasible result within the pre-specified number of training iterations, we could only use 9,500 samples (selected by stratified random sampling) to train it. The proposed method uses far fewer samples: it starts with only 494 samples (about 0.1% of the KDD training set) in initial training, and then 16 queried samples are added for further training. All the experimental results are summarized in Table II.
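Both training sets above were drawn by stratified random sampling, which keeps each class's share of the sample equal to its share of the full data. A minimal sketch follows; the function name stratified_sample and the rounding rule (at least one sample per class, so that rare classes such as U2R are not dropped) are our assumptions, not details given in the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_sample(labels: Sequence[Hashable], fraction: float, seed: int = 0) -> List[int]:
    """Return sorted indices of a random sample that preserves class proportions."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen: List[int] = []
    for indices in by_class.values():
        k = max(1, round(fraction * len(indices)))  # keep at least one sample of every class
        chosen.extend(rng.sample(indices, k))  # sample without replacement within the class
    return sorted(chosen)
```

Under this rounding rule, a fraction of about 0.001 applied to the 494,021 KDD training labels would give an initial set of roughly the size used above; the exact count depends on how many rare classes are rounded up to one sample.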

V. Conclusion

In this paper, a new method is introduced for learning intrusion detection. Instead of accepting whatever samples are fed to it, the method selects suitable samples for training to produce valuable results. Experiments show that the proposed method achieves effective classification at a lower training cost. It is flexible and powerful. Intrusion detection systems must be capable of distinguishing between normal (not security-critical) and abnormal user activities to discover malicious attempts in time. However, translating user behavior (or a complete user-system session) into a consistent security-related decision is often not simple, because many behavior patterns are unclear and hard to distinguish. If uncertain behavior is not considered anomalous, intrusion activity may go undetected; if it is considered anomalous, system administrators may receive false alarms. For connection records that cannot be clearly distinguished, the proposed method classifies them as "other" rather than forcing them into one of the known classes, so that these records can be analyzed further. Our future work is to extend this concept to develop more learning methods for real-world applications.


[1] A. Brungs, R. Jamieson, "Identification of Legal Issues for Computer Forensics," Information Systems Management, vol. 22, no. 2, Springer, 2005, pp. 57-66.

[2] R.-I. Chang, "Disease Diagnosis using Query-Based Neural Networks," LNCS, 2005.

[3] H. Debar, B. Dorizzi, "An Application of a Recurrent Network to an Intrusion Detection System," International Joint Conference on Neural Networks, 1992, vol. II, pp. 478-483.

[4] H. Debar, M. Becker, D. Siboni, "A Neural Network Component for an Intrusion Detection System," IEEE Symposium on Research in Security and Privacy, 1992.

[5] J. Ryan, M. Lin, R. Miikkulainen, "Intrusion Detection with Neural Networks," Advances in Neural Information Processing Systems, vol. 10, MIT Press, 1998.

[6] R. Lippmann, R. Cunningham, "Improving Intrusion Detection Performance using Keyword Selection and Neural Networks," RAID Proceedings, West Lafayette, Indiana, Sept. 1999.

[7] A. Ghosh, A. Schwartzbard, "A Study in Using Neural Networks for Anomaly and Misuse Detection," Proceedings of the 8th USENIX Security Symposium, 1999.

[8] J. Cannady, "Artificial Neural Networks for Misuse Detection," Proceedings of the 1998 National Information Systems Security Conference (NISSC'98), 1998.

[9] Z. Zhang, J. Li, C.N. Manikopoulos, J. Jorgenson, J. Ucles, "HIDE: a hierarchical network intrusion detection system using statistical preprocessing and neural network classification," IEEE Workshop on Information Assurance and Security, 2001, pp. 85-90.

[10] L. Valiant, "A theory of the learnable," Communications of the Association for Computing Machinery, vol. 27, no. 11, 1984, pp. 1134-1142.

[11] J.N. Hwang, J.J. Choi, S. Oh, R.J. Marks II, "Query-based learning applied to partially trained multilayer perceptrons," IEEE Trans. on Neural Networks, vol. 2, 1991, pp. 131-136.

[12] R.-I. Chang, P.-Y. Hsiao, "Unsupervised query-based learning of neural networks using selective-attention and self-regulation," IEEE Trans. on Neural Networks, vol. 8, no. 2, 1997, pp. 205-217.

[13] T. Oates, D. Jensen, "The Effects of Training Set Size on Decision Tree Complexity," International Conference on Machine Learning, 1997, pp.254-262.

[14] R. Kohavi, F. Provost, "Glossary of Terms," Machine Learning, vol. 30, no. 2-3, 1998, pp. 271-274.

Author Biographies

Ray-I Chang received the Ph.D. degree in electrical engineering and computer science from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1996. He then joined the Computer Systems and Communications (CSCL) Laboratory, Institute of Information Science, Academia Sinica. In 2003, he joined the Department of Engineering Science, National Taiwan University, Taipei, Taiwan.

Liang-Bin Lai received the B.S. degree in information management from the National Yunlin University of Science and Technology, Yunlin, Taiwan, in 1999. Since 2004 he has been studying toward the Ph.D. degree.

His research interests include neural networks, fuzzy logic, data mining, and database applications.

Ray-I Chang, Liang-Bin Lai, Wen-De Su *, Jen-Chieh Wang *, Jen-Shiang Kouh

Department of Engineering Science and Ocean Engineering, National Taiwan University No.1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan {rayichang, d93525009, kouhjsh}

* Information Management Center Chung-Shan Institute of Science and Technology Armaments Bureau, Ministry of National Defense Longtan, Taiwan, ROC
Table 1. Confusion Matrix [14]

                     Predicted Negative   Predicted Positive
Actual Negative      a                    b
Actual Positive      c                    d

Recall rate: d/(c + d)
Precision rate: d/(b + d)
False-Positive rate: b/(a + b)
False-Negative rate: c/(c + d)
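The four rates in Table 1 follow directly from the cell counts. A small helper is sketched below; the function name rates is ours, and returning 0.0 when a denominator is zero (for example, when a class is never predicted) is a convention assumed here for convenience.

```python
from typing import Dict


def _ratio(num: int, den: int) -> float:
    # Convention assumed here: an undefined rate (0/0) is reported as 0.0.
    return num / den if den else 0.0


def rates(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Table 1 rates: a = true negatives, b = false positives, c = false negatives, d = true positives."""
    return {
        "recall": _ratio(d, c + d),
        "precision": _ratio(d, b + d),
        "false_positive_rate": _ratio(b, a + b),
        "false_negative_rate": _ratio(c, c + d),
    }
```

Note that the recall rate and the false-negative rate share the denominator c + d, so they always sum to 1 when c + d > 0.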

Table 2. The experimental result in confusion matrix (rows: actual class or rate; columns: predicted class).

Actual / rate          Method   Probe    DoS       U2R      R2L
Probe                  ours     2523     81        509      8
                       BPN      2515     181       0        0
DoS                    ours     564      227080    126      0
                       BPN      25       222153    0        0
U2R                    ours     25       0         83       8
                       BPN      0        0         0        0
R2L                    ours     14       479       147      6660
                       BPN      4        2         0        0
Precision rate         ours     75.8%    99.1%     9.2%     76.5%
                       BPN      91.1%    99.6%     0.0%     0.0%
Recall rate            ours     60.7%    98.8%     36.4%    41.2%
                       BPN      60.3%    96.7%     0.0%     0.0%
False-Positive rate    ours     24.2%    0.9%      90.8%    23.5%
                       BPN      8.96%    0.42%     0.0%     0.0%
False-Negative rate    ours     39.4%    1.20%     63.6%    58.6%
                       BPN      39.6%    3.4%      100.0%   100.0%

Figure 2. The data distribution of the DARPA data attack breakdown (training set)

smurf 56.838%
Neptune 21.700%
Normal 19.691%
others 1.772%
back 0.446%
satan 0.332%
ipsweep 0.252%
portsweep 0.211%
warezclient 0.206%
teardrop 0.198%
pod 0.053%
nmap 0.047%
guess_passwd 0.011%
buffer_overflow 0.006%
land 0.004%
warezmaster 0.004%
imap 0.002%
rootkit 0.002%
loadmodule 0.002%
ftp_write 0.001%
multihop 0.001%
phf 0.001%
perl 0.001%
spy 0.000%

Note: Table made from pie chart.
COPYRIGHT 2007 Research India Publications

Article Details
Author:Chang, Ray-I; Lai, Liang-Bin; Su, Wen-De; Wang, Jen-Chieh; Kouh, Jen-Shiang
Publication:International Journal of Computational Intelligence Research
Geographic Code:9TAIW
Date:Jan 1, 2007