Printer Friendly

Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection.

1. Introduction

The early IDS (intrusion detection system) [1] is divided into the host-based IDS (HIDS) and the network-based IDS (NIDS). HIDS has the advantage of analyzing the system log and resource usage information by the host and user. However, installing an IDS in each host increases the management points and wastes more resources. If network-level packet analysis is not possible and the attacker takes control of the system, the IDS may be interrupted. NIDS has advantages that it does not need to install an IDS on each host, and NIDS can perform analysis at the entire network level. However, there is a disadvantage in which it is possible to confirm only the attack via the IDS, and it is difficult to confirm the attack attempt at the system level. In early 2003, the IDS was losing the trust of users due to the problem of generating false positives. The causes of false positives are due to the development of erroneous rules, traffic irregularities, and limitations of pattern matching tests. Even though the IDS problem has not been solved to date, "pattern matching" is still being used as a basis for security solutions.

Intrusion detection attacks [2] are divided into misuse detection and anomaly detection. In misuse detection, detected attacks are compared with existing signatures in the database to determine whether they are intrusions. While misuse detection detects only the known attacks, anomaly detection detects a new type of attack that has a pattern different from the normal traffic and the known attack types.

Many researchers have studied intrusion detection. In general, researchers attempted to distinguish the normal class from attack classes using the publicly available intrusion detection evaluation dataset and to identify the exact attack type. However, the classification of rare classes in a huge real-time dataset requires a long computation time, and then it is difficult to achieve good efficiency. It is necessary to create and test many experimental datasets to improve classification performance by adjusting the class ratio.

In this paper, we present a novel method that optimally adjusts the SMOTE [3] ratios for rare classes. The number of cases for the tuple of SMOTE ratios is too large to test all the cases. For that reason, we propose the following efficient method. We randomly generated some tuples of SMOTE ratios and used these tuples to create a model using a support vector regression (SVR) [4]. We input a number of tuples for SMOTE ratios to the SVR model, and we chose the best tuple of SMOTE ratios. Experimental results using the proposed method were significantly better than those of the previous approach [5].

The contributions we make through the proposed method are given as follows. We suggest how to find the SMOTE ratios that show good performance with very few tests. Hence, we dramatically reduce the amount of computations required to find the best SMOTE ratios. We are sure that the proposed method is helpful for the study of class imbalances.

The remainder of this paper is organized as follows. Section 2 explains the related works on the KDD CUP 1999 dataset [6] and class imbalances. In Section 3, we present the background of this research. In Section 4, we suggest a new method by creating a numerical model using sampled SMOTE ratios. In Section 5, we explain our experimental environments, procedures, and results. The paper ends with our concluding remarks in Section 6.

2. Related Work

2.1. KDD Dataset. Leung and Leckie [7] studied anomaly detection using unsupervised learning algorithms on the KDD CUP 1999 intrusion detection dataset. These researchers proposed density-based clustering and grid-based clustering algorithms. In density-based clustering, a cluster includes a minimum number of data points. The approach has the advantage of filtering outliers or finding clusters with arbitrary shapes. In the grid-based approach, all clustering operations are conducted on a grid structure. The method has the advantage of a fast computing speed. With the method, a classifier can learn from unlabeled data and detect new types of attacks that were previously unseen. The experimental results showed that the accuracy of their method is similar to one of existing methods, and the method has several advantages in terms of computational complexity.

Meng [8] studied intrusion detection machine-learning techniques on the KDD CUP 1999 dataset. There have been many studies using popular methods, such as artificial neural networks, SVM [9], and decision trees. However, these methods were rarely used in large-scale real intrusion detection systems. This researcher aimed at practical anomaly detection and conducted a comparative study with artificial neural networks, SVMs, and decision trees using the same environment as previous studies. In the analysis of the experimental results, the intrusion detection system with machine-learning techniques showed a high dependency on the test environment, and this researcher concluded that it was important to find a suitable method for applying machine-learning techniques to real environments.

Davis and Clark [10] reviewed the data preprocessing techniques used in anomaly-based network intrusion detection systems. The research focused on network traffic analysis and feature extraction/selection. Most of studies on NIDS dealt with the TCP/IP packet headers of network traffic. Time-based statistics can be derived from the headers to detect network scans, network worms, and DoS attacks. Recent, full service responses are analyzed to detect attacks targeting clients. This focuses on which attack classes can be detected by the reviewed methods. This review shows the trends that scrutinize packets to extract or select the most relevant features through targeted content parsing. These contextsensitive features are required to detect network attacks.

Staudemeyer and Omlin [11] used a long short-term memory recurrent neural network (LSTM-RNN) to evaluate the classification performance using the KDD CUP 1999 dataset. LSTM networks can learn "memory" and create a model with time series data. The LSTM is trained and tested on their modified KDD CUP 1999 dataset. The LSTM network structure and parameters were obtained through experiments. Several performance measures were used to analyze experimental results. Their results showed that LSTM-RNN can learn all the unknown attack classes in the training dataset. Furthermore, they found that both receiver operating characteristic (ROC) curves and area under the curve (AUC) were well suited for evaluating LSTM-RNN.

Kim et al. [12] proposed a system-call-language-modeling method based on LSTM for designing an anomaly-based host intrusion detection system. These researchers used an ensemble method to solve the false-alarm rates problem that was common in conventional intrusion detection systems. The method can effectively learn the semantic meaning and interactions of each system call that existing methods cannot handle. These researchers demonstrated the validity and effectiveness of their method through several tests on publicly available benchmark datasets, and their method has an advantage in that it is easy to transplant to other systems.

Kim et al. [13] investigated artificial intelligence intrusion detection systems that used the deep neural network (DNN) and conducted experiments on the KDD CUP 1999 dataset. Data preprocessing (such as data transformation and normalization was conducted) was used to input the dataset into the DNN model. When a learning model was created, the DNN was used for data refinement. The full dataset was used to verify the learning model. Performance measures, such as the accuracy, detection rate, and false-positive rate, were used to verify the detection efficiency of the DNN model, and the model showed good performance for intrusion detection.

Le et al. [14] studied deep-learning algorithms to solve the problem of machine-learning techniques (such as SVM and fc-NN) that had high false-positive rates in intrusion detection systems. They found six optimizers that are applicable to the LSTM-RNN model to be the best suited for intrusion detection systems. The LSTM results using the Nadam optimizer were better than previous approaches, with an accuracy of 97.54%, a detection rate of 98.95%, and a false-positive rate of 9.98%. In Table 1, the studies related to intrusion detection are summarized.

Seo [5] tried to adjust the class imbalance of train data to detect attacks in the KDD 1999 intrusion dataset. He tested with machine-learning algorithms to find efficient SMOTE ratios of rare classes such as U2R, R2L, and Probe. He studied to improve the performance of classification focusing on detection of rare classes. The number of instances of rare classes in the train data was increased by 12, 9, and 1.5 times, respectively. The recall metrics of k-NN tests were increased to 0.11 in U2R class and 0.02 in R2L class. The metrics of SVM tests were increased to 0.02 in U2R class and 0.08 in R2L class, and those of decision tree tests were increased to 0.25.

2.2. Class Imbalance. In the study of Japkowicz [15], most previously designed concept-learning systems assume that a training dataset is generally well balanced. This assumption is not necessarily correct. In practice, most instances represent one class, and only a small number of instances represent other ones. These researchers tried to experimentally demonstrate that a class imbalance degrades the performance of standard classifiers. These researchers compared the performance of several methods that were previously proposed by other researchers.

Japkowicz and Stephen [16] studied class imbalance. Class imbalance has been reported to degrade the performance of some standard classifiers. They conducted a systematic study by answering the following three problems. First, they attempted to understand the concept complexity, the size of the training set, and the class imbalance level. Second, they discussed several basic resampling or cost-modifying methods to compare the efficiency of the previously proposed class imbalance problems. Finally, they conducted studies with the assumption that class imbalance problems also affected other classification systems, such as decision trees, neural networks, and SVMs.

Chawla et al. [17] studied the SMOTEBoost algorithm. In data mining, most of the datasets have the class imbalance problem, and data mining tools learn from imbalanced datasets. The classifier, which learns from a minority class with very few instances, tends to be biased towards a high accuracy in the prediction of the majority class. SMOTE is used in the design of classifiers to train unbalanced datasets. They presented a new approach to learn from imbalanced datasets by combining the SMOTE algorithm and the boosting procedure. Unlike standard boosting in which the same weight is given to all misclassified examples, SMOTEBoost generates synthetic examples from minority classes. SMOTEBoost indirectly changes the weight by updating and compensating for the skewed distribution. In the experiments with SMOTEBoost applied to several datasets with a high or moderate class imbalance, the classification performance for the minority class and the overall F-measure was improved.

Drummond and Holte [18] used two commonly used sampling methods for applying machine learning to imbalanced classes and misclassification costs. They adopted a performance analysis technique called cost curves to explore the interaction of oversampling and undersampling with the decision tree classifier C4.5. They showed that applying C4.5 to undersampling could establish a reasonable standard for comparing algorithms. However, it is recommended that the cheapest cost classifier becomes a part of the standard since it can be better than undersampling for relatively modest costs. Oversampling has little influence on the sensitivity and the misclassification costs have no significant effect on performance.

Zhou and Liu [19] demonstrated the effect of sampling and threshold-moving in training cost-sensitive neural networks. Both oversampling and undersampling were considered. These techniques modified the distribution of training data so that the costs of the instances were explicitly conveyed by the appearances of the instances. Threshold-moving moves the output threshold towards inexpensive classes to improve classification performance. The hardensemble and soft-ensemble are used for the experiments. In hard-ensembles and soft-ensembles, all classifiers vote on each class and return the class that receives the most votes. The difference between the two ensembles is that hard-ensemble uses binary votes and soft-ensemble uses real-value votes. Twenty-one UCI datasets and actual datasets were used in their experiments. The experimental results showed that as the number of classes increases, the degree of class imbalance worsens and the efficiency of classification deteriorates. Threshold-moving and the soft-ensemble showed relatively good performance in training cost-sensitive neural networks.

Liu et al. [9] used undersampling to solve the class imbalance problem. Undersampling is a very effective method to mitigate class imbalance using only a subset of the majority class. The disadvantage of the method is that instances of majority classes are ignored. They presented two algorithms to overcome the drawback. First, the EasyEnsemble algorithm samples several subsets from the majority class, trains a learner using each subset, and then combines the outputs of the learners. EasyEnsemble internally uses the AdaBoost ensemble. The BalanceCascade algorithm trains learners in sequence. At each step, instances of the majority class that are correctly classified by the current trained learners are removed from further consideration. The experimental results showed that both methods produce better solutions than the conventional class imbalance.

Burez and Van den Poel [20] attempted to solve the class imbalance problem to predict customer churn. Customer churn is caused by a customer who changes service provider. Customer churn is a highly rare event in the service industry, but it is a notably interesting and informative research area. However, the class imbalance problem in the context of data mining has not paid it considerable attention until recently. They studied how class imbalance can be better handled in churn prediction. They have conducted studies to improve the performance of random sampling and undersampling with appropriate evaluation matrices, such as AUC and lift. They compared gradient boosting, weighted random forest modeling, and some standard modeling techniques. They studied the performance of both random and advanced undersampling. They compared the specific modeling techniques of gradient boosting and weighted random forests with some standard techniques. In their experiment, the use of undersampling improved the prediction accuracy and the AUC values.

Seiffert et al. [21] had stated that class imbalance was a common problem in various applications. Several techniques had been used to mitigate class imbalance problems. They used a hybrid sampling/boosting algorithm called RUSBoost to train skewed training dataset. The algorithm was simpler and faster as an alternative of SMOTEBoost. They evaluated the performance of RUSBoost, SMOTEBoost, random undersampling, SMOTE, and AdaBoost. They chose fifteen datasets in various applications and then conducted experiments with four learners (C4.5D, C4.5N, naive Bayes (NB), and repeated incremental pruning) to produce error reduction (RIPPER) over four evaluation matrices. Both RUSBoost and SMOTEBoost were better than other methods, and RUSBoost performed equal to or better than SMOTEBoost.

Horng et al. [22] proposed an SVM-based intrusion detection system. The system combines a hierarchical clustering algorithm, a simple feature selection procedure, and an SVM technique. The clustering algorithm provided the SVM with fewer, abstracted, and higher qualified training instances. It was able to shorten the training time and improve the performance of a resultant SVM. The obtained SVM model could classify the network traffic data more accurately through the simple feature selection procedure. The KDD Cup 1999 dataset was used to evaluate the proposed system. Compared with other intrusion detection systems that are based on the same dataset, this system showed better performance in the detection of DoS and Probe attacks, and the best performance in overall accuracy. In Table 2, the studies related to class imbalance are summarized.

3. Background

3.1. KDD Dataset. The KDD CUP 1999 dataset [6] used in our experiments is a modification of data generated by the DARPA (Defense Advanced Research Projects Agency) intrusion detection evaluation program in 1988. The DARPA dataset is intercepted data that contain a wide range of attacks generated in a military network environment. The dataset has greatly contributed to the investigation and evaluation of intrusion detection. The dataset has been prepared and managed by MIT's Lincoln laboratory. In 1999, the modified DARPA dataset was used in the KDD CUP 1999 intrusion detection competition. MIT's Lincoln laboratory has a similar experimental environment to the typical U. S. Air Force LAN (local area network). Raw TCP dump data were generated over nine weeks. As in a real Air Force environment, the LAN was activated and various attacks were executed. However, there was a disadvantage in that there was no noise in the real data. However, the KDD CUP 1999 dataset served as a testbed to overcome the vulnerabilities of signature-based IDSs in detecting new attack types and attracted the attention of many researchers. The KDD CUP 1999 dataset is most widely used for the evaluation of such a system. There are many previous approaches using the dataset and it will be possible to compare the approaches with a new method.

Table 3 represents the files in the KDD CUP 1999 dataset and the details for those. The files "kddcup.data_10_percent.gz" and "corrected.gz" are used as training data and test data, respectively. The training data are compressed binary TCP dump data collected over approximately seven weeks with approximately 5 million connection records. The testing data are collected over approximately two weeks. They are composed of approximately 2 million connection records. Connection records are a collection of TCP packets flowing from the source IP to the destination IP, and these are classified into a normal or attack class. In the case of connection records belonging to an attack class, these are represented by exactly one specific attack type. The size of each connection record is approximately 100 bytes. Attack types are categorized into four classes, such as DoS, R2L, U2R, and Probe, as shown in Table 4.

3.2. SMOTE: Synthetic Minority Oversampling Technique. SMOTE [3] is a method of generating new instances using existing ones from rare or minority class. First, we identify the fc-nearest neighbors in a class with a small number of instances and calculate the differences between a sample and these fc neighbors. We multiply the differences by an arbitrary value between 0 and 1 and get a resultant value. Next, an instance that is generated using the resultant value is added to the training data. As a result, SMOTE works by adding any points that slightly move existing instances around its neighbors. In the aspect of increasing the number of instances in rare classes, SMOTE is similar to random oversampling. However, it does not regenerate the same instance. It creates a new instance by appropriately combining existing instances, thus making it possible to avoid the disadvantage of overfitting to a certain degree.

4. Modeling

4.1. Problem Definition. We attempt to maximize classification performance of the KDD CUP 1999 intrusion detection dataset that has class imbalance. The dataset has severe class imbalance. Therefore, data preprocessing for adjusting the class ratio is required to alleviate the imbalance. The class imbalance can be adjusted using undersampling, oversampling, and SMOTE techniques. We use the SMOTE technique. All tuples of SMOTE ratios should be tested to optimize the ratios of each class. However, there are time and cost constraints to conduct experiments on all cases. Therefore, we try to find the tuple of SMOTE ratios that shows the best performance by experimenting with few tuples of SMOTE ratios. Formula (1) represents the method to calculate class imbalance ratio of each class. Figure 1 shows the structure of the dataset which is used in the proposed method. Table 5 shows class imbalance ratios of Train A, Train B which is the first half of Train A, Validation which is the second half of Train A, and Test. Train A is the original train data. Train B and Validation in Table 5 are basically the same. Train B in Table 5 shows the instances after applying the SMOTE ratios in Table 6. We define the three classes of U2R, R2L, and Probe as rare classes because the classes have relatively fewer instances than other classes.

Label cardinality of D is the average number of labels of the examples in D:

[mathematical expression not reproducible]. (1)

4.2. Proposed Method. We attempt to optimize the SMOTE ratios of rare classes to mitigate the class imbalance. It is difficult to test all tuples of SMOTE ratios in a short period of time. Therefore, we attempt to identify an efficient method with a small number of experiments and reduce computation time.

We create an SVR model with a small number of experiments and try to get the best tuple of the SMOTE ratios by inputting enough tuples of SMOTE ratios into the model. We also verify the results through experiments. The numbers of 100 and 1,000,000, which are used in the experiments, are decided by considering computation time and 100 instances are generated randomly from a uniform distribution. We use random sampling method instead of grid one. If we can use more than 100 instances, grid sampling is not bad, but the method is not appropriate to sample very few instances uniformly. We set the ranges for the rare classes through preliminary experiments, as shown in Table 7.

We randomly generate 100 tuples of SMOTE ratios within the maximum ranges of Table 7. We conduct experiments by inputting the 100 tuples into an SVM classifier. As results, five recall values are given to each of the 100 tuples. An SVR model is created using the 100 tuples and the root mean square of the recall values. We randomly generate 1,000,000 tuples of SMOTE ratios and input them into the SVR model to derive the optimal solution. We conduct experiments to verify the quality of the best tuple.

Formula (2) represents procedure of the proposed method. The method shows good performance with very few tests and significantly reduces the amount of computations which are required to find the best SMOTE ratios. Figure 2 represents its pseudocode.

Procedures of the proposed methods as follows:

(1) Set the ranges for the rare classes through preliminary experiments, as shown in Table 7. The ranges were searched by inputting successive [2.sup.t] where t is a nonnegative integer.

(2) Generate randomly few tuples of SMOTE ratios from a uniform distribution (independent variable).

(3) After drawing recall metrics by giving the tuples into an SVM classifier, calculate RMS with the metrics (dependent variable).

(4) Create an SVR model [4] with the tuples and RMS.

(5) Find the best tuple among a lot of tuples, which are generated randomly from a uniform distribution, through the SVR model.

// N: normal, U: change to U2R, R: R2L,

D: DoS, P: change to Probe, m: the number of classes,

// [U.sub.max], [R.sub.max], [P.sub.max]: maximum range of each class,

// [T.sub.model]: tuples of SMOTE ratios required for model creation,

// [T.sub.eval]: tuples of SMOTE ratios required to evaluate the model,

// [t.sub.best]: the best tuple among [T.sub.eval],

[mathematical expression not reproducible]. (2)

Figure 3 shows a hierarchy of the methods in LibSVM. Table 8 represents the time complexity of SVM. Table 9 shows the time complexity of the proposed methods.

5. Experiments

We randomly generate 100 tuples of SMOTE ratios and use the tuples to create an SVR model. We find the best tuple by giving 1,000,000 randomly generated tuples of SMOTE ratios into the SVR model. The experiment results with the best tuple were improved by approximately 20 percent compared with the previous approach [5]. The SVR model was generated using only 100 tuples of SMOTE ratios. As with the SVR model, the computation time was dramatically reduced and the tuple of SMOTE ratios with the highest efficiency was found.

Formula (3) gives the root mean square (RMS) using the recall values, which are the results of experiments with the 100 tuples of the SMOTE ratios of the U2R, R2L, and Probe classes. The 100 tuples are randomly generated within the range of Table 7. Variable N is the normal, U is the U2R, R is the R2L, D is the DoS, and P is the Probe class. Table 10 shows parameters of SVR and SVM. Table 11 shows parameters of RNN-LSTM. Table 12 shows the measures drawn by creating an SVR model using the 100 tuples of SMOTE ratios and the RMS. The correlation coefficient was more than 0.7, which indicates a strong positive linear relationship. The RMSE was 0.006, which means that the difference between the expected value and the actual one is very small. Since the root relative squared error is a measure that compares the standard deviation of the actual values with the differences between the predicted and actual values, it is not a significant factor in evaluating the performance of the model. Table 13 shows the recall metrics of experiments by the best tuple. The best tuple represents 1,000 times for the U2R, 451 times for the R2L, and 1 time for the Probe, as shown in Table 6. Table 6 shows the difference of SMOTE ratios between the proposed method and the previous one. The proposed method searches an optimal solution among a lot of SMOTE ratios, but the previous one uses only fixed SMOTE ratios.

[mathematical expression not reproducible]. (3)

Figure 4 compares recall metrics of the proposed method with that of the previous approach [5]. RNN-LSTM is slightly superior to other methods. In the SVM tests, the performances of the U2R, R2L, and Probe classes were improved by approximately 22.6%, 58.9%, and 2.3%, respectively. Figure 5 represents SMOTE ratios of the U2R, R2L, and Probe used to create the SVR model. Table 14 [22] compares the proposed methods with other work by the detection rate. Figure 6 shows a graph for the RMS of the results obtained by inputting 1,000,000 tuples of SMOTE ratios into the SVR model. The reason for defining the RMS of Formula (3) as the objective value is to make the recall values of rare classes well reflected by experimental results. An RMS of the best tuple is about 0.979. Table 15 shows recall values of previous work [5].
Figure 2: Pseudocode of the proposed method.

// SMOTE is not applied to Normal and DoS classes
// D: preprocessed KDD CUP 1999 dataset
// [D.sub.train]: the first 50% of D, [D.sub.test]: the last 50% of D
// [D.sub.train_b]: the first 50% of [D.sub.train],
   [D.sub.validation]: the last 50% of [D.sub.train]
// [S.sub.rand_1]: 100 of randomly generated SMOTE ratios
// [S.sub.RMS]: root mean squared values of recall metrics drawn
      from a test using [D.sub.train_b]
      with [S.sub.rand_1] applied and Validation
// [SVR.sub.model]: a model created by [S.sub.rand_1] and [S.sub.RMS]
// [S.sub.rand_2]: 1,000,000 of randomly generated SMOTE ratios
// O: optimal SMOTE ratio

([D.sub.train], [D.sub.test]) [left arrow] split D into 50:50
([D.sub.train_b], [D.sub.validation]) [left arrow] split [D.sub.train]
  into 50:50
[S.sub.RMS] [left arrow] root mean square using [D.sub.train_b] with
  [S.sub.rand_1] applied and [D.sub.validation]
[SVR.sub.model] [left arrow] an SVR model created using [S.sub.rand_1]
  and [S.sub.RMS]
O [left arrow] argmax([SVR.sub.model]([S.sub.rand_2] ))
Do experiments using [D.sub.train] with O applied and [D.sub.test]


Tables 16 and 17 show confusion matrix of SVM and RNN-LSTM, respectively. We conducted experiments with SVM and decision tree on the three dataset combinations of (Train B, Validation), (Train B, Test), and (Train A, Test) datasets. The results showed that SVM was better than the decision tree. Table 16 represents recall values of the previous methods and SVM was superior to other work. Parameters and datasets of the proposed SVM test is identical to those of the previous one.

6. Conclusions

In this study, we have attempted to mitigate the problem of class imbalance in the KDD CUP 1999 intrusion detection dataset. As results, we obtained the best SMOTE ratios of rare classes, reduced the number of experiments by creating an SVR model, and had a significant performance improvement over the previous approach [5]. The best SMOTE ratios of rare classes drawn by the SVR model were 1,000 times for U2R, 451 times for R2L, and 1 time for Probe. The recall values for rare classes were 0.615 for the U2R in RNN LSTM, 0.302 for the R2L in SVM, and 0.997 for the Probe in decision tree, respectively.

We proposed a new method to find the best SMOTE ratios that have high efficiency with a small number of experiments. The proposed method dramatically reduced the number of adjustments for classes. Therefore, the computation time required for the experiments could be shortened.

In future, it will be meaningful to investigate the change of test results according to the number of tuples of SMOTE ratios. We can identify better SMOTE ratios using the models created by other machine-learning techniques. Also, we will apply evolutionary computations or other meta-heuristic algorithms to identify the best tuple.

https://doi.org/10.1155/2018/9704672

Data Availability

The KDD CUP 1999 data used to support the findings of this study are available at http://kdd.ics.uci.edu/databases/ kddcup99/kddcup99.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by Wonkwang University in 2017.

References

[1] Intrusion Detection System, 2018, https://en.wikipedia.org/ wiki/Intrusion_detection_system.

[2] R. C. Staudemeyer, "Applying long short-term memory recurrent neural networks to intrusion detection," South African Computer Journal, vol. 56, no. 1, pp. 136-154, 2015.

[3] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.

[4] A. J. Smola and B. Scholkopf, "A tutorial on support vector regression," Statistics and computing, vol. 14, no. 3, pp. 199-222, 2004.

[5] J. H. Seo, "A study on the performance evaluation of unbalanced intrusion detection dataset classification based on machine learning," Journal of the Korean Institute of Intelligence Systems, vol. 27, no. 5, pp. 466-474, 2017.

[6] S. J. Stolfo, "KDD cup 1999 dataset, UCI KDD Repository," 1999, http://kdd.ics.uci.edu.

[7] K. Leung and C. Leckie, "Unsupervised anomaly detection in network intrusion detection using clusters," in Proceedings of Twenty-Eighth Australasian Computer Science Conference, vol. 38, pp. 333-342, Newcastle, Australia, January 2005.

[8] Y. X. Meng, "The practice on using machine learning for network anomaly intrusion detection," in Proceedings of International Conference on Machine Learning and Cybernetics, vol. 2, pp. 576-581, IEEE, Guilin, China, July 2011.

[9] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, "Support vector machines," IEEE Intelligent Systems and their Applications, vol. 13, no. 4, pp. 18-28, 1998.

[10] J. J. Davis and A. J. Clark, "Data preprocessing for anomaly based network intrusion detection: a review," Computers & Security, vol. 30, no. 6-7, pp. 353-375, 2011.

[11] R. C. Staudemeyer and C. W. Omlin, "Evaluating performance of long short-term memory recurrent neural networks on intrusion detection data," in Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference on--SAICSIT'13, pp. 218-224, ACM Press, London, October 2013.

[12] G. Kim, H. Yi, J. Lee, Y. Paek, and S. Yoon, "LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems," 2016, http://arxiv.org/abs/1611.01726.

[13] J. Kim, N. Shin, S. Y. Jo, and S. H. Kim, "Method of intrusion detection using deep neural network," in Proceedings of IEEE International Conference on Big Data and Smart Computing, pp. 313-316, IEEE, Jeju-do, South Korea, February 2017.

[14] T. T. H. Le, J. Kim, and H. Kim, "An effective intrusion detection classifier using long short-term memory with gradient descent optimization," in Proceedings of International

Conference on Platform Technology and Service (PlatCon), pp. 1-6, Busan, South Korea, September 2017.

[15] N. Japkowicz, "The class imbalance problem: significance and strategies," in Proceedings of the 2000 International Conference on Artificial Intelligence, pp. 111-117, 2000.

[16] N. Japkowicz and S. Stephen, "The class imbalance problem: a systematic study," Intelligent Data Analysis, vol. 6, pp. 429-449, 2002.

[17] N. Chawla, A. Lazarevic, L. Hall, and K. Bowyer, "SMOTEBoost: Improving prediction of the minority class in boosting, Knowledge Discovery in Databases: PKDD," in Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery, pp. 107-119, 2003.

[18] C. Drummond and R. C. Holte, "C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling," in Proceedings of Workshop on Learning from Imbalanced Datasets II, pp. 1-8, Washington DC, 2003.

[19] Z. H. Zhou and X. Y. Liu, "Training cost-sensitive neural networks with methods addressing the class imbalance problem," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63-77, 2006.

[20] J. Burez and D. Van den Poel, "Handling class imbalance in customer churn prediction," Expert Systems with Applications, vol. 36, no. 3, pp. 4626-4636, 2009.

[21] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, "RUSBoost: a hybrid approach to alleviating class imbalance," IEEE Transactions on Systems, Man, and Cybernetics--Part A: Systems and Humans, vol. 40, no. 1, pp. 185-197, 2010.

[22] S. J. Horng, M. Y. Su, Y. H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert systems with Applications, vol. 38, no. 1, pp. 306-313, 2011.

[23] A. Abdiansah and R. Wardoyo, "Time complexity analysis of support vector machines (SVM) in LibSVM," International Journal Computer Application, vol. 128, no. 3, pp. 28-34, 2015.

[24] A. N. Toosi and M. Kahani, "A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers," Computer Communications, vol. 30, no. 10, pp. 2201-2212, 2007.

[25] B. Pfahringer, "Winning the KDD99 classification cup: bagged boosting," ACM SIGKDD Explorations Newsletter, vol. 1, no. 2, pp. 65-66, 2000.

[26] I. Levin, "KDD-99 classifier learning contest: LLSoft's results overview," ACM SIGKDD Explorations, vol. 1, no. 2, pp. 67-75, 2000.

[27] M. R. Sabhnani and G. Serpen, "Application of machine learning algorithms to KDD intrusion detection dataset with in misuse detection context," in Proceedings of the International Conference on Machine Learning: Models, Technologies, and Applications, pp. 209-215, Las Vegas, NV, USA, June 2003.

[28] W. Xuren, H. Famei, and X. Rongsheng, "Modeling intrusion detection system by discovering association rule in rough set theory framework," in Proceedings of International Conference on Computational Intelligence for Modelling, Control and Automation, 2006 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, p. 24, Vienna, Austria, November-December 2006.

Jae-Hyun Seo [ID] (1) and Yong-Hyuk Kim [ID] (2)

(1) Department of Computer Science and Engineering, Wonkwang University, 460 Iksandae-ro, Iksan-si, Jeonbuk 54649, Republic of Korea

(2) School of Software, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea Correspondence should be addressed to Yong-Hyuk Kim; yhdfly@kw.ac.kr

Received 30 April 2018; Revised 6 August 2018; Accepted 2 October 2018; Published 1 November 2018

Academic Editor: Giosue Lo Bosco

Caption: Figure 1: The structure of the dataset used in the proposed method.

Caption: Figure 3: Hierarchy of the methods in LibSVM [23].

Caption: Figure 5: SMOTE ratios of the U2R, R2L, and Probe used to create the SVR model (RMS values). (a) U2R vs. R2L. (b) R2L vs. Probe. (c) Probe vs. U2R.

Caption: Figure 6: Bar chart representing RMS for 1,000,000 tuples of SMOTE ratios which are randomly generated. The red dots represent SMOTE ratios with high RMS value.
Table 1: Related work with KDD CUP 1999.

Authors                Year                Method

Leung and Leckie [7]   2005     Density-based and grid-based
                                         clustering
Meng [8]               2011         SVM, neural networks,
                                      and decision tree
Davis and Clark [10]   2011          Data preprocessing
Staudemeyer and        2013               LSTM-RNN
Omlin [11]
Kim et al. [12]        2016           LSTM and ensemble
Kim et al. [13]        2017                  DNN
Le et al. [14]         2017                  DNN
Seo [5]                2017     SVM, k-NN, and decision tree

Table 2: Related work with class imbalance.

Authors                       Year                 Method

Japkowicz [15]                2000      Multilayer perceptron (MLP)
Japkowicz and Stephen [16]    2002         Decision tree and SVM
Chawla et al. [17]            2003          SMOTE and SMOTEBoost
Drummond and                  2003             Decision tree
Holte [18]
Zhou and Liu [19]             2006     Cost-sensitive neural networks
Liu et al. [9]                2009              EasyEnsemble
Burez and Van den Poel [20]   2009     Gradient boosting and random
                                                   forest
Seiffert et al. [21]          2010   RUSBoost, SMOTEBoost, SMOTE, and
                                                  AdaBoost
Honrng et al. [22]            2011    SVM and hierarchical clustering

Table 3: Train and test dataset with labels of KDD CUP 1999
dataset.

Dataset                                    Details

Kddcup.data.gz                    Training dataset (743 MB)
Kddcup.data_10_percent.gz          10% subset of training
(23 attack types)                      dataset (75 MB)
Corrected.gz (23 attack types)      Test dataset (45 MB)

Table 4: Five main categories of KDD CUP 1999 dataset.

Attacks                         Descriptions

Normal                         Normal traffic

DoS                  Denial of service, e.g., syn flood

R2L              Unauthorized access from a remote machine,
                          e.g., guessing password

U2R                     Unauthorized access to local
                 superuser (root) privileges, e.g., various
                           buffer overflow attacks

Probing      Surveillance and other probing, e.g., port scanning

Table 5: Class imbalance ratios of Train A, Train B,
Validation, and Test dataset.

Classes    # Train A     Ratio (%)      #Train B     Ratio (%)

Normal      97,278         24.52%        48,639        10.13%
U2R           52           0.01%         26,026        5.17%
R2L          1,126         0.23%        254,476        92.70%
DoS         391,458       381.68%       195,729        58.73%
Probe        4,107         0.84%         4,108         0.78%

Classes    #Validation     Ratio (%)       #Test      Ratio (%)

Normal       48,639          24.52%       60,593        26.15%
U2R            26            0.01%          39          0.01%
R2L            563           0.23%         5,993        2.09%
DoS          195,729        381.68%       223,298      323.61%
Probe         2,053          0.84%         2,377        0.82%

Table 6: Comparison between the proposed SMOTE ratio and the
previous one.

Classes   The proposed (%)    The previous [5] (%)

Normal           --                    --
U2R            100,000               12,000
R2L            49,500                 900
DoS              --                    --
Probe            100                  150

Table 7: Range of SMOTE ratio for the rare classes.

                   U2R            R2L          Probe

SMOTE ratio    100-100,000%   100-50,000%   100-20,000%

Table 8: Time complexity of the methods in LibSVM [23].

Num.                              Methods

1         svm_train          Parse command line
2                               Read_problem
3                           SVM _check_parameter
4                                SVM_train
5                              SVM_save_model

6        svm_predict           SVM load model
7                       SVM check probability model
8                         SVM_predict_probability
9                               SVM_predict

Num.     Complexity (Big-O)       Worst case

1               O(n)           O([n.sup.2] x m)
2              O(mn)
3        O(mn + [n.sup.2])
4          O([n.sup.2]m)
5              O(mn)

6              O(mn)             O([n.sup.3])
7               O(1)
8           O([n.sup.2])
9           O([n.sup.3])

Table 9: Time complexity of the proposed methods.

Num. of the procedures             Complexity (Big-O)
of the proposed methods

1                          // g is the number of experiments
                                         O(g)

                           // h is the number of 100 tuples
2                            which are randomly generated.
                                          O(h)

3                               O(h) + svm_train + O(h)

4                              // The time complexity of
                               svr_train is identical to
                                      svm_train.
                                    O(h) + svr train
                            // k is the number of 1,000,000
                               tuples which are randomly
                                       generated.

5                             // If an algorithm does not
                           depend on n, which is a symbol of
                               amounts of data, then the
                           algorithm has constant complexity
                              or symbolized by O(1) [23].
                            Therefore, the time complexity
                           of svr_test is identical to O(1).
                                   [k.sup.?] svr_test

Num. of the procedures        Worst case
of the proposed methods

1                                O(g)

2                                O(h)

3                          O([n.sup.2] x h)

4

                           O([n.sup.2] x h)

5                                O(k)

Table 10: SVR and SVM parameters.

SVR parameters                               Values

Batch size                                     100
C                                               1
Filter type                          Normalize training data
Kernel                                     PolyKernel
                      Epsilon                1.0E-12
RegSMOImproved    Epsilon Param.              0.001
optimizer            Tolerance                0.001

SVR parameters      SVM parameters              Values

Batch size            Batch size                  100
C                         c                        1
Filter type          Filter type        Normalize training data
Kernel                 Epsilon                 1.00E-12
                      Calibrator               Logistic
RegSMOImproved          Kernel                PolyKernel
optimizer          Tolerance Param.              0.001

Table 11: RNN-LSTM parameters.

Parameters                   Values

Input layers                   41
Iteration (hidden layers)      82
Classes                        5
Batch size                    Full
Epochs                        5000
Learning rate                0.001
Optimizer                   RMSProp

Table 12: Results of the SVR model (10-fold cross-validation).

Measures                                  Values

Correlation coefficient                   0.760
Mean absolute error                       0.005
Root mean squared error                   0.006
Relative absolute error                   60.9%
Root relative squared error               64.6%
                                U2R      292.071
Standard deviation of SMOTE     R2L      158.932
  ratios                       Probe      58.276

Table 13: Recall metrics of SVM, decision tree,
and RNN-LSTM tests.

               Train B + Validation

Classes      SVM       DT      LSTM

Normal      0.961     0.999    0.967
U2R         0.808     0.769    0.769
R2L         0.982     0.961    0.975
DoS         0.999     1.000    0.999
Probe       0.924     0.992    0.974

                  Train B + Test

Classes      SVM       DT      LSTM

Normal      0.977     0.993    0.947
U2R         0.641     0.462    0.641
R2L         0.275     0.235    0.260
DoS         0.855     0.999    0.998
Probe       0.928     0.988    0.977

                  Train A + Test

Classes      SVM       DT      LSTM

Normal      0.977     0.994    0.982
U2R         0.564     0.256    0.615
R2L         0.302     0.261    0.274
DoS         0.871     0.996    0.999
Probe       0.941     0.997    0.996

Table 14: Comparisons with other works by
detection rate.

                                   Normal     U2R      R2L      DoS

These methods              SVM      97.7      56.4     30.2     87.1
                           LSTM     98.2      61.5     27.4     99.9
SVM and clustering [22]             99.3      19.7     28.8     99.5
ESC-IDS [24]                        98.2      14.1     31.5     99.5
KDD'99 winner [25]                  99.5      13.2     8.4      97.1
KDD'99 runner-up [26]               99.4      11.8     7.3      97.5
Multiclassifier [27]                 N/A      29.8     9.6      97.3
Association rule [28]               99.5      3.8      7.9      96.8

                           Probe     Acc.      FP

These methods              94.1      88.2     2.4
                           99.6      98.1     0.4
SVM and clustering [22]    97.5      95.7     0.7
ESC-IDS [24]               84.1      95.3     1.9
KDD'99 winner [25]         83.3      91.8     0.6
KDD'99 runner-up [26]      84.5      91.5     0.6
Multiclassifier [27]       88.7      N/A      N/A
Association rule [28]      74.9      N/A      N/A

Table 15: Recall metrics of previous work [5].

Classes     SVM    k-NN    Decision tree

Normal     0.990   1.000       1.000
U2R        0.460   0.440       0.280
R2L        0.190   0.140       0.130
DoS        0.870   1.000       1.000
Probe      0.920   0.830       1.000

Table 16: Confusion matrix of SVM.

                                   Predicted class

                        Normal   U2R    R2L      DoS     Probe

               Normal   59,196   299    249      683      166
               U2R        3      22     14        0        0
Actual class   R2L      4,074    106   1,810      2        1
               DoS      28,759    0      4     194,517    18
               Probe     127      0     11        3      2,236

Table 17: Confusion matrix of RNN-LSTM.

                                    Predicted class

                        Normal   U2R    R2L      DoS     Probe

               Normal   59,528    92    657      86       230
                U2R       4       24     10       0        1
Actual class    R2L     4,316     23    1640     12        2
                DoS       20      0      2     223,169    107
               Probe      4       0      0        6      2,367

Figure 4: Comparison bar chart of the RNN-LSTM, SVM,
decision tree, and previous SVM tests [1]. The dotted red
lines represent the best recall values among rare classes.

         LSTM    SVM (poly)   Decision tree   Previous (SVM)

Normal   0.982     0.977          0.994          0.990

U2R      0.615     0.564          0.256          0.460

R2L      0.274     0.302          0.261          0.190

DoS      0.999     0.871          0.996          0.870

Probe    0.996     0.941          0.997

Note: Table made from bar graph.
COPYRIGHT 2018 Hindawi Limited
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2018 Gale, Cengage Learning. All rights reserved.

Article Details
Printer friendly Cite/link Email Feedback
Title Annotation:Research Article
Author:Seo, Jae-Hyun; Kim, Yong-Hyuk
Publication:Computational Intelligence and Neuroscience
Date:Jan 1, 2018
Words:7263
Previous Article:Binary Morphological Filtering of Dominant Scattering Area Residues for SAR Target Recognition.
Next Article:Neurophysiological Profile of Antismoking Campaigns.
Topics:

Terms of use | Privacy policy | Copyright © 2021 Farlex, Inc. | Feedback | For webmasters |