Encountering imbalance in credit card fraud detection with metaheuristics.
The growth of online and card-based transactions has increased exposure to card fraud. This vulnerability affects not only unaware users but also the most elite customers. The intensity of such fraud can be understood from Figure 1: among the payment methods to which fraudulent transactions are attributed, credit cards account for the largest share.
Figure 2 shows the percentage of customers affected and the losses incurred (in billions of dollars) due to card fraud. Although the proportion of affected customers is small, the loss due to card fraud in 2013 was 11.1 billion dollars. Hence the targeted population can be assumed to be a small group of high-value customers.
Several techniques have been proposed in the literature to detect credit card fraud. A recurring problem is that the transaction data on which such research depends contains confidential information. Because of the risk of a confidentiality breach, organizations (banks) are reluctant to provide data for research.
Though credit card frauds tend to be anomalies, classification techniques are often used to detect them. The major drawback of a transaction data set is that it is huge, while the anomalies are very few compared to the normal data. Hence transaction data sets used for credit card fraud detection are classified as imbalanced data sets. The imbalance makes it difficult for a classification algorithm to learn and classify effectively. Under a high level of imbalance, the minority (fraud) class is very likely to be ignored during the training phase, so the trained model may produce inappropriate results when classifying actual data.
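The effect of imbalance on evaluation can be illustrated with a small sketch. The data below is hypothetical (998 legitimate and 2 fraudulent transactions), and the "model" is a degenerate one that always predicts the majority class; the helper name `class_rates` is an assumption made for illustration only.

```python
def class_rates(y_true, y_pred):
    """Return overall accuracy and, per class, the fraction predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    per_class = {}
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        per_class[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return accuracy, per_class


# Hypothetical data: 0 = legitimate, 1 = fraudulent.
y_true = [0] * 998 + [1] * 2
# A degenerate classifier that ignores the minority class entirely.
y_pred = [0] * 1000

accuracy, per_class = class_rates(y_true, y_pred)
print(accuracy)   # overall accuracy looks excellent
print(per_class)  # yet no fraudulent transaction is detected
```

Overall accuracy alone therefore says little about an imbalanced data set; the per-class rates expose whether the minority class is being detected at all.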
The problems encountered during classification of credit card data have been discussed in . It refers to non-stationary data distribution, data imbalance and the velocity of the data as major challenges persisting in credit card fraud detection data sets. Contrary to the general theory of classification, a few transactions are not sufficient in the credit card fraud detection domain, especially due to the sparse distribution of certain categories . Several learning methods have been proposed to identify these anomalies. Clustering and classification methods play a major role in the identification of anomalies and misuse-based fraud detection , [15,18,19]. Machine learning algorithms also play a vital role in detecting misuse-based fraud . Other methods include rule-based [2,3] and tree-based [16,13] approaches. The major issue with these methods is that they do not consider the problems of imbalance and data distribution. A few recent methods do take data imbalance into account.  employs an aggregation strategy for fraud detection, while  uses modified Fisher Discriminant analysis to detect frauds. Artificial Immune Systems [7,5] have also played a major role in detecting credit card fraud.
Imbalanced data must be handled with particular care in problems involving prediction. This paper presents an enhanced filter-based method to perform classification of data (Figure 3).
The initial data preparation phases modify the data and prepare it for classification. A modified form of PSO is used, which has a built-in feature selection mechanism that reduces the workload of the regular PSO, making it more accurate and less time consuming.
The process of data preparation begins by analyzing the data. Fields are analyzed, and their data ranges, data types and other statistics such as minimum, maximum and mean values are identified. If the values are nominal instead of numerical, their categories and the category counts are identified. String fields whose values are of little use in analysis are eliminated. This provides a detailed view of the data that can be used for inconsistency resolution.
Data used for fraud detection is usually obtained from real-time log records and is hence prone to missing values and inconsistencies. Data inconsistencies take the form of missing values, null values or inappropriate values. Missing or null values are handled by replacing them with appropriate values that do not introduce major fluctuations in the data set as a whole. Inappropriate values, which range from data type mismatches to value range mismatches, can be identified while analyzing the data. Though some of these can be resolved using the data properties identified in the previous phase, human interaction becomes mandatory in most cases. This phase is completed by preparing the necessary data, eliminating the unnecessary data and finally resolving inconsistencies through a supervised approach.
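The field analysis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `profile_column`, the use of `None` for missing entries and the sample fields are all assumptions.

```python
from collections import Counter


def profile_column(values):
    """Summarise one field: numeric statistics, or category counts if nominal."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if present and all(isinstance(v, (int, float)) for v in present):
        return {
            "type": "numeric",
            "min": min(present),
            "max": max(present),
            "mean": sum(present) / len(present),
            "missing": missing,
        }
    return {
        "type": "nominal",
        "categories": dict(Counter(present)),
        "missing": missing,
    }


# Hypothetical transaction fields.
amounts = [12.5, 250.0, None, 37.5]
channels = ["web", "pos", "web", None]

amount_profile = profile_column(amounts)
channel_profile = profile_column(channels)
print(amount_profile)
print(channel_profile)
```

The resulting ranges and category counts are what the next phase would use to spot type and range mismatches.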
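The paper does not state which replacement values are used for missing entries. One common choice that keeps the column statistics stable is sketched below, under the assumption that numeric gaps are filled with the column mean and nominal gaps with the most frequent category; the function name `impute` is hypothetical.

```python
from collections import Counter


def impute(values):
    """Fill None entries with the mean (numeric) or the most frequent category (nominal)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to infer a replacement from
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)  # assumed strategy: column mean
    else:
        fill = Counter(present).most_common(1)[0][0]  # assumed strategy: mode
    return [fill if v is None else v for v in values]


print(impute([1.0, None, 3.0]))
print(impute(["a", None, "a", "b"]))
```

Mean imputation leaves the column mean unchanged, which is one way of avoiding major fluctuations in the data set as a whole.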
[FIGURE 3 OMITTED]
Feature selection is the process of analyzing data and eliminating attributes that do not contribute to the results. Feature selection methods fall into two broad categories: wrappers, which evaluate features using a learning algorithm and eliminate them on the basis of the resulting accuracies, and filters, which evaluate the importance of features using heuristics based on the general characteristics of the data [10,11]. Though wrappers provide better results than filters, they are more expensive and are intractable for large databases with many features. Wrappers (Figure 4) are totally dependent on the learning algorithm being used, hence they need to be re-run when switching between learning algorithms.
[FIGURE 4 OMITTED]
Filters (Figure 5) provide faster results and are independent of the learning algorithm, which makes them preferable to wrappers for large problems. Their downside is lower accuracy compared to wrappers, but this is outweighed by their ability to scale to large databases. They can also serve as effective subset selectors ahead of wrapper methods, reducing the processing time of the wrappers. This paper presents a correlation-based feature selection (CFS) method that is embedded in PSO to create an optimized and faster classification algorithm.
[FIGURE 5 OMITTED]
The CFS-based feature selection method evaluates the merit of a subset of attributes by considering the individual predictive ability of each feature and the degree of redundancy among them. Subsets containing attributes that have high correlation with the class attribute and low correlation with each other are preferred. A good feature set is one whose features are highly correlated with the class and uncorrelated with each other.
A feature $V_i$ is said to be relevant iff there exist some $v_i$ and $c$ for which $p(V_i = v_i) > 0$ such that
\[ p(C = c \mid V_i = v_i) \neq p(C = c) \]
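The relevance condition can be checked on a sample by comparing empirical conditional and marginal class frequencies. The sketch below is illustrative only: the function name `is_relevant` and the tolerance `tol` are assumptions, and on finite real data almost any feature will differ slightly from the marginal, so in practice a statistical test or a correlation measure would be used instead of exact comparison.

```python
from collections import Counter


def is_relevant(feature, target, tol=1e-12):
    """True if some observed value v and class c give p(C=c | V=v) != p(C=c)."""
    n = len(target)
    p_class = {c: k / n for c, k in Counter(target).items()}
    for v in set(feature):  # every observed value has p(V = v) > 0
        rows = [c for f, c in zip(feature, target) if f == v]
        cond = Counter(rows)
        for c, p in p_class.items():
            if abs(cond.get(c, 0) / len(rows) - p) > tol:
                return True
    return False


target = [1, 1, 0, 0]
print(is_relevant(["a", "a", "b", "b"], target))  # feature separates the classes
print(is_relevant(["a", "b", "a", "b"], target))  # feature is independent of the class
```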
CFS only measures the correlation between nominal features, so numeric features are first discretized before the process is carried out. However, the generalized correlation-based feature selection does not depend on any particular data transformation; only the correlation between any two variables needs to be measured. Hence the technique can be applied to a variety of problems, including those involving numerical values. CFS is a completely automatic algorithm that does not require any supervision in terms of threshold limits. It operates on the original feature space, hence its results can be interpreted in terms of the original features. Because it is a filter, CFS also does not incur the high computational cost associated with repeatedly invoking the learning algorithm.
If the correlation between each component and an outside variable is known, and the inter-correlation between each pair of components is given, then the correlation between a composite formed by summing the components and the outside variable can be predicted by
\[ r_{zc} = \frac{k\,\overline{r_{zi}}}{\sqrt{k + k(k-1)\,\overline{r_{ii}}}} \]
where $r_{zc}$ is the correlation between the summed components and the outside variable, $k$ is the number of components, $\overline{r_{zi}}$ is the average of the correlations between the components and the outside variable, and $\overline{r_{ii}}$ is the average inter-correlation between components [6,9,20].
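As a worked illustration of this merit measure, the sketch below computes $r_{zc}$ from a list of feature-class correlations and a list of pairwise feature-feature correlations. The function name `cfs_merit` and the sample correlation values are hypothetical.

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit r_zc of a feature subset from its average correlations."""
    k = len(feature_class_corrs)
    r_zi = sum(feature_class_corrs) / k
    if feature_feature_corrs:
        r_ii = sum(feature_feature_corrs) / len(feature_feature_corrs)
    else:
        r_ii = 0.0  # a single feature has no inter-correlation
    return k * r_zi / math.sqrt(k + k * (k - 1) * r_ii)


# Two features, each correlated 0.6 with the class.
independent = cfs_merit([0.6, 0.6], [0.0])  # features uncorrelated with each other
redundant = cfs_merit([0.6, 0.6], [1.0])    # features fully redundant
print(independent, redundant)
```

With fully redundant features the merit falls back to that of a single feature (0.6), while uncorrelated features raise it, which is the behaviour the subset preference relies on.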
The search method used here is best-first search, which searches the space of attribute subsets by greedy hill climbing augmented with a backtracking facility.
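Best-first search over attribute subsets can be sketched as below. This is a generic forward-selection variant written for illustration, not the authors' code: the stopping rule (`max_stale` consecutive non-improving expansions), the function names and the toy scoring function are all assumptions. Backtracking arises because the most promising unexpanded subset is always taken from the open list, even if it is not a child of the subset expanded last.

```python
import heapq
from itertools import count


def best_first_search(n_features, score, max_stale=5):
    """Forward best-first search; stops after max_stale non-improving expansions."""
    start = frozenset()
    tie = count()  # tie-breaker so the heap never has to compare frozensets
    open_heap = [(-score(start), next(tie), start)]
    visited = {start}
    best_subset, best_score = start, score(start)
    stale = 0
    while open_heap and stale < max_stale:
        neg, _, subset = heapq.heappop(open_heap)  # most promising open subset
        if -neg > best_score:
            best_subset, best_score = subset, -neg
            stale = 0
        else:
            stale += 1
        for f in range(n_features):  # expand by adding one feature at a time
            child = subset | {f}
            if child not in visited:
                visited.add(child)
                heapq.heappush(open_heap, (-score(child), next(tie), child))
    return best_subset, best_score


# Toy score (hypothetical): features 0 and 1 are individually good but redundant.
relevance = [0.9, 0.8, 0.1]


def toy_score(subset):
    if not subset:
        return 0.0
    penalty = 0.7 if {0, 1} <= subset else 0.0
    return sum(relevance[f] for f in subset) / len(subset) ** 0.5 - penalty


best_subset, best_score = best_first_search(3, toy_score)
print(sorted(best_subset), best_score)
```

In a CFS setting the `score` argument would be the subset merit rather than the toy function used here.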
Banking transaction data is heterogeneous and tends to contain several data types. Though metaheuristics can handle numerical data, they do not work directly on nominal data. Hence normalization becomes a major requirement for performing effective analysis. Range normalization is used for normalizing both numerical and nominal values.
Range Normalization is a technique used to map a number to a specific range. It helps flatten the fluctuations in various attributes in a data set, which in turn helps avoid bias. After analysis, the normalized fields are denormalized again to obtain the original values for evaluation.
Mathematically speaking, the equation to normalize is:
\[ f(x) = \frac{(x - d_L)(n_H - n_L)}{d_H - d_L} + n_L \]
where $d_L$ and $d_H$ are the lowest and highest values of the data, and $n_L$ and $n_H$ are the lower and upper bounds of the normalized range.
Similarly, the equation to denormalize is:
\[ f(x) = \frac{(d_L - d_H)\,x - n_H d_L + d_H n_L}{n_L - n_H} \]
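A round trip through range normalization can be sketched as follows. The inverse used here is obtained by solving the normalization equation for x, which places $n_L - n_H$ in the denominator. Function and parameter names are assumptions, and the default target range [0, 1] is chosen only for illustration.

```python
def normalize(x, d_low, d_high, n_low=0.0, n_high=1.0):
    """Map x from the data range [d_low, d_high] to [n_low, n_high]."""
    return (x - d_low) * (n_high - n_low) / (d_high - d_low) + n_low


def denormalize(y, d_low, d_high, n_low=0.0, n_high=1.0):
    """Inverse of normalize: map y from [n_low, n_high] back to the data range."""
    return ((d_low - d_high) * y - n_high * d_low + d_high * n_low) / (n_low - n_high)


# Hypothetical transaction amount of 50 in a data range of 0..200, mapped to -1..1.
y = normalize(50.0, 0.0, 200.0, -1.0, 1.0)
x = denormalize(y, 0.0, 200.0, -1.0, 1.0)
print(y, x)
```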
PSO based Fraud Detection:
The normalized data is taken as the input for Particle Swarm Optimization (PSO). Usual PSO techniques use raw data for analysis. This paper enhances the accuracy of prediction by using cleaned, normalized data for classification.
PSO uses components called particles that are dispersed into the search space to perform the optimization. The number of particles to be used for a problem is not fixed in advance, so it has to be chosen on a trial and error basis. The particles are initially distributed uniformly over the search space and are then assigned initial velocities. The global best (gbest) and the particle best (pbest) are initialized, and the environment is ready to begin acceleration.
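The initialization step can be sketched as below. The paper does not give the fitness function used for fraud classification, so a simple sphere function stands in for it here; the function names, the velocity range and the fixed seed are likewise assumptions. Lower fitness is treated as better, matching the pbest and gbest comparisons described in the text.

```python
import random


def init_swarm(n_particles, bounds, fitness, seed=0):
    """Uniformly place particles, assign velocities, and set pbest and gbest."""
    rng = random.Random(seed)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[rng.uniform(-(hi - lo), hi - lo) for lo, hi in bounds]
                  for _ in range(n_particles)]
    pbest = [p[:] for p in positions]            # each particle's best position so far
    pbest_fit = [fitness(p) for p in positions]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    return positions, velocities, pbest, pbest_fit, pbest[g][:], pbest_fit[g]


def sphere(p):
    """Stand-in fitness (hypothetical): squared distance from the origin."""
    return sum(c * c for c in p)


positions, velocities, pbest, pbest_fit, gbest, gbest_fit = init_swarm(
    5, [(-1.0, 1.0), (-1.0, 1.0)], sphere
)
print(gbest, gbest_fit)
```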
Triggering Particle Movement:
Particle movement, which marks the beginning of the optimization process, is then started. The velocity of a particle defines the speed and the direction of its movement. After the initial movement, the pbest and gbest values are calculated. The particle's current fitness (calculated from the coordinates it currently occupies) is computed, and if it is found to be smaller than the particle's pbest, the current value is set as the pbest. The current pbest is then compared with the gbest, and if the fitness of the pbest is found to be smaller than that of the gbest, it is set as the new gbest. The particle's velocity is then updated from its current location using the equation
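\[ v_{i,d} \leftarrow w\,v_{i,d} + \varphi_p\, r_p\,(p_{i,d} - x_{i,d}) + \varphi_g\, r_g\,(g_d - x_{i,d}) \]
where $v_{i,d}$ is the velocity of particle $i$ in dimension $d$, $r_p$ and $r_g$ are random numbers, $p_{i,d}$ and $g_d$ are the particle best and the global best values, $x_{i,d}$ is the current particle position, and the parameters $w$, $\varphi_p$ and $\varphi_g$ are selected by the practitioner.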
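One movement step for a single particle can be sketched with the standard PSO velocity update, which matches the symbols defined above. The parameter values w = 0.7 and phi_p = phi_g = 1.5, the function names and the stand-in sphere fitness are assumptions chosen for illustration, and r_p and r_g are drawn uniformly from [0, 1).

```python
import random


def pso_step(x, v, pbest, pbest_fit, gbest, gbest_fit, fitness,
             w=0.7, phi_p=1.5, phi_g=1.5, rng=random):
    """Move one particle in place and refresh its pbest and the swarm gbest."""
    for d in range(len(x)):
        r_p, r_g = rng.random(), rng.random()
        v[d] = (w * v[d]
                + phi_p * r_p * (pbest[d] - x[d])
                + phi_g * r_g * (gbest[d] - x[d]))
        x[d] += v[d]
    fit = fitness(x)
    if fit < pbest_fit:              # smaller fitness is better
        pbest, pbest_fit = x[:], fit
        if pbest_fit < gbest_fit:
            gbest, gbest_fit = pbest[:], pbest_fit
    return pbest, pbest_fit, gbest, gbest_fit


def sphere(p):
    """Stand-in fitness (hypothetical): squared distance from the origin."""
    return sum(c * c for c in p)


rng = random.Random(1)
x, v = [2.0, -1.0], [0.0, 0.0]
pbest, pbest_fit = x[:], sphere(x)
gbest, gbest_fit = [0.5, 0.5], sphere([0.5, 0.5])
history = [gbest_fit]
for _ in range(50):
    pbest, pbest_fit, gbest, gbest_fit = pso_step(
        x, v, pbest, pbest_fit, gbest, gbest_fit, sphere, rng=rng
    )
    history.append(gbest_fit)
print(history[0], history[-1])
```

Because gbest is replaced only by a strictly better pbest, the recorded gbest fitness can never increase from one step to the next.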
This process is continued until all the coordinates have been visited and recorded. Coordinates are selected in a random manner, hence the same coordinate may be selected multiple times, which gives better scope for optimization. In time-constrained applications, the process is repeated for a specified amount of time, and the accuracy of the optimization is calculated from the nodes visited during this time.
RESULTS AND DISCUSSION
A modified version of PSO was implemented in C#.NET and simulations were carried out. Artificial Neural Networks (ANN) were implemented in C#.NET using the Encog 3.0 framework. The modified PSO was compared with ANN, and the accuracy and prediction rates were calculated.
Figure 6 shows the accuracy exhibited by the modified PSO (99.96%) and ANN (66.87%). Though machine learning algorithms are expected to exhibit better accuracies, the imbalance present in the data set skews the distribution of the data in the training and test phases, hence the accuracy of ANN drops drastically.
The time taken to evaluate the data set is compared for ANN and PSO in Figure 7. ANN takes roughly three times as long as the modified PSO (2,508,349 ms against 790,531 ms), which can be attributed to the optimized nature of PSO and the training phase of the ANN.
[FIGURE 8 OMITTED]
[FIGURE 9 OMITTED]
The prediction level of ANN is shown in Figure 8 and that of the modified PSO is shown in Figure 9. It can be observed that PSO exhibits better detection rates than ANN.
Prediction rates of the dominant data by ANN and PSO on an inconsistent data set are shown in Figures 10 and 11. It can be observed that the two methods are closely matched, with accuracy rates of 98% and 99%.
Prediction rates of the submissive data by ANN and PSO are shown in Figures 12 and 13. A huge difference is observed in terms of correctly classified results. It can be observed that ANN has a failure rate of 99%, while the modified PSO manages to classify almost 51% of the submissive data correctly.
Online transactions have created increased vulnerabilities, leading to many customers falling prey to fraud. Credit card fraud is one such case. The most challenging aspect of countering credit card fraud is that fraudulent transactions are very rare, hence the data sets are prone to imbalance. This paper presents a modified Particle Swarm Optimization technique that addresses this problem by embedding feature selection to enhance classification accuracy. A comparison of this technique with ANN shows that it performs well on both the dominant data and the submissive data.
[1.] Bolton, R.J and D.J. Hand, 2001. Unsupervised profiling methods for fraud detection. Credit Scoring and Credit Control, VII: 235-255.
[2.] Clark, P and T. Niblett, 1989. The cn2 induction algorithm. Machine Learning, 3: 261-283.
[3.] Cohen, W.W., 1995. Fast effective rule induction. In Machine learning-international workshop then conference (pp. 115-123). Morgan Kaufmann Publishers, INC.
[4.] Dorronsoro, J., F. Ginel, C. Sánchez and C. Cruz, 1997. Neural fraud detection in credit card operations. Neural Networks, 8: 827-834.
[5.] Gadi, Alonso, G., X. Wang and A. Lago, 2008. Credit card fraud detection with artificial immune system. Artificial immune systems. Springer Berlin Heidelberg, pp: 119-131.
[6.] Ghiselli, E.E., 1964. Theory of Psychological Measurement. McGraw-Hill, New York.
[7.] Halvaiee, Soltani, N and M. Akbari, 2014. A novel model for credit card fraud detection using Artificial Immune Systems. Applied Soft Computing, 24: 40-49.
[8.] Hand, David J. and Martin J. Crowder, 2012. Overcoming selectivity bias in evaluating new fraud detection systems for revolving credit operations. International Journal of Forecasting, 28(1): 216-223.
[9.] Hogarth, R.M., 1977. Methods for aggregating opinions. In H. Jungermann and G. de Zeeuw, editors, Decision Making and Change in Human Affairs. D. Reidel Publishing, Dordrecht-Holland.
[10.] Kohavi, R and G. John, 1996. Wrappers for feature subset selection. Artificial Intelligence, special issue on relevance, 97(1-2): 273-324.
[11.] Kohavi, R., 1995. Wrappers for Performance Enhancement and Oblivious Decision Graphs. PhD thesis, Stanford University.
[12.] Mahmoudi, Nader and N. Duman, 2015. Detecting credit card fraud by Modified Fisher Discriminant Analysis. Expert Systems with Applications., 42(5): 2510-2516.
[13.] Olshen, L and C. Stone, 1986. Classification and regression trees. Wadsworth International Group.
[14.] Pozzolo, D., A.O. Caelen, Y. Borgne, S. Waterschoot and G. Bontempi, 2014. Learned lessons in credit card fraud detection from a practitioner perspective. Expert systems with applications., 41(10): 4915-4928.
[15.] Quah, J.T and M. Sriganesh, 2008. Real-time credit card fraud detection using computational intelligence. Expert Systems with Applications, 35: 1721-1732.
[16.] Quinlan, J., 1993. C4.5: Programs for machine learning (Vol. 1). Morgan Kaufmann.
[17.] Sanjeev, Jha, Guillen, M and J. Westland, 2012. Employing transaction aggregation strategy to detect credit card fraud. Expert systems with applications., 39(16): 12650-12657.
[18.] Weston, D., D. Hand, N. Adams, C. Whitrow and P. Juszczak, 2008. Plastic card fraud detection using peer group analysis. Advances in Data Analysis and Classification, 2: 45-62.
[19.] Whitrow, C., D.J. Hand, P. Juszczak, D. Weston and N.M. Adams, 2009. Transaction aggregation as a strategy for credit card fraud detection. Data Mining and Knowledge Discovery, 18: 30-55.
[20.] Zajonc, R.B., 1962. A note on group judgements and group size. Human Relations, 15: 177-180.
(1) Sivakumar Nadarajan and (2) Dr. Balasubramanian Ramanujam
(1) Research Scholar, PG and Research Department of Computer Science, J.J. College of Arts and Science, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India.
(2) Professor, Research Supervisor, PG and Research Department of Computer Science, J.J. College of Arts and Science, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India.
Received 25 April 2016; Accepted 28 May 2016; Available 2 June 2016
Address For Correspondence:
Sivakumar Nadarajan, Research Scholar, PG and Research Department of Computer Science, J.J. College of Arts and Science, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India.
Fig. 1: Percentage of Fraudulent Transactions Attributable to Payment Methods Among Merchants Accepting Specific Payment Methods

Payment method                  2011   2012   2013   2014
Debit card                      18%    20%    15%    16%
Credit card                     65%    60%    58%    52%
Alternative payment methods*    27%    9%     23%    13%
Checks                          18%    46%    46%    41%

* PayPal, BillMeLater, eBillme, Google Checkout, etc.
Note: Table made from bar graph.

Fig. 2: Total Existing Card Fraud Losses and Incidence Rate by Year

Year    Losses (billions, U.S. $)    Percent of consumers
2006    $12.7                        2.58%
2007    $11.0                        2.48%
2008    $13.7                        3.19%
2009    $14.5                        3.48%
2010    $7.9                         2.34%
2011    $8.6                         3.25%
2012    $8.0                         3.14%
2013    $11.1                        4.60%

Note: Table made from bar graph.

Fig. 6: PSO vs. ANN: Accuracy

Method          Accuracy
Modified PSO    99.95723099
ANN             66.86715752

Note: Table made from bar graph.

Fig. 7: PSO vs. ANN: Time Taken

Method          Time (ms)
Modified PSO    790531
ANN             2508349

Note: Table made from bar graph.

Fig. 10: ANN: Prediction Rate of Dominant Data

Correct%      98.68303716
Incorrect%    1.316962843

Note: Table made from pie chart.

Fig. 11: PSO: Prediction Rate of Dominant Data

Correct%      0.011237121
Incorrect%    99.98876288

Note: Table made from pie chart.

Fig. 12: ANN: Prediction Rate of Submissive Data

Correct%      0.362574015
Incorrect%    99.63742599

Note: Table made from pie chart.

Fig. 13: PSO: Prediction Rate of Submissive Data

Correct%      48.97959184
Incorrect%    51.02040816

Note: Table made from pie chart.
|Author:||Nadarajan, Sivakumar; Ramanujam, Balasubramanian|
|Publication:||Advances in Natural and Applied Sciences|
|Date:||Jun 1, 2016|