
Skin Diseases Predictive Model Using Individual Base and Ensemble Base Approach.

1 Introduction

Human skin is one of the major organs of the body (Sudha, Aramudhan, & Kannan, 2017); it protects the body against infection, heat, injury, and damage caused by ultraviolet (UV) radiation (Parvin & Jafar, 2017). Skin diseases show several types of reactions as symptoms, and these symptoms form the basis of skin disease prediction (Amarathunga, Ellawala, Abeysekara, & Amalraj, 2015). Skin diseases can be very dangerous, especially when they are not attended to at an early stage. Data mining is an approach for extracting hidden knowledge from large volumes of data (Adeyemo & Adeyeye, 2015).

Data mining approaches help transform large volumes of data into relevant information for knowledge discovery, prediction and intelligent decision making in healthcare systems (Aro, Muhammed, Ayoade, & Oladipo, 2017). Medicine is an important field for the application of data mining and analysis techniques because it concerns human life, the most important thing in society (Singh, Naveen, & Samota, 2013).

Applications developed through data mining include data analysis for effective medical policy making, prevention and detection of various diseases, reduction of errors in hospitals, and detection of insurance fraud (Kaur & Bawa, 2015). An analytical predictive system is a branch of data mining that automatically produces a classification model from a training dataset and uses the same model to predict the classes of unclassified data (Varun, Vijay, & Gayathri, 2012). Predictive data mining uses learning algorithms to support clinicians in monitoring, diagnostic and therapeutic tasks. In predictive modeling, data are collected first, a statistical model is then formulated, predictions are made, and the model is validated as further data become available. Clinical data mining is a research strategy for retrieving, analyzing and interpreting both qualitative and quantitative information from medical datasets or records.

Many data mining classification techniques, such as Naive Bayes, k-nearest neighbors (KNN), neural networks, logistic regression, support vector machines (SVM) and decision trees, have been used to develop predictive and diagnostic systems (Ahmad, Qamar, & Rizvi, 2015). Among these techniques, the decision tree has been found to be one of the most powerful classification techniques for learning and decision support systems (Seema, Rathi, & Mamta, 2012).

A predictive technique such as the decision tree can be used for classification, clustering and prediction tasks. It applies a divide-and-conquer approach that recursively splits the problem search space into subsets. Decision tree classification is easy to understand and is especially useful when the structure of the learned knowledge model needs to be known.
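J48 is WEKA's implementation of the C4.5 algorithm, which chooses each split using the gain ratio criterion. The minimal sketch below, written for illustration rather than taken from WEKA, computes entropy, information gain and gain ratio for nominal attributes and picks the best attribute for one divide-and-conquer step. It omits the numeric thresholds, pruning and missing-value handling that J48 also performs, and the attributes and records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the nominal attribute `values`."""
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the highest gain ratio (one divide step)."""
    columns = list(zip(*rows))
    return max(range(len(columns)), key=lambda j: gain_ratio(columns[j], labels))


# Hypothetical records: (erythema degree, itching degree) -> diagnosis.
rows = [(3, 1), (3, 2), (1, 1), (1, 2)]
labels = ["psoriasis", "psoriasis", "lichen planus", "lichen planus"]
print(best_attribute(rows, labels))  # 0
```

Here attribute 0 separates the two classes perfectly (gain ratio 1.0) while attribute 1 carries no information (gain ratio 0.0). A full tree learner would apply the same selection recursively to each resulting subset until the subsets are pure or too small to split.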

This paper selected the J48 algorithm from the decision tree family and the multilayer perceptron (MLP) from the neural network family. These learning algorithms have complementary advantages and disadvantages. A decision tree's knowledge representation is easily understood by humans, unlike that of a neural network. Decision trees have difficulty dealing with noise in the training data, a problem that is less pronounced in neural networks. Decision trees learn fast, whereas neural networks learn relatively slowly. Given these complementary features, combining decision trees and neural networks in an ensemble-based approach is expected to produce a better predictive system for skin diseases.

2 Related Work

Kadhim (2017) used data mining techniques to classify skin diseases. The study employed a decision tree for mining and processing image data. Digital image processing principles and a decision tree were applied to detect skin diseases from attributes identified in digital images of the skin. The main phases of the system are preprocessing, feature extraction and classification by the decision tree. The classification phase was enhanced to make the system more accurate.

Karthik et al. (2017) presented a predictive system for dermatological conditions using a Naive Bayesian classification approach. The system focused only on six categories of skin disease. The Naive Bayes result was based on the probability of occurrence of each particular kind of dermatological disease, reported as a percentage. The dermatology dataset downloaded from the UCI repository was used to test the developed skin disease predictive model.
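To illustrate how a Naive Bayes classifier can report the probability of each dermatological disease as a percentage, the sketch below implements a generic categorical Naive Bayes with Laplace smoothing. It is not the implementation used by Karthik et al. (2017); the smoothing constant `alpha` and the toy records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class priors and per-class attribute-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)     # attribute index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(y, j)][v] += 1
            domains[j].add(v)
    return priors, counts, domains, len(labels)


def posterior_percent(model, row, alpha=1.0):
    """Posterior probability of each class for `row`, as percentages."""
    priors, counts, domains, n = model
    scores = {}
    for c, n_c in priors.items():
        score = n_c / n  # prior P(c)
        for j, v in enumerate(row):
            # Laplace-smoothed likelihood P(attribute j = v | c); attributes are
            # assumed conditionally independent given the class.
            score *= (counts[(c, j)][v] + alpha) / (n_c + alpha * len(domains[j]))
        scores[c] = score
    total = sum(scores.values())
    return {c: 100 * s / total for c, s in scores.items()}


# Hypothetical records: (erythema degree, scaling degree) -> diagnosis.
rows = [(3, 0), (3, 1), (0, 2), (1, 2)]
labels = ["psoriasis", "psoriasis", "lichen planus", "lichen planus"]
model = train_nb(rows, labels)
print(posterior_percent(model, (3, 1)))
```

For the query (3, 1) this assigns roughly 85.7% to psoriasis and 14.3% to lichen planus; the percentages always sum to 100 because the class scores are normalized.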

Sudha et al. (2017) discussed the relationship between input and response attributes for improving disease diagnosis in the medical area. Response Surface Methodology (RSM) was used to model the relationship between the input attributes of skin disease and the response, and to predict psoriasis patients with the help of independent and dependent variables. The performance of the RSM model showed that the developed empirical relationship agreed closely with the test results. Analysis of Variance (ANOVA) was performed to analyze the outcome mathematically. The final experimental result showed that the developed empirical model is suitable for skin disease prediction.
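Response surface methodology usually approximates the response with a second-order polynomial in the input variables. The general form is given below only as a reference point; the source does not state the exact model fitted by Sudha et al. (2017), so these terms are an assumption. Here y is the response, x_1, ..., x_k are the input attributes, the beta coefficients are estimated by least squares, and epsilon is the error term.

```latex
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2
    + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \varepsilon
```

An ANOVA on such a fit tests whether the linear, quadratic and interaction terms explain a significant share of the variation in y.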

Manjusha, Sankaranaayana and Seena (2015) studied different types of dermatological diseases with similar symptoms, which may even prove deadly if not attended to at the right time. The study used a medical dataset of 230 instances with 22 attributes, gathered from the southern part of Kerala, India. For better prediction, it is important to calculate the accuracy of the mining algorithm. Two data mining classification algorithms, Naive Bayes (NB) and J48, were used for data analysis. The open-source WEKA data mining tool was used to carry out the analysis, to reveal the chances of the different dermatological diseases, and to find the probability of occurrence of each disease.

3 Methodology

The proposed predictive model considered individual-based and ensemble-based methods for the prediction of skin diseases. The J48 algorithm was selected from the decision tree category and the multilayer perceptron from the neural network category. The experiment was carried out step by step, focusing on applying the selected classification algorithms and their ensemble to the dermatology dataset. The procedure included environment setup, data preprocessing, choice of data mining software, running the simulation, and evaluation of the classifiers' performance.

The preprocessing stage involved collating the clinical and histopathological attributes of patients into a single database and formatting that database into a usable dataset for this study. The WEKA (Waikato Environment for Knowledge Analysis) data mining tool was used to conduct the study. The J48 and multilayer perceptron algorithms were applied to the dataset both individually and as an ensemble. The classifiers were combined by majority voting. The study focuses on six major skin diseases.
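Majority voting assigns each instance the class predicted by most base classifiers. With only two base classifiers, as here, every disagreement between J48 and the MLP is a tie, and the paper does not say how such ties are resolved. The sketch below is a generic illustration, not WEKA's implementation: when class probabilities are supplied it breaks ties by the highest mean probability (a hypothetical rule), and otherwise it falls back to the first classifier's vote.

```python
from collections import Counter


def majority_vote(predictions, probabilities=None):
    """Combine base-classifier predictions for one instance by majority vote.

    predictions: one predicted class label per base classifier.
    probabilities: optional list of {label: probability} dicts, one per
        classifier, used only to break ties (a hypothetical tie-break rule).
    """
    tally = Counter(predictions)
    top = max(tally.values())
    # Counter keeps first-seen order, so tied[0] is the earliest classifier's vote.
    tied = [label for label, votes in tally.items() if votes == top]
    if len(tied) == 1 or probabilities is None:
        return tied[0]

    def mean_probability(label):
        return sum(p.get(label, 0.0) for p in probabilities) / len(probabilities)

    return max(tied, key=mean_probability)


# J48 and MLP agree: the shared label wins.
print(majority_vote(["psoriasis", "psoriasis"]))  # psoriasis
# They disagree: fall back to the mean class probability (hypothetical values).
j48_probs = {"psoriasis": 0.55, "lichen planus": 0.45}
mlp_probs = {"psoriasis": 0.10, "lichen planus": 0.90}
print(majority_vote(["psoriasis", "lichen planus"], [j48_probs, mlp_probs]))  # lichen planus
```

With three or more classifiers the tie-break is needed less often, which is one reason odd-sized ensembles are common for plain majority voting.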

3.1 Skin Disease Dataset

This study was based on the publicly available skin disease dataset from the Dermatology Database of Gazi University, School of Medicine. After filtering and correcting missing values, 366 instances were obtained and used to predict six skin diseases with similar symptoms.

Of these instances, 112 are psoriasis, 61 seborrheic dermatitis, 72 lichen planus, 49 pityriasis rosea, 52 chronic dermatitis and 20 pityriasis rubra pilaris, as shown in Table 1. The database contains 34 attributes, 33 of which are linear valued and one nominal. All six diseases share the clinical features of erythema and scaling, with very few differences between them.
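The class counts above sum to 366 and are clearly imbalanced. As a reference point for the accuracies reported in Section 4 (this baseline is not part of the original study), the short sketch below computes each class's share and the accuracy of a trivial classifier that always predicts the largest class, psoriasis.

```python
# Class counts from Table 1.
CLASS_COUNTS = {
    "psoriasis": 112,
    "seborrheic dermatitis": 61,
    "lichen planus": 72,
    "pityriasis rosea": 49,
    "chronic dermatitis": 52,
    "pityriasis rubra pilaris": 20,
}


def class_shares(counts):
    """Percentage of the dataset belonging to each class."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)
# Accuracy of always predicting the most frequent class.
baseline = max(shares.values())

print(total)  # 366
for name, share in shares.items():
    print(f"{name:<26}{share:6.2f}%")
print(f"majority-class baseline: {baseline:.2f}%")  # 30.60%
```

A useful classifier therefore has to beat roughly 30.6%, not the 16.7% that six balanced classes would suggest.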

3.2 Individual Base Classifier for Skin Diseases Predictive System

The skin disease dataset served as input to each base classifier: the multilayer perceptron neural network and the J48 decision tree. The performance of each algorithm was recorded separately, and its output was measured and evaluated.
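Unlike a J48 tree, an MLP classifies an instance by passing its attribute values through layers of weighted sums and nonlinear activations, which is why its learned knowledge is hard to read directly. The sketch below shows the forward pass of a generic one-hidden-layer MLP with sigmoid hidden units and a softmax output over the six classes. It is an illustration only: the hidden-layer size and random weights are hypothetical, training by backpropagation is omitted, and WEKA's MultilayerPerceptron may differ in its details.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Weighted sums: weights[k][j] connects input j to unit k."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Class probabilities for one instance x."""
    hidden = [sigmoid(z) for z in dense(x, w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))


# 34 attributes and 6 diseases as in the dataset; 8 hidden units is hypothetical.
N_INPUTS, N_HIDDEN, N_CLASSES = 34, 8, 6
rng = random.Random(0)


def random_layer(n_out, n_in):
    """Untrained placeholder weights, standing in for learned ones."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


w_h, b_h = random_layer(N_HIDDEN, N_INPUTS)
w_o, b_o = random_layer(N_CLASSES, N_HIDDEN)
x = [rng.randint(0, 3) for _ in range(N_INPUTS)]  # hypothetical instance, degrees 0-3
probs = mlp_forward(x, w_h, b_h, w_o, b_o)
# Predicted class code (1-6) is the index of the largest probability, plus one.
print(max(range(N_CLASSES), key=probs.__getitem__) + 1)
```

Every weight contributes to every prediction, so, unlike a path through a decision tree, no single rule explains why a class was chosen.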

3.3 Ensemble Base Classifier for Skin Diseases Predictive System

The ensemble-based skin disease predictive model was developed by combining the J48 and multilayer perceptron algorithms using majority voting. The skin disease dataset also served as input to this model, which was trained and tested using 10-fold cross-validation, and its performance was measured and evaluated accordingly.
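In 10-fold cross-validation the dataset is split into ten folds; each fold is used once as the test set while the other nine train the model, and the predictions from all ten test folds are pooled into one accuracy figure. The sketch below assumes a stratified split (each fold keeps roughly the class proportions of Table 1), which the paper does not state explicitly. The `train_and_predict` callback and the majority-class model used to exercise it are hypothetical stand-ins for the J48, MLP or ensemble models.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Deal instance indices class by class, round-robin, into k folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds


def cross_validate(labels, train_and_predict, k=10, seed=1):
    """Pooled accuracy (%) over k train/test rounds."""
    folds = stratified_folds(labels, k, seed)
    correct = 0
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        predictions = train_and_predict(train, test)
        correct += sum(p == labels[i] for p, i in zip(predictions, test))
    return 100 * correct / len(labels)


# Class labels laid out as in Table 1.
labels = (["psoriasis"] * 112 + ["seborrheic dermatitis"] * 61
          + ["lichen planus"] * 72 + ["pityriasis rosea"] * 49
          + ["chronic dermatitis"] * 52 + ["pityriasis rubra pilaris"] * 20)


def majority_class_model(train, test):
    """Hypothetical stand-in model: predict the most frequent training class."""
    top = Counter(labels[i] for i in train).most_common(1)[0][0]
    return [top] * len(test)


print(sorted(len(fold) for fold in stratified_folds(labels)))  # [36, 36, 36, 36, 37, 37, 37, 37, 37, 37]
print(f"{cross_validate(labels, majority_class_model):.2f}%")  # 30.60%
```

Replacing `majority_class_model` with a function that trains a real classifier on the `train` indices and predicts the `test` indices gives the kind of pooled accuracy reported in Section 4.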

4 Results and Discussion

4.1 Experimental Results of Multilayer Perceptron Algorithm for Classification

This section presents the results of the multilayer perceptron neural network for skin disease classification, shown in Figure 2, Table 2 and Table 3.

4.2 Experimental Results of J48 Decision Tree for Classification

This section presents the results of the J48 decision tree algorithm for skin disease classification, shown in Figure 3, Table 4 and Table 5.

4.3 Experimental Results of the Ensemble-Based Approach for Classification

The results of the ensemble combining the multilayer perceptron neural network and the J48 decision tree algorithm for skin disease classification are shown in Figure 4, Table 6 and Table 7.

4.4 Experimental Results of Comparative Analysis of Developed Skin Disease Predictive System

This section compares the results of the developed skin disease predictive models, as shown in Table 8 and Figure 8.

The comparative analysis in Table 8 and its graphical representation in Figure 7 show that the ensemble approach gives an intermediate result among the three skin disease predictive models developed: its accuracy lies between that of the MLP neural network and that of the J48 decision tree. This middle position is argued to help the ensemble avoid the over-fitting that may occur in the MLP neural network model and the under-fitting that may occur in the J48 decision tree model, making it a more reliable model for skin disease prediction.

5 Summary and Discussion

The skin disease predictive models built with two data mining classification techniques and an ensemble-based approach gave different results for each classifier. The multilayer perceptron neural network (MLP) achieved an accuracy of 96.9945%, while the J48 decision tree achieved 93.9891%. The misclassification rate was 3.0055% for the MLP and 6.0109% for J48. The MLP obtained a true positive rate (TP rate) of 0.97, while J48 obtained 0.94. For the false positive rate (FP rate), the MLP recorded 0.006 and J48 0.015. For precision, the MLP gave 0.97 and J48 0.94. The MLP gave a recall of 0.97, while J48 recorded 0.94.
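Each of these rates is a single figure summarizing six classes. Assuming they are WEKA-style averages weighted by class size (the paper does not say how they were averaged), the sketch below shows how TP rate, FP rate and precision can be computed from a confusion matrix. It also shows why the weighted TP rate (recall) always equals the accuracy, which is consistent with the MLP's TP rate of 0.97 matching its 96.99% accuracy. The 3-class confusion matrix is hypothetical.

```python
def weighted_metrics(cm):
    """Class-size-weighted TP rate, FP rate and precision from a confusion matrix.

    cm[i][j] counts instances of actual class i predicted as class j;
    every class is assumed to occur at least once.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    actual = [sum(cm[i]) for i in range(k)]
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    tp = [cm[c][c] for c in range(k)]
    weight = [actual[c] / n for c in range(k)]  # class share n_c / N
    tp_rate = sum(weight[c] * tp[c] / actual[c] for c in range(k))
    fp_rate = sum(weight[c] * (predicted[c] - tp[c]) / (n - actual[c]) for c in range(k))
    precision = sum(weight[c] * (tp[c] / predicted[c] if predicted[c] else 0.0)
                    for c in range(k))
    return {"accuracy": sum(tp) / n, "tp_rate": tp_rate,
            "fp_rate": fp_rate, "precision": precision}


# Hypothetical confusion matrix for three classes (rows = actual, columns = predicted).
cm = [[10, 1, 0],
      [2, 8, 0],
      [0, 0, 9]]
for name, value in weighted_metrics(cm).items():
    print(f"{name:<10}{value:.3f}")
```

Because each class's recall TP_c / n_c is weighted by n_c / N, the weighted TP rate reduces to the sum of TP_c divided by N, which is exactly the accuracy. The weighted FP rate and precision carry independent information.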

Finally, the ensemble approach based on majority voting outperformed the individual J48 decision tree on accuracy, TP rate, FP rate and precision, although the individual multilayer perceptron achieved higher values on all of these measures.

The ensemble method achieved an accuracy of 95.355%, the intermediate result among the predictive models developed. This position is argued to guard against the over-fitting that may occur in the MLP neural network model and the under-fitting that may occur in the J48 decision tree model, making it a more reliable model for skin disease prediction.
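Because the dataset has 366 instances, each reported accuracy corresponds to a whole number of correctly classified cases. The check below, added as a worked example rather than taken from the paper, recovers those counts and recomputes both the accuracy and the misclassification rate; the results agree with Tables 2, 4 and 6 (355, 344 and 349 correct; 11, 22 and 17 misclassified).

```python
N_INSTANCES = 366
REPORTED_ACCURACY = {"MLP": 96.9945, "J48": 93.9891, "Ensemble": 95.3552}  # percent


def instance_counts(accuracy_percent, n=N_INSTANCES):
    """(correct, misclassified) counts implied by a reported accuracy."""
    correct = round(accuracy_percent / 100 * n)  # nearest whole number of instances
    return correct, n - correct


counts = {model: instance_counts(acc) for model, acc in REPORTED_ACCURACY.items()}

for model, (correct, wrong) in counts.items():
    print(f"{model:<9} correct={correct} wrong={wrong} "
          f"accuracy={100 * correct / N_INSTANCES:.4f}% "
          f"misclassification={100 * wrong / N_INSTANCES:.4f}%")
```

Seen as counts, the gap between the ensemble and the MLP is 6 instances out of 366, and between the ensemble and J48 it is 5 instances.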

6 Conclusion

In this paper, skin disease predictive models were developed using individual-based and ensemble data mining methods. The model judged most reliable was obtained by combining J48 and MLP using the majority voting technique. It is concluded that the developed model can help doctors resolve uncertainty when making decisions and predicting diseases with similar symptoms.

At the end of the analysis, the majority voting ensemble was found to build the model in 65.03 seconds, well under two minutes. It achieved an accuracy of 95.36%, higher than the individual J48 model (93.99%) though lower than the individual MLP model (96.99%). These results suggest that combining decision trees and neural networks by an ensemble method has good potential for the medical diagnosis of skin diseases.

References

Adeyemo, O. O. & Adeyeye, T. O. (2015). Comparative Study of ID3/C4.5 Decision Tree and Multilayer Perceptron Algorithms for the Prediction of Typhoid Fever. African Journal of Computing & ICT, 8(1), 103-112.

Ahmad, P., Qamar, S. & Rizvi, S. Q. (2015). Techniques of Data Mining in Healthcare: A Review. International Journal of Computer Applications, 120(15), 38-50.

Amarathunga, A. A. L. C., Ellawala, E. P. W. C., Abeysekara, G. N. & Amalraj, C. R. J. (2015). Expert System For Diagnosis Of Skin Diseases. International Journal of Scientific & Technology Research, 4(1), 174-178.

Aro, T. O., Muhammed, B. J., Ayoade, O. B. & Oladipo, I. D. (2017). A Review on Data Mining Techniques for Heart Disease Prediction. Anale. Seria Informatica, XV(1), 99-103.

Kadhim, Q. K. (2017). Classification of Human Skin Diseases using Data Mining. International Journal of Advanced Engineering Research and Science, 4(1), 159-163.

Karthik, S., Rahul, B. S., Vibhudhi, M. & Tej, T. (2017). Prediction of Dermatological Condition Using Naive Bayesian Classification. International Journal of Pharmacy & Technology, 9(1), 28988-28994.

Kaur, S. & Bawa, R. K. (2015). Future Trends of Data Mining in Predicting the Various Diseases in Medical Healthcare System. International Journal of Energy, Information and Communications, 6(4), 17-34.

Manjusha, K. K., Sankaranaayana, K. & Seena, P. (2015). Data Mining in Dermatological Diagnosis: A Method for Severity Prediction. International Journal of Computer Applications, 117(11), 11-14.

Parvin, S. R. & Jafar, O. A. M. (2017). Prediction of Skin Diseases using Data Mining Techniques. International Journal of Advanced Research in Computer and Communication Engineering, 6(7), 313-318. http://doi.org/10.17148/IJARCCE.2017.6754

Seema, Rathi, M. & Mamta. (2012). Decision Tree: Data Mining Techniques. International Journal of Latest Trends in Engineering and Technology (IJLTET), 1(3), 150-155.

Singh, D., Naveen, H. & Samota, J. (2013). Analysis of Data Mining Classification with Decision Tree Technique. Global Journal of Computer Science and Technology, 13(13), 1-6.

Sudha, J., Aramudhan, M. & Kannan, S. (2017). Development of a mathematical model for skin disease prediction using response surface methodology. Biomedical Research, 355-359.

Varun, K. M., Vijay, S. V. & Gayathri, D. B. (2012). Hepatitis Prediction Model based on Data Mining Algorithm and Optimal Feature Selection to Improve Predictive Accuracy. International Journal of Computer Applications, 51(19), 13-16.

Tinuke Omolewa Oladele (1), Taye Oladele Aro (2), Adebisi Samuel Segun (3)

(1,2,3) Department of Computer Science, University of Ilorin, Ilorin, Nigeria. tinuoladele@gmail.com, taiwo774@gmail.com, adebisimuel@gmail.com
Table 1: Skin Disease Database

Class Code  Class                      Number of Instances

1           psoriasis                  112
2           seborrheic dermatitis       61
3           lichen planus               72
4           pityriasis rosea            49
5           chronic dermatitis          52
6           pityriasis rubra pilaris    20
            Total                      366

Table 2: Performance Evaluation of MLP Neural Network

Parameters                            Dermatology Data set

Correctly Classified Instances (%)    96.9945
Incorrectly Classified Instances (%)   3.0055
Kappa Statistic                        0.9624
Mean Absolute Error                    0.0411
Root Mean Squared Error                0.0927
Relative Absolute Error (%)           15.4167
Root Relative Squared Error (%)       25.4006

Table 3: Performance Evaluation of MLP Neural Network

Parameters                Dermatology Data set

True Positive (TP RATE)   0.970
False Positive (FP RATE)  0.006
Precision                 0.970
Recall                    0.970
F-Measure                 0.970
ROC Area                  0.999

Table 4: Performance Evaluation of J48 Decision Tree

Parameters                            Dermatology Data set

Correctly Classified Instances (%)    93.9891
Incorrectly Classified Instances (%)   6.0109
Kappa Statistic                        0.9246
Mean Absolute Error                    0.0264
Root Mean Squared Error                0.1365
Relative Absolute Error (%)            9.9147
Root Relative Squared Error (%)       37.397

Table 5: Performance Evaluation of J48 Decision Tree

Parameters                Dermatology Data set

True Positive (TP RATE)   0.940
False Positive (FP RATE)  0.015
Precision                 0.940
Recall                    0.940
F-Measure                 0.940
ROC Area                  0.977

Table 6: Performance Evaluation of Ensemble Method

Parameters                            Dermatology Data Set

Correctly Classified Instances (%)    95.3552
Incorrectly Classified Instances (%)   4.6448
Kappa Statistic                        0.9418
Mean Absolute Error                    0.0155
Root Mean Squared Error                0.1244
Relative Absolute Error (%)            5.81
Root Relative Squared Error (%)       34.0999

Table 7: Performance Measurement of Ensemble Method

Parameters                Dermatology Data set

True Positive (TP RATE)   0.954
False Positive (FP RATE)  0.010
Precision                 0.953
Recall                    0.954
F-Measure                 0.953
ROC Area                  0.972

Table 8: Comparative Analysis of Performance of Machine Learning Algorithms and Ensemble-Based Method (Dermatology Data Set)

Classifier       Accuracy (%)  TP Rate  FP Rate  Precision  ROC Area

MLP              96.9945       0.970    0.006    0.970      0.999
J48              93.9891       0.940    0.015    0.940      0.977
Ensemble Method  95.3552       0.954    0.010    0.953      0.972
COPYRIGHT 2018 University of the West of Scotland, School of Computing
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2018 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Oladele, Tinuke Omolewa; Aro, Taye Oladele; Segun, Adebisi Samuel
Publication:Computing and Information Systems
Article Type:Report
Date:Oct 1, 2018
Words:2665