
Analyzing the risk of diabetes mellitus using patient EMR by association rule mining.


Diabetes is a lifelong chronic disease that affects a large share of the world's population, in part because of varying food habits. At present, more than 26.9% of senior citizens in the US are affected by diabetes. Around the world, 1.9 million people are diagnosed with diabetes every year, while 7.0 million do not realize they have the disease and remain undiagnosed and untreated. Diabetes mellitus is a metabolic disorder characterized by severe hyperglycemia, which arises from disturbances in the metabolism of fat, protein and carbohydrates. Over the long term, diabetes damages the human body and can cause failure of the vital organs that sustain metabolism. Insulin controls diabetes, and as diabetes mellitus progresses, insulin secretion in the body can rise and fall. The symptoms of diabetes mellitus [1] are typically excessive thirst, urinary infections, sudden vision impairment and weight loss. When the disease goes out of control, it can cause diabetic acidosis, which, if untreated, can lead to coma or death. In a few cases there may be no symptoms at all until later in life.

In this paper, we consider the seriousness of diabetes and propose a method to mitigate the risk for people who are on the verge of developing the disease. Applying association rule mining to the EMR (electronic medical records) available in a health institution's centralized database makes it possible to analyze patient records and reach conclusions about the population affected by diabetes.

In diabetes mellitus, the blood sugar level does not return to the normal range, which is between 70 mg/dl and 120 mg/dl. Oral insulin intake can facilitate the drop in the sugar level. Association rule mining treats the deviation of the sugar level in diabetes mellitus as a parameter, which forms the antecedent of the rule. The consequent is the drop in sugar level after food intake or insulin intake.
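Before a continuous lab value such as blood sugar can take part in an association rule, it usually has to be turned into a discrete item. The following is a minimal sketch of that step, assuming the normal range quoted above is 70 to 120 mg/dl; the function name and the item labels are hypothetical and are not taken from our application.

```python
# Assumed normal range from the text, in mg/dl.
NORMAL_LOW_MG_DL = 70
NORMAL_HIGH_MG_DL = 120


def glucose_item(reading_mg_dl: float) -> str:
    """Map a blood sugar reading to a discrete item usable in a rule.

    The labels are illustrative placeholders only.
    """
    if reading_mg_dl < NORMAL_LOW_MG_DL:
        return "glucose_low"
    if reading_mg_dl > NORMAL_HIGH_MG_DL:
        return "glucose_high"
    return "glucose_normal"


if __name__ == "__main__":
    for reading in (65, 95, 180):
        print(reading, glucose_item(reading))
```

A reading that maps to "glucose_high" would then appear as one item in the patient's item set.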

I. Classification Of Diabetes Mellitus:

Based on the results of the diagnostic analysis [2], diabetes is classified into two types. Many diabetic patients show varying results, sometimes a combination of both classes. For instance, a patient with GDM (gestational diabetes mellitus) may show hyperglycemia that in fact belongs to the second class of diabetes mellitus. Conversely, a patient who develops diabetes because of exogenous medicines may return to normal glycemia once the medicines are discontinued. After a few years, such patients may see disturbances that can lead to pancreatitis. Hence, for both the practitioner and the patient, it is less important to label the particular type of diabetes than to understand the pathogenesis of the hyperglycemia and to treat it effectively.

A. Immune-mediated diabetes: Type I:

The first class [3] of diabetes mellitus is marked by beta-cell destruction, which leads to a deficiency in insulin secretion. It accounts for only 5-10% of people with diabetes and arises from the destruction of the beta cells of the human pancreas. This can weaken immunity, which affects the metabolic cells. The markers along the path of this cell destruction are autoantibodies involved in cell destruction, autoantibodies to insulin and autoantibodies to GAD65. One or more of these autoantibodies show up in more than 85% of individuals detected with diabetes.

B. Insulin Deficient Diabetes: Type II:

90-95% of the population affected by diabetes fall under the second category of the classification, which is generally called insulin-resistant diabetes. Throughout their lifetime, patients may need insulin treatment time and again to cope with it.

II. Existing System:

The existing system that we analyzed before constructing our idea works on statistical modelling. Both type I and type II diabetes are structured and analyzed in the existing system with a small set of patient records. It is used to build prediction-based results from time to time, which helps in editing the patient records manually. A record is censored when the full outcome or record of the patient's diagnosis is missing. If the patient refuses to cooperate with the study during inspection, if the patient is not affected by diabetes during the monitored period, or if the patient develops diabetes only at the end of the monitoring, only partial information about the patient persists in the database. This can consume time during clinical research, but the database makes it easier to handle. This is why the existing system survives in clinical research.

III. Proposed System:

We focus on summarizing the techniques while providing guidance to medical practitioners on choosing the right method of categorization. Our aim is to present an application that is capable of identifying body conditions, medication and predicted morbidities, and that assists in clinical examinations. The factors derived can assist in predicting the risk of diabetes mellitus. We had a tough time choosing between TopK and BUS [5]. We found that BUS retains more redundancy than TopK, which allows the database to be rebuilt time and again while providing better patient coverage, so we found the BUS algorithm best suited for our study. APRX-COLLECTION, RPGlobal, TopK and BUS are efficient and widely used techniques that can compress the rule sets natively present in EMR, which forecast the risk of diabetes mellitus in the studied patient population. Although association rule set summarization does not provide much guidance about applicability, it is the technique best suited to categorizing the data. Association rules are very efficient because they relate the medical data to a set of associated conditions. These sets of conditions can lead to the most accurate solution for prevention and intensive care of diabetes mellitus. The continuous outcome of results can help to moderately modify the disorder in the subpopulation of patients. An overview of the techniques that we use is as follows:

* BUS--Bottom-Up Summarization works on the patient records and on the rules generated during association mining. BUS explicitly restricts repetition in the patient records, which are covered with a minimum number of parameters or rules. The reduction in the variation of the data is directly proportional to the reduction in the repetition of data.

* Association rule mining concentrates on finding the relations, interrelationships, arrangements and structures of records in the database. In our proposal, an item set in association rule mining describes the patient. In an association rule I -> J, if "I" applies to the patient, then "J" is likely to apply as well. I is the antecedent, while J is the consequent of the rule.
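How strongly a rule I -> J holds is commonly measured by its support and confidence. The sketch below computes both over a handful of made-up patient item sets; the records, item names and function names are assumptions for illustration and do not come from the EMR used in our study.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], itemset: Record) -> float:
    """Fraction of patient records that contain every item in itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Among records matching the antecedent I, the fraction that also match J."""
    covered = [r for r in records if antecedent <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if consequent <= r) / len(covered)


# Hypothetical patient item sets, one per patient.
records = [
    frozenset({"obese", "high_bp", "diabetes"}),
    frozenset({"obese", "high_bp"}),
    frozenset({"obese", "diabetes"}),
    frozenset({"high_bp"}),
]
I = frozenset({"obese"})      # antecedent
J = frozenset({"diabetes"})   # consequent

if __name__ == "__main__":
    print("support(I and J):", support(records, I | J))
    print("confidence(I -> J):", confidence(records, I, J))
```

Here two of the four patients match both I and J, and two of the three patients matching I also match J.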

IV. Experimental Setup:

To implement the method discussed above, we need the hardware and software support listed in Table 1.

V. Architecture Diagram:



Once the above setup is in place on our systems, we proceed with the experimental flow. We have divided our experimental work into three modules, which are as follows:

* Allowance of Health Center Database

* Elucidating Database Collection in EMR

* Implementation of Association Summarization Techniques

A. Allowance of Health Center Database:

At the beginning of the experiment, we have no database records of the patients. The association summarization technique that we implement in the experiment is intended for a distributed database and not for a single database. Permission to access the central database must be granted by the health care administrator at the beginning in order to carry on with the next two phases.

B. Elucidating Database Collection in EMR:

The patient records that we have collected are preserved in our application with maximum privacy. Only the patient's medical details are extracted. Personal information that is irrelevant to us is left out, and the patient under examination is identified by ID alone. The patient data can then be filtered.
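The extraction step described above can be sketched as a simple field filter. This is an illustration only: the field names and the list of personal fields are hypothetical, and a real EMR schema will differ.

```python
from typing import Any, Dict

# Hypothetical personal fields that are irrelevant to the analysis.
PERSONAL_FIELDS = {"name", "address", "phone"}


def extract_medical_details(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without personal fields.

    The patient ID is kept so that the patient can still be referenced.
    """
    return {key: value for key, value in record.items() if key not in PERSONAL_FIELDS}


if __name__ == "__main__":
    raw = {
        "patient_id": "P001",
        "name": "Example Patient",
        "phone": "000-0000",
        "glucose_mg_dl": 180,
        "bmi": 31.2,
    }
    print(extract_medical_details(raw))
```

Filtering by an explicit list keeps the choice of what counts as personal information in one place, where the health care administrator can review it.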

C. Implementation of Association Summarization Techniques:

In the experiment that we carry out here, diabetes mellitus is considered the prime rule in association mining. "I" is the item set, which denotes the patient, and "J" is the consequent of the association rule, which refers to the factors that depend on the patient record. The confounding factors or symptoms are used to analyze and determine the treatment factor. BUS explicitly restricts redundancy, which reduces the repetition of patient records and ensures a highly accurate result. Experimentally, we have found that most people affected by diabetes are undiagnosed because they are unaware of the disease.

1. APRX-COLLECTION:

This algorithm finds the risk factors. Most subsets of the summary rule are valid rules in the original (unsummarized) set, and these subset rules imply a similar risk of diabetes.

2. RPGlobal:

The main drawbacks of APRX-COLLECTION are the redundancy in the rule set and the dilution of the risk. The RPGlobal summary is similar to APRX-COLLECTION in that it is mostly concerned with the expression of the rule, and hence it performs a very aggressive compression. However, it addresses the two drawbacks by taking patient coverage into account and by constructing the summary from rules in the original rule set (instead of an extended set).

3. Top-K:

The Redundancy-Aware Top K (TopK) algorithm further reduces the redundancy in the rule set, which is possible because it operates on patients instead of on the expressions of the rules. While this approach gives up the exceptional compression rates of the previous two algorithms, TopK still achieves a high compression rate.


4. BUS:

BUS operates on the patients and not on the rules. Thus, redundancy in terms of rule expression can occur. However, BUS explicitly controls the redundancy in the patient space through the parameter mandating the minimum number of new (previously uncovered) cases (patients with diabetes incidence) that need to be covered by every rule. In this way the reduced variability in the rule expression does not translate into increased redundancy.
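The constraint on newly covered cases can be illustrated with a simplified greedy selection. This is not the published BUS algorithm, only a sketch of the single idea described above: a rule is kept only if it covers at least a minimum number of cases that no earlier rule has covered. The rule names, patient IDs and the parameter name min_new_cases are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def summarize(
    rules: Dict[str, Set[int]],
    cases: Set[int],
    min_new_cases: int,
) -> Tuple[List[str], Set[int]]:
    """Greedily pick rules that each cover enough previously uncovered cases.

    rules maps a rule name to the IDs of the patients it applies to;
    cases holds the IDs of the patients with diabetes incidence.
    Returns the selected rule names and the cases left uncovered.
    """
    uncovered = set(cases)
    remaining = dict(rules)
    summary: List[str] = []
    while remaining:
        # Rule covering the most uncovered cases; ties go to the first name alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] & uncovered))
        new_cases = remaining.pop(best) & uncovered
        if len(new_cases) < min_new_cases:
            break  # no remaining rule adds enough new cases
        summary.append(best)
        uncovered -= new_cases
    return summary, uncovered


if __name__ == "__main__":
    rules = {
        "obese & high_bp": {1, 2, 3},
        "obese": {1, 2, 3, 4},
        "high_trigl": {5, 6},
        "smoker": {4, 7},
    }
    cases = {1, 2, 3, 4, 5, 6}
    print(summarize(rules, cases, min_new_cases=2))
```

In this toy input the rule "obese & high_bp" is dropped because every case it covers is already covered by "obese", which is the kind of redundancy in the patient space that the parameter is meant to control.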




We conclude that by using association mining with the Bottom-Up Summarization (BUS) approach, we can mitigate the risk of diabetes mellitus in patients to a greater extent. Hospital data are stored in various data centers. A centralized database is considered, in which BUS helps in clustering the data. The EMR used in the majority of hospitals today is of great use for electronic information interchange among data centers. With EMR, new analyses in medical treatment can be carried out efficiently across continents. Combining compatible algorithms and working with a small set of rules and criteria can bring out the outline of the disease from the patient population subjected to the analysis. Association rule mining is a thorough and effective way to produce representative rule-based analysis, which highlights the risk of diabetes mellitus in patients.


[1.] Gyorgy, J., Simon, "Extending Association Rule Summarization Techniques to Assess Risk of Diabetes Mellitus," IEEE Transactions on Knowledge and Data Engineering.

[2.] Jayalakshmi, T., 2010. "A Novel Classification Method for Diagnosis of Diabetes Mellitus Using Artificial Neural Networks", Data Storage and Data Engineering (DSDE).

[3.] Alberti, K.G.M.M. and P.Z. Zimmet, 1998. "Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus," Diabetic Medicine, 15(7): 539-553.

[4.] Reinhart, I., 2012. "Electronic medical referral system: A forum-based approach," Information Reuse and Integration (IRI), IEEE 13th International Conference on, On page(s): 572-577.

[5.] Minqing Hu and Bing Liu, "Mining Opinion Features in Customer Reviews," 760 USER MODELING.

[6.] Rexeena, X., B. Suganya Devi, S. Saranya, 2014. "Risk Assessment for Diabetes Mellitus using Association Rule Mining," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 3-2.

[7.] Kumaresan, A., A. Subashini, 2015. "Virtuous Approach For Identification Of Outlier With Imperfect Data Labels," International Journal of Innovative Research And Studies (IJiRS), 4(5).

(1) C. Bhuvaneshwari, (2) A. Sangeetha, (3) K. Vijayakumar and (4) A. Kumaresan

(1) PG Scholar, Department of Computer Science and Engineering SKP Engineering College Thiruvannamalai-606611

(2) PG Scholar, Department of Computer Science and Engineering SKP Engineering College Thiruvannamalai-606611

(3) Professor, Department of Computer Science and Engineering SKP Engineering College Thiruvannamalai-606611

(4) Associate Professor Department of Computer Science and Engineering SKP Engineering College Thiruvannamalai-606611.

Received February 2016; Accepted 18 April 2016; Available 25 April 2016

Address For Correspondence:

C. Bhuvaneshwari, PG Scholar, Department of Computer Science and Engineering SKP Engineering College Thiruvannamalai-606611.


This work is licensed under the Creative Commons Attribution International License (CC BY).
Table 1: Experimental Setup

Serial No    Support Needed       Specification

1            Number of systems    5
2            Accessing Time       30 minutes
3            Protocol Needed      IPv4
4            Total RAM size       1024MB
5            Software Tools       JDK 1.6, Tomcat
6            Database             Oracle 10g
COPYRIGHT 2016 American-Eurasian Network for Scientific Information
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2016 Gale, Cengage Learning. All rights reserved.

Author:Bhuvaneshwari, C.; Sangeetha, A.; Vijayakumar, K.; Kumaresan, A.
Publication:Advances in Natural and Applied Sciences
Date:May 1, 2016