Prediction of periodontitis in patients using big data analytics.


Big data analytics is the process of collecting, organizing and analyzing large sets of data (called big data) to discover patterns and other useful information. Big data analytics can help organizations to better understand the information contained within the data and will also help identify the data that is most important to the business and future business decisions. Analysts working with big data basically want the knowledge that comes from analyzing the data.

To analyze such a large volume of data, big data analytics is typically performed using specialized software tools and applications for predictive analytics, data mining, text mining, forecasting, and data optimization. Collectively these processes are separate but highly integrated functions of high-performance analytics. Big data tools and software enable an organization to process the extremely large volumes of data it has collected, to determine which data are relevant, and to analyze them to make better business decisions in the future. For most organizations, big data analysis is a challenge. Consider the sheer volume of data, the different formats of the data (both structured and unstructured) that are collected across the entire organization, and the many different ways in which different types of data can be combined, contrasted and analyzed to find patterns and other useful business information.

The first challenge is in breaking down data silos to access all data an organization stores in different places and often in different systems. The second big data challenge is in creating platforms that can pull in unstructured data as easily as structured data. This massive volume of data is typically so large that it's difficult to process using traditional database and software methods.

The primary goal of big data analytics is to help companies make more informed business decisions by enabling data scientists, predictive modelers and other analytics professionals to analyze large volumes of transaction data, as well as other forms of data that may be untapped by conventional business intelligence (BI) programs. That could include Web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things. Some people associate big data exclusively with semi-structured and unstructured data of that sort, but consulting firms like Gartner Inc. and Forrester Research Inc. also consider transactions and other structured data to be valid components of big data analytics applications.

In our work, the problem is to predict, from given patient data, the probability that a patient will develop a dental ailment. Clinical decisions are regularly made on the basis of doctors' intuition rather than on the rich information hidden in the database. This practice results in unwanted biases and errors in modern medical practice, which affect the quality of treatment provided to patients. We propose integrating clinical decision support with analytical algorithms, which could reduce clinical errors, increase patient safety, decrease unwanted practice variation, and improve patient outcomes. This proposal is promising because data modeling and analysis tools such as data mining have the capacity to generate a knowledge-rich environment that can significantly improve the effectiveness of clinical decision making. Disease prediction is one of the most interesting and rewarding applications of big data analysis. The shortage of specialists and the high number of misdiagnosed cases have created the need for a fast and efficient detection system. The main objective of this analysis is to identify the key patterns or features in the clinical data using a clustering model. The attributes most relevant to dental disease will be identified. This will help clinical practitioners understand the root causes of the disease and its extent.

Related Work:

In cancer research, Bianconi et al. proposed an approach that is flexible, since the cancer registry can receive information from an increasing number of sources whose features (e.g. automation, coding) and content vary over time. Real-time registration is central to achieving effective integration within the oncology system. The registry tool has already allowed collaboration between multidisciplinary oncology groups concerned with skin melanoma and thyroid cancers [1].

Gaura et al. note that the potential for uncompensable heat stress (UHS) to occur in wearers of protective clothing creates a need for a predictive on-body monitoring and actuation system to increase wearer safety. Their work considered the case study of explosive ordnance disposal (EOD) operatives during missions and described the key parameters for a predictive model of heat stress. Refinement of the model parameters may allow the same prediction mechanism to be employed in a variety of other applications [2].

The study by Lee and Ku focused only on anthropometric measurements. Recently, anthropometric measurements have been recommended as alternatives for predicting future diseases in public health and clinical practice. Although CT, MRI, and dual-energy X-ray absorptiometry techniques have high capability, they are costly and invasive [3].

Laxminarayan et al. found that, as expected, using prediction intervals (PIs) along with autoregressive (AR) model predictions increased the effective prediction horizon, enabling earlier detection of the onset of core temperature rise than is possible using AR-model predictions alone. They also found that none of the three proposed alert algorithms was consistently superior in each of the four assessed measures of performance [4].


1. Define the original user-attribute matrix, R, of size m x n, which holds the sample values of m users on n periodontitis attributes.

2. Preprocess the periodontitis-attribute matrix R to eliminate all repeated data values, giving the matrix $R_{norm}$.

3. Calculate the SVD of $R_{norm}$ and obtain matrices U, S and V, of size m x m, m x n, and n x n, respectively. Their relationship is expressed by: $R_{norm} = U \cdot S \cdot V^{T}$.

4. Perform the dimensionality reduction step by keeping only k diagonal entries from matrix S to obtain a k x k matrix, $S_k$.

5. Compute $\sqrt{S_k}$ and then calculate two matrix products: $U_k \cdot \sqrt{S_k}^{T}$, which represents the m users, and $\sqrt{S_k} \cdot V_k^{T}$, which represents the n periodontitis attributes in the k-dimensional feature space.

6. Calculate the similarity between periodontitis attributes $i_j$ and $i_f$ by computing their Adjusted Cosine Similarity as follows:


where k is the number of pseudo-users, selected when performing the dimensionality reduction step. Here the cosine similarity is calculated between periodontitis attributes.

7. Conclude with prediction generation, achieved by the following weighted sum:


which calculates the prediction for user $u_a$ on periodontitis attribute $i_j$.
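The seven steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `svd_item_predictions` is hypothetical, the preprocessing is assumed to be row-mean centering, and because the similarity and prediction equations did not survive in the text, plain cosine similarity between attribute columns in the reduced space and a similarity-weighted sum are used as stand-ins.

```python
import numpy as np


def svd_item_predictions(R, k):
    """Illustrative sketch of steps 1-7 (names and preprocessing are assumptions).

    R is an m x n array (m users, n attributes); k is the number of
    singular values kept in the dimensionality reduction step.
    """
    R = np.asarray(R, dtype=float)

    # Step 2 (assumed preprocessing): centre each user's row on its mean.
    row_means = R.mean(axis=1, keepdims=True)
    R_norm = R - row_means

    # Step 3: R_norm = U . S . V^T, where s holds the diagonal of S.
    U, s, Vt = np.linalg.svd(R_norm, full_matrices=False)

    # Step 4: keep only the k largest singular values.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Step 5: users and attributes in the k-dimensional feature space.
    sqrt_S_k = np.diag(np.sqrt(s_k))
    users_k = U_k @ sqrt_S_k   # m x k, one row per user
    items_k = sqrt_S_k @ Vt_k  # k x n, one column per attribute

    # Step 6: cosine similarity between attribute columns (k pseudo-users).
    norms = np.linalg.norm(items_k, axis=0)
    norms[norms == 0] = 1.0    # guard against division by zero
    unit = items_k / norms
    sim = unit.T @ unit        # n x n similarity matrix

    # Step 7: similarity-weighted sum of reduced values, plus the user's mean.
    reduced = users_k @ items_k        # rank-k approximation of R_norm
    weights = sim.copy()
    np.fill_diagonal(weights, 0.0)     # an attribute does not vote for itself
    denom = np.abs(weights).sum(axis=0)
    denom[denom == 0] = 1.0
    pred = (reduced @ weights) / denom + row_means
    return sim, pred
```

Calling `svd_item_predictions(R, k=2)` on a small user-attribute matrix returns the n x n attribute similarity matrix and an m x n matrix of predicted values; the choice of k trades noise reduction against loss of detail.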

Proposed Work:

We demonstrate a method of collaborative filtering for future prediction. Previous work on recommender systems typically relies on feedback on a particular attribute of a disease, such as age, and generalizes this to other items or other people.

We examine unseen-attribute recommendation through a user study in which we aim to correctly estimate a filtering function for each user. Then, by decomposing user parameters into shared and individual dimensions, we induce a similarity metric between users based on the degree to which they share these dimensions.

We show that the collaborative filtering predictions of disease are more effective than pure content-based recommendation.

Before each application of collaborative filtering, clustering is applied to the training set to discover connected components of patients. These are determined by the number of diseases which the patients have in common. This serves to remove the influence of patients who have little or no similarity with the testing patient for whom predictions are being made. In the most basic case, patients are removed only if they have no diseases in common with the active patient. Thus, removing these patients does not result in loss of information, but effectively reduces the runtime of the algorithm.

In practice, we cluster such that all patients in the training set have two or more diseases in common with the known diagnoses of the active patient. Requiring that clustered patients in the training set share at least two diseases with the active (testing set) patient enforces stronger similarities among all patients influencing the predictions.

Essentially, we build a network of patients that are connected by at least two diseases and then perform collaborative filtering in this network. In theory, this helps to avoid the noise resulting from common diseases which introduce a very high number of weak influences.

The clustering provides an additional benefit by reducing the number of diseases predicted on, which both simplifies and improves the collaborative filtering results.
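The filtering idea described above can be illustrated with a short sketch. It is a simplified stand-in under stated assumptions, not the authors' method: the function names, the patient identifiers, the disease names in the usage test, and the use of Jaccard overlap as the similarity weight are all hypothetical choices made for illustration.

```python
def select_neighbours(active, training, min_common=2):
    """Keep training patients sharing at least `min_common` diseases
    with the active patient's known diagnoses (all given as sets)."""
    return {
        pid: diseases
        for pid, diseases in training.items()
        if len(diseases & active) >= min_common
    }


def predict_diseases(active, training, min_common=2):
    """Score diseases the active patient does not yet have.

    Each retained neighbour votes for its other diseases, weighted by
    its Jaccard overlap with the active patient (an assumed weighting).
    Scores are normalised by the total weight of all neighbours.
    """
    neighbours = select_neighbours(active, training, min_common)
    scores = {}
    total_weight = 0.0
    for diseases in neighbours.values():
        weight = len(diseases & active) / len(diseases | active)
        total_weight += weight
        for disease in diseases - active:
            scores[disease] = scores.get(disease, 0.0) + weight
    if total_weight == 0.0:
        return {}
    return {disease: score / total_weight for disease, score in scores.items()}
```

Raising `min_common` from 1 to 2 shrinks the set of neighbours, which is what reduces both the runtime and the number of weak influences in this sketch.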

Flow Chart of the process of the project:


Expected Results:

The expected results of the proposed solution are presented below.

Table 1: Sample Data Collected


AFFX-BioB-5_at     1627.75    P           0.00010954
AFFX-BioB-M_at     2560.42    P           4.43E-05
AFFX-BioB-3_at     1486.42    P           4.43E-05
AFFX-BioC-5_at     4732.74    P           4.43E-05
AFFX-BioC-3_at     6056.05    P           4.43E-05
AFFX-BioDn-5_at    11630.8    P           4.43E-05
AFFX-BioDn-3_at    15568.5    P           4.43E-05
AFFX-CreX-5_at     53870.5    P           5.17E-05
AFFX-CreX-3_at     58445      P           4.43E-05
AFFX-DapX-5_at     1.52818    A           0.804734
AFFX-DapX-M_at     8.23282    A           0.699425
AFFX-DapX-3_at     8.22572    A           0.686277
AFFX-LysX-5_at     15.7748    A           0.440636
AFFX-LysX-M_at     8.85365    A           0.876428
AFFX-LysX-3_at     19.9383    A           0.354404
AFFX-PheX-5_at     2.74111    A           0.941572
AFFX-PheX-M_at     2.46388    A           0.989738
AFFX-PheX-3_at     60.9883    A           0.39692
AFFX-ThrX-5_at     8.07711    A           0.868639
AFFX-ThrX-M_at     32.8075    A           0.396911
AFFX-ThrX-3_at     5.77291    A           0.921998
AFFX-TrpnX-5_at    56.7294    A           0.340661

Fig. 1 shows the collected dataset, which is taken as input. Fig. 2 shows the data acquisition process, in which the collected dataset is viewed by the user. The viewed data are then filtered for clustering, as shown in Fig. 3. The various levels of clustering are then processed, as shown in Fig. 4(a), 4(b) and 4(c).






Finally, the ClubCF algorithm is applied. The approach first filters out most of the unwanted data using the procedure described in the initial section. The ClubCF method then analyzes the remaining large data set using specific algorithms.

Collaborative filtering is a data mining technique that processes the input fed to the system and produces results used to predict the future course of the disease. This work is distinctive in that it analyzes the disease as a whole and supports prevention by forecasting its future trend. It is specifically aimed at avoiding practice errors in dental disease prediction before they occur, making disease prevention a reality.

This work may be tested against the collective information in dental database archives to check the effectiveness of the system. Based on the generated results, future predictions may be used to plan treatment strategies.


[1] Fortunato Bianconi, Valerio Brunori, Paolo Valigi, Francesco La Rosa and Fabrizio Stracci, 2012. IEEE Transactions on Systems, Man, and Cybernetics--Part A: Systems and Humans, 42(6).

[2] Elena Gaura, John Kemp and James Brusey, 2013. Leveraging Knowledge From Physiological Data: On-Body Heat Stress Risk Prediction With Sensor Networks, IEEE Transactions on Biomedical Circuits and Systems, 7(6).

[3] Bum Ju Lee, Boncho Ku, 2014. Prediction of Fasting Plasma Glucose Status Using Anthropometric Measures for Diagnosing Type 2 Diabetes, IEEE Journal of Biomedical and Health Informatics, 18(2).

[4] Srinivas Laxminarayan, Mark J. Buller, William J. Tharion and Jaques Reifman, 2015. Human Core Temperature Prediction for Heat-Injury Prevention, IEEE Journal of Biomedical and Health Informatics, 19(3).

(1) K. G. Rani Roopha Devi, (2) Dr. R. Mahendra Chozhan, (3) Dr. M. Karthika

(1) MPhil (CS) Research Scholar, NMSSVN COLLEGE, Nagamalai, Madurai.

(2) BDS, MIDA, MBA (Hosp Admin)USA, GDC (UK), MDBA, MSc Psychology, PGDCR, MRSH

(3) Assistant Professor, MCA Department, NMSSVN College, Nagamalai, Madurai.

Received 27 May 2016; Accepted 28 June 2016; Available 12 July 2016

Address For Correspondence:

K. G. Rani Roopha Devi MPhil (CS) Research Scholar, NMSSVN COLLEGE, Nagamalai, Madurai.
COPYRIGHT 2016 American-Eurasian Network for Scientific Information
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2016 Gale, Cengage Learning. All rights reserved.

Article Details
Author: Devi, K.G. Rani Roopha; Chozhan, R. Mahendra; Karthika, M.
Publication: Advances in Natural and Applied Sciences
Date: Jun 30, 2016