A Risk Model Based Heart Disease Prediction Using Data Lake Architecture
- Resource Type
- Conference
- Authors
- M, Dilli Babu; Ramesh, K.; Renjith, P.N.; Prabha, B.
- Source
- 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), pp. 1-6, Feb. 2022
- Subject
- Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Heart
Machine learning algorithms
XML
Predictive models
Big Data applications
Feature extraction
Data models
EHR
PHR
KNN
Naïve Bayes classifier
Data Lake
Random forest Classifier
- Language
Heart disease has recently been increasing at a rapid rate, which makes strategies for its earlier detection in patients essential. With the advent of the data lake approach, health record data from various providers can be enriched to predict the risk of heart disease for different patients. The personal health record (PHR) available in the data lake is used together with the electronic health record (EHR) to outline a cardiovascular patient risk model. The risk model is based on features extracted from the health records. In preprocessing, the PHR, which is stored in XML format, is merged with the patient master database. Three classification algorithms, KNN, the Naïve Bayes classifier, and the Random Forest classifier, are then applied to the data to predict and classify patients with heart disease. This is followed by a risk score calculation, which identifies the patients in whom heart disease is most likely to occur. The proposed model improved the prediction of heart disease in patients.
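The preprocessing step described in the abstract, merging XML-formatted PHR data with the patient master database, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the XML layout, the element and attribute names (`record`, `patient_id`, `resting_bp`, `cholesterol`), the master table fields, and the helper names `parse_phr` and `merge_with_master` are all hypothetical, and the sample values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical PHR export; the element and attribute names are assumptions.
PHR_XML = """
<phr>
  <record patient_id="P001"><resting_bp>145</resting_bp><cholesterol>233</cholesterol></record>
  <record patient_id="P002"><resting_bp>120</resting_bp><cholesterol>180</cholesterol></record>
</phr>
"""

# Hypothetical patient master table, keyed by patient id.
PATIENT_MASTER = {
    "P001": {"age": 63, "sex": "M"},
    "P002": {"age": 41, "sex": "F"},
    "P003": {"age": 57, "sex": "M"},  # no PHR record available
}


def parse_phr(xml_text):
    """Parse the PHR XML into {patient_id: {measurement_name: value}}."""
    root = ET.fromstring(xml_text)
    records = {}
    for rec in root.findall("record"):
        pid = rec.get("patient_id")
        # Each child element is treated as one numeric measurement.
        records[pid] = {child.tag: float(child.text) for child in rec}
    return records


def merge_with_master(master, phr):
    """Inner-join master rows with PHR measurements on patient id."""
    merged = {}
    for pid, demographics in master.items():
        if pid in phr:
            merged[pid] = {**demographics, **phr[pid]}
    return merged


merged = merge_with_master(PATIENT_MASTER, parse_phr(PHR_XML))
for pid, features in sorted(merged.items()):
    print(pid, features)
```

In this sketch, patients without a PHR record are dropped by the inner join; whether the original work keeps such patients is not stated in the abstract. The merged feature rows would then be the input to the classifiers named above.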