Machine Learning Models For Patient Medical Cost Prediction and Trend Analysis Using Open Healthcare Data

Resource Type: Conference
Authors: Ravishankar Rao, A.; Jain, Raunak; Singh, Mrityunjai; Garg, Rahul
Source: 2023 IEEE 3rd International Conference on Electronic Communications, Internet of Things and Big Data (ICEIB), pp. 292-296, Apr. 2023
Subject: Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Costs
Codes
Computational modeling
Catheterization
Machine learning
Predictive models
Big Data
machine learning
prediction
length of stay
healthcare

Abstract

We analyzed de-identified patient data from the New York State SPARCS system, comprising 9 million patient records from 2016 through 2019. Each record contains 35 features, including patient demographics, clinical diagnoses, length of stay, and total cost. We used big data and machine learning techniques, the Python Pandas library, and the scikit-learn toolkit. We examined trends in the cost distributions and identified the diagnosis codes that correspond to the largest changes. The distributions are long-tailed and peak near USD 9000. We compared cost samples across 2016-2019 and applied the Kolmogorov-Smirnov test to show that the samples arise from different statistical distributions (p-value < 0.0001). The dataset contained 305 unique clinical diagnoses. Of these, 275 (90% of the categories) showed cost increases. The largest cost increases were for "EXTENSIVE 3RD DEGREE OR FULL THICKNESS BURNS", with a 96.5% increase, and "CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS", with a 96% increase. We also developed machine learning models to predict costs. The model input consisted of patient demographics and diagnosis codes; the model output was the predicted cost of treatment. We investigated the CatBoost regression model and used the R² score for performance evaluation. We achieved R² values in the range of 0.59 to 0.88; the higher value is obtained when length of stay is used as an input feature. Although the cost distributions differed across 2016-2019, the R² scores of the proposed models were consistent across those years. The methodology in this study can help providers and policymakers better predict healthcare costs for planning purposes. The cost trends and the identification of diagnosis codes associated with large cost increases can guide expenditure toward the areas of greatest need. The results suggest that the age group "70 and older" would benefit from targeted interventions.
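As an illustration of the two statistics named in the abstract, the sketch below computes a two-sample Kolmogorov-Smirnov statistic and an R² score using only the Python standard library. It is not the authors' code: the function names `ks_two_sample` and `r2_score` are hypothetical, the cost samples are synthetic log-normal draws rather than SPARCS records, and the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction, which is an assumption about method and not something the abstract specifies.

```python
import bisect
import math
import random


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / n
        cdf_b = bisect.bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    # Asymptotic p-value from the Kolmogorov series (approximation, assumed here).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges poorly here; the true value is essentially 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, long-tailed "cost" samples for two hypothetical years (not real data).
rng = random.Random(0)
costs_a = [rng.lognormvariate(math.log(12000), 0.9) for _ in range(5000)]
costs_b = [rng.lognormvariate(math.log(15000), 0.9) for _ in range(5000)]

d_stat, p_value = ks_two_sample(costs_a, costs_b)
print(f"KS statistic D = {d_stat:.4f}, asymptotic p-value = {p_value:.3g}")
```

In practice, library routines such as SciPy's `ks_2samp` and scikit-learn's `r2_score` would normally be preferred over hand-written versions; the standard-library form here only makes the definitions explicit.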
