Machine Learning Models Powered by Big Data for Health Insurance Expense Forecasting


International Research Journal of Economics and Management Studies
© 2023 by IRJEMS
Volume 2  Issue 1
Year of Publication : 2023
Authors : Jaya Vardhani Mamidala, Sunil Jacob Enokkaren, Avinash Attipalli, Varun Bitkuri, Raghuvaran Kendyala, Jagan Kurma
irjems doi : 10.56472/25835238/IRJEMS-V2I1P143

Citation:

Jaya Vardhani Mamidala, Sunil Jacob Enokkaren, Avinash Attipalli, Varun Bitkuri, Raghuvaran Kendyala, Jagan Kurma. "Machine Learning Models Powered by Big Data for Health Insurance Expense Forecasting" International Research Journal of Economics and Management Studies, Vol. 2, No. 1, pp. 333-344, 2023.

Abstract:

Health insurance is a critical tool for strengthening healthcare systems, especially for low-income groups, because it improves health outcomes, productivity, and labor supply. Accurate projections of healthcare expenses are therefore important to policymakers and insurers. This research proposes a machine learning framework for estimating health insurance expenses using the publicly available medical insurance cost prediction dataset hosted on Kaggle. The dataset consists of 2.7k records with attributes such as age, BMI, smoking status, and region. A thorough preprocessing procedure was applied, comprising data cleaning, outlier removal, one-hot encoding, and Z-score standardization. A Gradient Boosting (GB) regression model was then applied to predict insurance expenses, taking advantage of its ensemble learning behavior, which iteratively reduces prediction errors. Predictive accuracy was high: the model achieved an R² of 92.0 and a Mean Squared Error (MSE) of 86.8. The results confirm that Gradient Boosting is effective at fitting complex, non-linear relationships and offers a viable, scalable solution for predicting personal healthcare expenditures.
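The workflow summarized in the abstract (outlier removal, one-hot encoding, Z-score standardization, then Gradient Boosting regression scored with R² and MSE) can be sketched roughly as follows. This is a minimal illustration using scikit-learn and is not the authors' implementation. Because the Kaggle file is not reproduced here, the sketch generates synthetic records; the column names, the cost rule, the IQR outlier rule, the 80/20 split, and the default hyperparameters are all assumptions made for the example, so its scores will not match the reported figures.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC_COLS = ["age", "bmi"]          # assumed column names
CATEGORICAL_COLS = ["smoker", "region"]  # assumed column names
TARGET_COL = "expenses"                # assumed column name


def make_synthetic_insurance_data(n_rows=2700, seed=0):
    """Build placeholder records with the attributes named in the abstract."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 65, size=n_rows)
    bmi = rng.normal(30.0, 6.0, size=n_rows).clip(15.0, 55.0)
    smoker = rng.choice(["yes", "no"], size=n_rows, p=[0.2, 0.8])
    region = rng.choice(
        ["northeast", "northwest", "southeast", "southwest"], size=n_rows
    )
    is_smoker = (smoker == "yes").astype(float)
    # Invented non-linear cost rule, used only so the model has signal to fit.
    expenses = (
        250.0 * age
        + 300.0 * bmi
        + 20000.0 * is_smoker
        + 500.0 * is_smoker * np.maximum(bmi - 30.0, 0.0)
        + rng.normal(0.0, 2000.0, size=n_rows)
    )
    return pd.DataFrame(
        {"age": age, "bmi": bmi, "smoker": smoker,
         "region": region, TARGET_COL: expenses}
    )


def drop_iqr_outliers(df, column, k=1.5):
    """Drop rows whose `column` lies outside [Q1 - k*IQR, Q3 + k*IQR].

    The abstract does not name its outlier method; the IQR rule is an assumption.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df.loc[keep].reset_index(drop=True)


def run_sketch(seed=0):
    """Clean, encode, scale, fit Gradient Boosting, and return test metrics."""
    df = make_synthetic_insurance_data(seed=seed)
    df = df.dropna().drop_duplicates()      # basic data cleaning
    df = drop_iqr_outliers(df, "bmi")       # outlier removal

    X = df[NUMERIC_COLS + CATEGORICAL_COLS]
    y = df[TARGET_COL]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    preprocess = ColumnTransformer(
        [
            ("zscore", StandardScaler(), NUMERIC_COLS),
            ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
        ]
    )
    model = Pipeline(
        [
            ("preprocess", preprocess),
            ("gb", GradientBoostingRegressor(random_state=seed)),
        ]
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "n_rows": len(df),
    }


if __name__ == "__main__":
    print(run_sketch())
```

Fitting the scaler and encoder inside the pipeline means they are learned from the training split only, which avoids leaking test-set statistics into preprocessing. To try the sketch on the real dataset, replace `make_synthetic_insurance_data` with a loader for the downloaded CSV and adjust the assumed column names.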

Keywords:

Health Insurance, Cost Prediction, Machine Learning, Healthcare Analytics, Policy Feedback, Insurance Expenses.