Performance Measurement of Classification Model with Data Oversampling in Supervised Learning Algorithms for Heart Disease

Authors

  • Anis Masruriyah, Universitas Buana Perjuangan Karawang
  • Hilda Novita, Universitas Buana Perjuangan Karawang
  • Cici Sukmawati, Universitas Buana Perjuangan Karawang
  • Angga Ramadhan, Universitas Buana Perjuangan Karawang
  • Siti Arif, Universitas Buana Perjuangan Karawang
  • Budi Dermawan, Universitas Singaperbangsa Karawang

DOI:

https://doi.org/10.31294/coscience.v4i1.2389

Keywords:

ADASYN, Classification, Heart Disease, SMOTE, Supervised Learning

Abstract

Heart disease remains a leading cause of death in Indonesia and worldwide. In data mining, class imbalance between heart disease and normal samples in a dataset presents a significant challenge: it can bias a model toward the majority class, resulting in suboptimal performance in identifying instances of heart disease. This study addresses the issue by applying oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN). The findings show that models trained without oversampling achieve accuracy and precision exceeding 80% but separate the classes poorly. In contrast, models trained with oversampling show lower accuracy and precision but are better able to distinguish between the heart disease and normal classes. The best-performing model, a Random Forest with SMOTE, attains an AUC of 0.868, indicating a marked improvement in class separation. These findings provide guidance for developing more effective heart disease classification models: oversampling techniques such as SMOTE are an effective strategy for mitigating class imbalance in heart disease data, and although accuracy and precision may decrease, the model's ability to identify heart disease, as assessed by AUC, becomes more reliable. This research contributes to heart disease prevention and treatment efforts through sophisticated and sustainable data mining techniques.
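
The oversampling idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library, the function name `smote_like_oversample`, the parameter `k`, and the toy data are all hypothetical, and it shows only the core SMOTE step of interpolating between a minority sample and one of its nearest minority neighbours. ADASYN's density-based weighting is not covered.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (illustrative only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 8 "normal" rows and 3 "heart disease" rows,
# each with two numeric features. These are not values from the study.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic minority rows to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because each synthetic row lies on a line segment between two real minority rows, the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation metrics such as AUC are computed on unmodified data.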

Published

2024-01-31