Performance Comparison of Logistic Regression, Random Forest, and Naïve Bayes Algorithms in Credit Risk Classification


Keywords:

classification, comparison, credit risk, supervised learning

Abstract

Credit risk classification is a critical process in the financial industry for identifying customers at risk of default. This study compares the performance of three supervised learning algorithms: Logistic Regression, Random Forest, and Naïve Bayes. The experiments use the German Credit dataset, which contains 1,000 customer records. The research stages comprise data preprocessing, data normalization, an 80:20 split into training and test sets, modeling in KNIME Analytics Platform 5.9.0, and evaluation using Accuracy, Precision, Recall, F1-Score, the Confusion Matrix, and Cohen’s Kappa. The evaluation focuses on each model’s ability to detect the “bad” class, which represents high-risk customers. Random Forest delivered the best performance, with an accuracy of 81.50%, a precision of 0.94, a recall of 0.82, an F1-Score of 0.87, and a Cohen’s Kappa of 0.51. Logistic Regression achieved an accuracy of 76.10% and Naïve Bayes 75.50%. These findings indicate that Random Forest identifies at-risk customers more effectively than the other two algorithms. The contribution of this research lies in its risk-based evaluation approach, which uses the ability to detect the “bad” class as the basis for recommending a credit risk classification model.
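The evaluation metrics listed in the abstract can be made concrete with a minimal sketch, assuming “bad” is treated as the positive class, as the study’s risk-based focus suggests. This is not the authors’ KNIME workflow; the helper names (`confusion_matrix`, `metrics`) and the toy labels are hypothetical and chosen only for illustration, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix; "bad" (high-risk) is assumed to be the positive class."""
    tp: int  # bad customers predicted as bad
    fp: int  # good customers predicted as bad
    fn: int  # bad customers predicted as good
    tn: int  # good customers predicted as good


def confusion_matrix(y_true, y_pred, positive="bad"):
    """Count TP/FP/FN/TN from two equal-length label sequences (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def metrics(cm):
    """Accuracy, precision, recall, F1 (for the positive class) and Cohen's kappa."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / n  # observed agreement p_o
    precision = cm.tp / (cm.tp + cm.fp) if cm.tp + cm.fp else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if cm.tp + cm.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Chance agreement p_e: P(pred bad)*P(true bad) + P(pred good)*P(true good)
    p_e = ((cm.tp + cm.fp) * (cm.tp + cm.fn) + (cm.fn + cm.tn) * (cm.fp + cm.tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Toy labels for illustration only; not taken from the German Credit experiments.
y_true = ["bad", "bad", "good", "good", "good", "bad", "good", "good"]
y_pred = ["bad", "good", "good", "bad", "good", "bad", "good", "good"]

cm = confusion_matrix(y_true, y_pred)
scores = metrics(cm)
print(cm)      # ConfusionMatrix(tp=2, fp=1, fn=1, tn=4)
print(scores)
```

Here accuracy is 6/8 = 0.75 while Cohen’s Kappa is 7/15 ≈ 0.47, because kappa discounts the agreement expected by chance from the class marginals; this is why it complements accuracy when one class dominates the data.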


Published

2026-07-03


Section

Articles