Application of the SMOTE Method to Handle Imbalanced Data in Hate Speech Classification (Penerapan Metode SMOTE Untuk Mengatasi Imbalanced Data Pada Klasifikasi Ujaran Kebencian)

Ridwan Ridwan; Eni Heni Hermaliani; Muji Ernawati

doi:10.31294/coscience.v4i1.2990

Authors

Ridwan Ridwan, Universitas Nusa Mandiri
Eni Heni Hermaliani, Universitas Nusa Mandiri
Muji Ernawati, Universitas Nusa Mandiri

Keywords:

Imbalanced Data, Oversampling, SMOTE, Hate Speech

Abstract

Hate speech is the spread of hatred towards individuals or groups on the basis of ethnicity, religion, race, and other characteristics, and it can lead to discrimination, violence, and social conflict. Imbalanced data can degrade classification performance. The Synthetic Minority Oversampling Technique (SMOTE) is used to address this imbalance. Features are extracted using Bag of Words and TF-IDF, and the training data are then oversampled using the SMOTE, SVM-SMOTE, KMeans-SMOTE, and Borderline-SMOTE methods. Classification is performed on Twitter data with the Random Forest, Support Vector Machine, Logistic Regression, and Naive Bayes algorithms. The results show that applying Borderline-SMOTE to handle imbalanced data yields better performance than the other SMOTE variants, with accuracy, recall, precision, and F1-score of 84.09%, 85.25%, 84.55%, and 81.16%, respectively. The Random Forest algorithm achieves higher performance than the other algorithms.
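The oversampling step named in the abstract can be illustrated with a minimal sketch of basic SMOTE, which creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. This is not the authors' implementation, and it shows plain SMOTE rather than the Borderline-SMOTE variant they report as best. The function name smote, its parameters, and the toy feature vectors are all assumptions made for illustration; in the study the vectors would come from Bag of Words or TF-IDF features of the training tweets.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    This is a hypothetical helper for illustration, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest minority neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up 2-D vectors standing in for text features (assumed data, not from the paper).
majority = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0], [0.2, 0.2]]
minority = [[0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

As in the abstract, oversampling of this kind would be applied to the training split only, so that the test data keep their original class distribution and the reported metrics are not inflated by synthetic samples.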