Semi-Supervised Bullying Detection in Narrative Student Counselling Reports Using a Hybrid CNN-LSTM with Pseudo-Labelling

Authors

  • Suwarno, Universitas Internasional Batam
  • Muthia Andini, Universitas Internasional Batam
  • Mangapul Siahaan, Universitas Internasional Batam

DOI:

https://doi.org/10.31294/ji.v13i1.11512

Keywords:

Bullying detection, Deep learning, Hybrid CNN-LSTM, Student counselling reports, Natural language processing

Abstract

Bullying incidents in schools are often documented in narrative student counselling reports. These reports contain informal language, emotional expressions, and contextual dependencies that make automated text classification difficult, particularly when labelled data are scarce. This study develops a bullying detection model for such reports using a Hybrid CNN-LSTM architecture combined with pseudo-labelling-based semi-supervised learning. The model is trained in two stages: pre-training on approximately 70,000 publicly available abusive-language texts, then fine-tuning on 1,000 anonymized student counselling reports validated by guidance counsellors. Pseudo-labelling expands the training data while preserving domain relevance and adhering to ethical considerations. In experiments, the model achieves an accuracy of 0.8698, a recall of 0.8570, a precision of 0.7415, and an F1-score of 0.7951. Precision is lower than recall, but recall is prioritized to reduce the risk of overlooking potential bullying cases in the school counselling context. Compared with Logistic Regression and Linear SVM, the Hybrid CNN-LSTM model performs more stably on longer narrative inputs that require contextual interpretation. The study contributes empirical evidence on the effectiveness of semi-supervised deep learning for bullying detection in low-resource, narrative student counselling data, a setting that remains underexplored in prior work.
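
The following minimal Python sketch illustrates two ideas from the abstract; it is not the authors' implementation. It shows how the reported F1-score follows from the reported precision and recall, and how a confidence-based pseudo-labelling step might select unlabelled reports for training. The function names, the 0.9 confidence threshold, and the example probabilities are assumptions made for illustration only; the abstract does not state how pseudo-labels were selected.

```python
from typing import List, Tuple


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_pseudo_labels(
    probabilities: List[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs for confidently predicted reports.

    probabilities[i] is a model's predicted probability that
    unlabelled report i describes bullying. The default threshold
    of 0.9 is an illustrative assumption, not a value from the study.
    """
    selected = []
    for index, p in enumerate(probabilities):
        if p >= threshold:
            selected.append((index, 1))  # confident "bullying"
        elif p <= 1 - threshold:
            selected.append((index, 0))  # confident "not bullying"
        # Otherwise the prediction is too uncertain; skip the report.
    return selected


# The precision and recall reported in the abstract reproduce its F1-score.
reported_f1 = round(f1_from_precision_recall(0.7415, 0.8570), 4)
print(reported_f1)  # 0.7951

# Hypothetical model outputs for five unlabelled reports.
example_probs = [0.97, 0.55, 0.04, 0.91, 0.30]
pseudo_labelled = select_pseudo_labels(example_probs)
print(pseudo_labelled)  # [(0, 1), (2, 0), (3, 1)]
```

In a sketch like this, a higher threshold admits fewer but cleaner pseudo-labels, while a lower one adds more data at the cost of more label noise.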

Author Biographies

  • Suwarno, Universitas Internasional Batam

    Information System Department

  • Muthia Andini, Universitas Internasional Batam

    Information System Department

  • Mangapul Siahaan, Universitas Internasional Batam

    Information System Department

Published

2026-02-12

Section

Articles