Semi-Supervised Bullying Detection in Narrative Student Counselling Reports Using a Hybrid CNN-LSTM with Pseudo-Labelling
DOI:
https://doi.org/10.31294/ji.v13i1.11512
Keywords:
Bullying detection, Deep learning, Hybrid CNN-LSTM, Student counselling reports, Natural language processing
Abstract
Bullying incidents in schools are often documented in narrative student counselling reports. These reports contain informal language, emotional expressions, and contextual dependencies, which make automated text classification difficult, particularly when labelled data are scarce. This study develops a bullying detection model for narrative student counselling reports using a Hybrid CNN-LSTM architecture combined with a pseudo-labelling-based semi-supervised learning approach. The proposed model is trained in two stages: pre-training on approximately 70,000 publicly available abusive-language texts, followed by fine-tuning on 1,000 anonymized student counselling reports validated by guidance counsellors. Pseudo-labelling is employed to expand the training data while preserving domain relevance and adhering to ethical considerations. Experimental results show that the proposed model achieves an accuracy of 0.8698, a recall of 0.8570, and an F1-score of 0.7951. Although precision is lower (0.7415), higher recall is prioritized to reduce the risk of overlooking potential bullying cases in the school counselling context. Comparative analysis with Logistic Regression and Linear SVM indicates that the Hybrid CNN-LSTM model performs more stably on longer narrative inputs that require contextual interpretation. This study contributes empirical evidence on the effectiveness of semi-supervised deep learning for bullying detection in low-resource, narrative student counselling data, a setting that remains underexplored in prior work.
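The abstract does not state how pseudo-labels are selected, so the following is only a minimal sketch of one common variant: keep an unlabelled report only when the model's predicted probability is confidently high or confidently low. The function name `select_pseudo_labels`, the 0.9 threshold, and the example scores are illustrative assumptions, not details taken from the study. The second helper, `f1_score`, recomputes the F1-score from the precision and recall reported above as a consistency check.

```python
from typing import List, Tuple


def select_pseudo_labels(
    probabilities: List[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs for confident predictions only.

    probabilities[i] is the predicted probability that unlabelled
    report i describes bullying. The 0.9 threshold is an assumed
    value chosen for illustration, not one reported by the authors.
    """
    selected = []
    for index, p in enumerate(probabilities):
        if p >= threshold:
            selected.append((index, 1))  # confidently bullying
        elif p <= 1.0 - threshold:
            selected.append((index, 0))  # confidently not bullying
        # Scores in between stay unlabelled and are not added.
    return selected


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model scores for five unlabelled reports.
    scores = [0.97, 0.55, 0.04, 0.91, 0.30]
    print(select_pseudo_labels(scores))

    # Precision and recall as reported in the abstract.
    print(round(f1_score(0.7415, 0.8570), 4))
```

With the reported precision of 0.7415 and recall of 0.8570, the harmonic mean rounds to 0.7951, which matches the F1-score given in the abstract. In a full pipeline, the selected pairs would be added to the labelled set before another round of fine-tuning; a stricter threshold trades fewer pseudo-labels for less label noise.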
License
Copyright (c) 2026 Suwarno, Muthia Andini, Mangapul Siahaan (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





