Optimizing Sentiment Analysis on the Linux Desktop Using N-Gram Features
DOI:
https://doi.org/10.31294/informatika.v12i1.12255
Keywords:
n-Gram Feature, Sentiment Analysis, Linux Desktop
Abstract
Linux, or GNU/Linux, is a widely used open-source operating system built on the Linux kernel. It is freely available to anyone and is known for its security and privacy advantages. As information technology has advanced, protecting privacy has become increasingly difficult because of the data extraction practices of major tech companies. This has encouraged some Mastodon users to switch to Linux, and many of them have shared their opinions on using Linux as their main operating system. This research applies sentiment analysis to determine whether the sentiment of Mastodon users toward Linux is predominantly positive, negative, or neutral. Data were collected with the Mastodon.py library and then manually labelled with the assistance of a linguistic expert, following a linguistic rule proposed in previous research. The text mining process includes preprocessing steps, among them feature extraction with n-grams to find the best-performing configuration, as well as feature selection using TF-IDF. The Naïve Bayes algorithm is used for text classification. The entire analysis was carried out in AI Studio (RapidMiner). The highest-performing model was obtained with an n-gram value of 3, and it shows the following sentiment polarity toward Linux on Mastodon: 42% positive, 28% negative, and 30% neutral. The model has an accuracy of 63%, a precision of 70%, a recall of 80%, and an F1-score of 74%, which indicates that this method can optimize the sentiment analysis process.
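To make the n-gram step concrete, here is a minimal Python sketch. It is not the authors' RapidMiner process. It assumes that "an n-gram value of 3" means every word n-gram up to length 3 (unigrams, bigrams, and trigrams), and the helper names `word_ngrams` and `f1_score` as well as the sample post are hypothetical. The second helper only restates the standard definition of the F1-score as the harmonic mean of precision and recall.

```python
from collections import Counter


def word_ngrams(text, max_n=3):
    """Return all word n-grams of length 1..max_n, joined with underscores.

    Assumption: simple lowercase whitespace tokenization; the study's
    actual preprocessing is not described in detail in the abstract.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the post.
        for i in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[i:i + n]))
    return grams


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical post, not taken from the study's data set.
post = "Linux respects my privacy"
features = Counter(word_ngrams(post, max_n=3))
print(sorted(features))

# Rounded precision and recall from the abstract.
print(round(f1_score(0.70, 0.80), 4))
```

For the four-word sample post this yields nine features: four unigrams, three bigrams, and two trigrams. With the rounded figures from the abstract (precision 0.70, recall 0.80) the formula gives roughly 0.747. The reported 74% was presumably computed from unrounded values.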
License
Copyright (c) 2025 Muhamad Taufiq Hidayat, Rudi Kurniawan, Tati Suprapti (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





