The Effectiveness of Logistic Regression in Indonesian-Language Sentiment Analysis of YouTube Comments on Employment Issues
Abstract
This study develops and evaluates a sentiment classification system for Indonesian-language YouTube comments on employment issues using the Logistic Regression algorithm. The dataset comprises 2,755 comments extracted from a video themed "Job Seeker Stories," of which 1,020 were manually labeled with one of three sentiment categories: positive, neutral, or negative. The methodology consists of text preprocessing, feature transformation with TF-IDF, data splitting with stratified sampling, class imbalance handling with SMOTE, and hyperparameter optimization with GridSearchCV. The model reached 44% accuracy, with performance varying across classes: the negative class performed best with an F1-score of 0.55, while the neutral and positive classes scored 0.34 and 0.29, respectively. The imbalanced class distribution and the implicit wording of positive comments were the main obstacles to classification. The findings suggest that combining Logistic Regression, TF-IDF, and SMOTE can serve as a baseline for sentiment analysis of Indonesian social media comments, although deep learning-based models are needed to improve accuracy and to capture linguistic nuance. The analysis also found that negative sentiment dominates public responses, reflecting societal concern about the national employment situation.
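The class imbalance handling mentioned in the abstract can be illustrated with a minimal, standard-library sketch of SMOTE-style oversampling. This is not the authors' code and not the imbalanced-learn implementation: the function names `smote_oversample` and `balance`, the neighbour count `k=2`, and the toy two-dimensional vectors (standing in for TF-IDF rows) are all assumptions made for illustration. The idea shown is that each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest same-class neighbours.

```python
import random


def smote_oversample(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen sample and one of its k nearest neighbours of the same class."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other same-class samples by squared Euclidean distance.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(features_by_class, k=2, seed=0):
    """Oversample every class up to the size of the largest class."""
    target = max(len(rows) for rows in features_by_class.values())
    balanced = {}
    for label, rows in features_by_class.items():
        extra = smote_oversample(rows, target - len(rows), k, seed)
        balanced[label] = rows + extra
    return balanced


# Hypothetical training features: toy 2-D vectors, not data from the study.
train = {
    "negative": [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
                 [0.9, 0.3], [0.6, 0.2], [0.8, 0.0]],
    "neutral": [[0.5, 0.5], [0.4, 0.6], [0.5, 0.4]],
    "positive": [[0.1, 0.9], [0.2, 0.8], [0.0, 0.7]],
}

balanced = balance(train)
print({label: len(rows) for label, rows in balanced.items()})
# {'negative': 6, 'neutral': 6, 'positive': 6}
```

In a sketch like this, oversampling would be applied to the training portion only, after the stratified split, so that synthetic points never leak into the evaluation data.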

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC-BY 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.