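Efektivitas Logistic Regression dalam Analisis Sentimen Berbahasa Indonesia pada Komentar YouTube tentang Isu Ketenagakerjaan (The Effectiveness of Logistic Regression in Indonesian-Language Sentiment Analysis of YouTube Comments on Employment Issues)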

Hamdan Santani Mulyono; Usep Saprudin

doi:10.63447/jimik.v6i3.1481

Published: Sep 10, 2025

Authors

Hamdan Santani Mulyono

Universitas Dharma Wacana

Usep Saprudin

Universitas Dharma Wacana

Abstract

This study develops a sentiment classifier for Indonesian-language YouTube comments on employment issues using Logistic Regression. The dataset comprises 2,755 comments extracted from a video themed "Job Seeker Stories," of which 1,020 were manually labeled as positive, neutral, or negative. The methodology covers text preprocessing, TF-IDF feature extraction, a stratified train-test split, class-imbalance handling with SMOTE, and hyperparameter tuning with GridSearchCV. The model reached 44% accuracy, with performance varying across classes: the negative class scored highest with an F1-score of 0.55, while the neutral and positive classes scored 0.34 and 0.29, respectively. Class imbalance and the implicit phrasing of positive comments were the main obstacles to classification. The findings suggest that combining Logistic Regression, TF-IDF, and SMOTE can serve as a baseline for sentiment analysis of Indonesian social media comments, but that deep learning-based models are needed to improve accuracy and capture linguistic nuance. The analysis also found that negative sentiment dominates public responses, reflecting societal concern about the national employment situation.
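
For readers who want to see how the stages named in the abstract fit together, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn and NumPy are available. The function names, the 80/20 split, the grid over `C`, and the macro-F1 scoring are illustrative assumptions, since the abstract does not state them. The oversampling step is a simplified, hand-written stand-in for SMOTE (interpolating between a minority sample and one of its nearest same-class neighbours) rather than a library implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, k=3, seed=0):
    """Simplified SMOTE stand-in: grow every class to the majority size by
    interpolating between a sample and one of its k same-class neighbours.
    Assumes each class has at least two samples."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = int(target - count)
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        # Ask for one extra neighbour: the nearest hit is normally the point itself.
        n_neighbors = int(min(k, count - 1)) + 1
        nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X_cls)
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, count, size=n_new)
        col = rng.integers(0, neighbours.shape[1], size=n_new)
        picked = neighbours[base, col]
        gap = rng.random((n_new, 1))  # position along the segment base -> picked
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def train_sentiment_baseline(texts, labels, seed=0):
    """TF-IDF -> stratified split -> oversample training data -> grid search."""
    X_train_txt, X_test_txt, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    # Fit TF-IDF on training text only so test vocabulary does not leak in.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(X_train_txt).toarray()
    X_test = vectorizer.transform(X_test_txt).toarray()

    # Oversample the training portion only; the test set keeps its real distribution.
    X_bal, y_bal = smote_like_oversample(X_train, y_train, seed=seed)

    # Hypothetical grid: the abstract does not list the tuned hyperparameters.
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},
        scoring="f1_macro",
        cv=3,
    )
    search.fit(X_bal, y_bal)

    classes = sorted(set(labels))
    scores = f1_score(
        y_test, search.predict(X_test), labels=classes, average=None, zero_division=0
    )
    return search, dict(zip(classes, scores))
```

Calling `train_sentiment_baseline(texts, labels)` with a list of preprocessed comments and their sentiment labels returns the fitted grid search and a per-class F1 dictionary. One caveat of this sketch: oversampling before the grid search lets synthetic points derived from a training fold appear in its validation fold, which can inflate cross-validated scores. Resampling inside each fold (for example with a resampling-aware pipeline from a library such as imbalanced-learn) avoids this.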

Keywords

Sentiment Analysis; YouTube Comments; Logistic Regression; Employment; Indonesian Language

How to Cite

Mulyono, H. S., & Saprudin, U. (2025). Efektivitas Logistic Regression dalam Analisis Sentimen Berbahasa Indonesia pada Komentar YouTube tentang Isu Ketenagakerjaan. Jurnal Indonesia : Manajemen Informatika Dan Komunikasi, 6(3), 1547-1555. https://doi.org/10.63447/jimik.v6i3.1481

Issue

Vol. 6 No. 3 (2025): September

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC-BY 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

Author Biographies

Hamdan Santani Mulyono, Universitas Dharma Wacana

Informatics Engineering Study Program, Universitas Dharma Wacana, Metro City, Lampung Province, Indonesia

Usep Saprudin, Universitas Dharma Wacana

Informatics Engineering Study Program, Universitas Dharma Wacana, Metro City, Lampung Province, Indonesia
