Classification of Gameplay Complexity Based on Sentence Structure in Game Descriptions
Abstract
Game descriptions on digital distribution platforms play a crucial role in conveying gameplay characteristics to players. However, the linguistic complexity of these descriptions varies and may affect how well players understand the gameplay on offer. This study classifies gameplay complexity from the sentence structure of game descriptions using a Natural Language Processing (NLP) approach. The dataset is the 10k Most Popular Gaming 2025 dataset obtained from Kaggle, with a focus on the game description column. Descriptions are grouped into three complexity classes (simple, medium, and complex) based on the linguistic characteristics of the text. The research process comprises text preprocessing, extraction of sentence-structure-based linguistic features, and data balancing using the balance rank method. Classification is performed with the Logistic Regression, Random Forest Classifier, and Support Vector Machine algorithms. Evaluation results show that the Random Forest Classifier achieves the highest accuracy, 0.85, while Logistic Regression and Support Vector Machine each reach an accuracy of 0.81. Feature analysis reveals that word count and average sentence length are the most influential features in determining gameplay complexity. Visualization using Principal Component Analysis shows that the complexity classes form distinguishable groups, although some overlap between classes remains. These results indicate that sentence-structure-based linguistic analysis is effective in representing gameplay complexity in game descriptions.
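To make the two most influential features concrete, the following is a minimal sketch of how word count and average sentence length might be computed for a single description. The function name `sentence_structure_features`, the regular expressions used for sentence and word splitting, and the example text are illustrative assumptions; the abstract does not specify the tokenization rules actually used in the study.

```python
import re


def sentence_structure_features(description: str) -> dict:
    """Compute simple sentence-structure features for one game description.

    Hypothetical helper: the splitting heuristics below are assumptions,
    not the preprocessing used in the study.
    """
    # Split after sentence-ending punctuation followed by whitespace
    # (a rough heuristic that ignores abbreviations such as "e.g.").
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", description.strip()) if s]
    # Treat runs of letters, digits, and apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", description)

    word_count = len(words)
    sentence_count = len(sentences)
    # Guard against empty descriptions to avoid division by zero.
    avg_sentence_length = word_count / sentence_count if sentence_count else 0.0

    return {
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_sentence_length": avg_sentence_length,
    }


# Made-up example description, not taken from the dataset.
example = "Build a city. Manage resources, trade with rivals, and survive harsh winters!"
features = sentence_structure_features(example)
print(features)
```

Feature dictionaries of this kind could then be stacked into a numeric matrix and passed to any of the three classifiers named above.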

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.