Phishing Website Detection Using the Random Forest Algorithm with GridSearchCV Hyperparameter Tuning
Abstract
Phishing websites are a cybersecurity threat that deceives users into disclosing sensitive information such as usernames, passwords, and financial data. Attackers typically create fake web pages that closely resemble legitimate websites, which makes them difficult for ordinary users to identify. URL-based phishing detection is therefore important for protecting users when they access websites. This study aims to improve the accuracy of phishing URL classification by applying hyperparameter tuning to Random Forest, a popular machine learning algorithm. The dataset consists of 10,000 URLs (5,000 phishing and 5,000 legitimate), from which 16 relevant numerical features were extracted. Four model configurations were evaluated: one default model and three models tuned with GridSearchCV (Grid 1, Grid 2, and Grid 3). The experimental results show that the Grid 2 configuration achieved the best performance, with an accuracy of 82.65%, precision of 82.70%, recall of 82.65%, F1-score of 82.62%, ROC-AUC of 0.9068, and PR-AUC of 0.8976. By comparison, the default model achieved an accuracy of 82.40% and an F1-score of 82.38%, while Grid 1 and Grid 3 produced slightly lower performance. These findings indicate that hyperparameter tuning yields only a slight improvement over the default configuration in detecting phishing websites, and that the difference is not statistically significant.
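To make the feature-extraction step concrete, the following is a minimal sketch of how a URL might be turned into numerical features before classification. The abstract does not list the 16 features used in the study, so the function name `extract_url_features` and every feature name below are hypothetical illustrations, not the authors' feature set.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Map a URL string to a few numeric lexical features.

    Assumption: these features are illustrative only; the study's
    actual 16 features are not given in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # True when the host is written as a dotted IPv4 address.
    is_ipv4 = re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ipv4": int(is_ipv4),
        # Number of non-empty segments in the URL path.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for example in (
        "https://www.example.com/",
        "http://192.168.0.1/login/verify-account.php",
    ):
        print(example, extract_url_features(example))
```

Each URL becomes a fixed-length numeric vector, which is the form a tree-based classifier such as Random Forest expects. In a setup like the one described, these vectors would be the input to both the default model and the GridSearchCV-tuned models.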
This work is licensed under a Creative Commons Attribution 4.0 International License.