Phishing URL Detection Using Hybrid Deep Learning (CNN) and XGBoost with the SMOTE-ENN Balancing Technique

Paskalis Reynaldy Elroy Gabriel

Authors

Paskalis Reynaldy Elroy Gabriel, Universitas Pembangunan Nasional "Veteran" Jawa Timur

Keywords:

phishing detection, convolutional neural network, xgboost, smote-enn, hybrid deep learning, url security, imbalanced data

Abstract

Phishing attacks delivered through fraudulent URLs are an evolving cybersecurity threat that causes real losses for internet users. This study proposes a phishing detection system built on a hybrid approach. A Convolutional Neural Network (CNN) performs automatic feature extraction, and XGBoost serves as the classifier. The CNN is trained with supervised learning to extract 64 high-level features from the characters of each URL. These are concatenated with 40 heuristic features, including URL length, the number of special characters, entropy, and suspicious keywords. To handle the class imbalance in the dataset (a 1:1.81 ratio), SMOTE-ENN is applied. This technique combines synthetic oversampling of the minority class (SMOTE) with data cleaning via Edited Nearest Neighbours (ENN). The model is trained on a dataset of 4,856 URLs split 80:20 into training and test sets. The experiments show high performance: 97.12% accuracy and an AUC-ROC of 0.9960. For the phishing class, precision is 0.9855, recall 0.9644, and F1-score 0.9748. Hyperparameter tuning with GridSearchCV yields a cross-validation F1-score of 0.9991. The model performs inference at 4,855 URLs per second, which makes it suitable for real-time deployment.
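
The abstract names URL length, special-character counts, entropy, and suspicious keywords among the 40 heuristic features. The following Python sketch computes a handful of such lexical features for a single URL. The keyword list, the special-character set, the IP-host check, and the function names are illustrative assumptions, not the paper's actual feature definitions. Only seven of the 40 features are shown.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword list and character set; the paper's exact lists are not given.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure", "account", "bank", "confirm")
SPECIAL_CHARS = "@-_.?=&%/~+"


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution of `text`."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def heuristic_features(url: str) -> dict:
    """A few illustrative lexical features for one URL (the paper uses 40)."""
    parsed = urlparse(url)
    lowered = url.lower()
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(parsed.netloc),
        "num_special_chars": sum(ch in SPECIAL_CHARS for ch in url),
        "num_digits": sum(ch.isdigit() for ch in url),
        # True when the host is a bare dotted IPv4 address instead of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "entropy": shannon_entropy(url),
        "num_suspicious_keywords": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


if __name__ == "__main__":
    for u in ("https://www.example.com/", "http://192.168.0.1/secure-login/verify.php?acc=1"):
        print(u, heuristic_features(u))
```

In a full pipeline, each URL's dictionary would be turned into a fixed-order numeric vector before being passed to the classifier.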
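
The CNN branch operates on the characters of the URL. Before that, each URL must be turned into a fixed-length numeric sequence. The 64 features the CNN learns are then concatenated with the 40 heuristic features before reaching XGBoost. The sketch below shows both steps under stated assumptions. It assumes a printable-ASCII vocabulary, a maximum length of 200 characters, and ids 0 and 1 for padding and unknown characters. None of these values come from the abstract, and the function names are hypothetical.

```python
import string

# Assumed settings; the abstract does not state the paper's vocabulary or maximum length.
MAX_LEN = 200
VOCAB = string.printable  # 100 printable ASCII characters
PAD, UNK = 0, 1
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(VOCAB)}  # ids 0 and 1 are reserved


def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Map each character to an integer id, truncate to max_len, then right-pad with PAD."""
    ids = [CHAR_TO_INDEX.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def fuse_features(cnn_embedding: list[float], heuristic: list[float]) -> list[float]:
    """Concatenate 64 CNN features with 40 heuristic features into one 104-dim vector."""
    if len(cnn_embedding) != 64 or len(heuristic) != 40:
        raise ValueError("expected 64 CNN features and 40 heuristic features")
    return list(cnn_embedding) + list(heuristic)


if __name__ == "__main__":
    print(encode_url("http://example.com", max_len=24))
```

The integer sequence would typically feed an embedding layer in the CNN. The fused 104-dimensional vector is what a tree ensemble such as XGBoost would then consume.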
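
SMOTE-ENN works in two steps. First, SMOTE oversamples the minority class by placing synthetic points on the line segment between a minority sample and one of its minority-class nearest neighbours. Then Edited Nearest Neighbours (ENN) deletes samples whose k nearest neighbours mostly carry a different label. The pure-Python sketch below implements both steps for a binary problem with small numeric feature vectors, assuming at least two minority samples. It uses Wilson's majority-vote editing rule. Library implementations such as imbalanced-learn's `SMOTEENN` may use different defaults. In a typical setup, resampling is applied only to the training split so the test set keeps its original class ratio. The abstract does not state which split was resampled.

```python
import math
import random
from collections import Counter


def _nearest(points, query, k, exclude_index=None):
    """Indices of the k points closest to `query` (Euclidean), optionally skipping one index."""
    dists = sorted((math.dist(p, query), i) for i, p in enumerate(points) if i != exclude_index)
    return [i for _, i in dists[:k]]


def smote(X, y, minority_label, k=5, rng=None):
    """Binary SMOTE: add synthetic minority points until both classes have equal counts."""
    rng = rng or random.Random(0)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = sum(1 for label in y if label != minority_label) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority))
        j = rng.choice(_nearest(minority, minority[i], k, exclude_index=i))
        gap = rng.random()  # position along the segment from minority[i] to minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return list(X) + synthetic, list(y) + [minority_label] * len(synthetic)


def enn(X, y, k=3):
    """Wilson's ENN: drop each sample whose k nearest neighbours mostly have another label."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        neighbour_labels = [y[j] for j in _nearest(X, x, k, exclude_index=i)]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Oversample with SMOTE, then clean the combined set with ENN."""
    X_os, y_os = smote(X, y, minority_label, k=k_smote, rng=random.Random(seed))
    return enn(X_os, y_os, k=k_enn)


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 0.5], [0.5, 0], [1, 0.5],
         [5, 5], [5, 6], [6, 5]]
    y = [0] * 8 + [1] * 3
    X_res, y_res = smote_enn(X, y, minority_label=1)
    print(Counter(y_res))
```

The brute-force neighbour search is O(n²) and is meant only to show the procedure. For thousands of URLs with 104 features, a library implementation with indexed neighbour search would be the practical choice.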
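
The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Substituting the reported phishing-class precision (0.9855) and recall (0.9644) gives 0.9748, which matches the reported F1-score. This confirms that all three figures refer to the phishing class. The minimal sketch below, with hypothetical function names, computes the three metrics from confusion-matrix counts and reproduces that check.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive (phishing) class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Phishing-class precision and recall as reported in the abstract.
    print(round(f1_from(0.9855, 0.9644), 4))  # 0.9748
```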
