This is an outdated version published on 2023-12-15. Read the most recent version.

Machine Learning-Based Phishing Detection: An Approach for Cyber Attack Prevention

Authors

  • Eduardo C. F. Santos (Author)
  • Fernando F. L. Fialho (Author)
  • Jennifer C. V. Silva (Author)
  • Francescolly S. Barbosa (Author)
  • Claudio Frizzarini (Author)
  • Marise Miranda (Author)

Keywords:

Phishing. Cybersecurity. Social Engineering. Machine Learning. Threat Detection. Virtual Attacks. Exploratory Data Analysis. Decision Tree. Principal Component Analysis (PCA). Browser Plugin. URL Validator. AWS (Amazon Web Services). Cloud Computing. Observability. Ransomware. Cyber Threats. Technological Intervention. Real-Time Machine Learning. Secure Digital Environment.

Abstract

DOI: https://doi.org/10.5281/zenodo.10719793

This article addresses the growing threat of online scams, especially phishing, driven by technological advancement. It explores attack techniques, their origins, current statistics, and the future challenges this scenario poses, and it argues for intervention through a Machine Learning (ML)-based phishing detection system written in Python. An article from IBM (International Business Machines Corporation) served as the initial guide and provided a solid framework for the investigation. From the planning stage, however, the project deliberately followed a different premise and approach, a decision that opened new perspectives and expanded the scope and depth of the study. The research trains the system on an extensive dataset, with emphasis on exploratory data analysis and on the choice of ML models. The experiments cover models such as Random Forest, together with data normalization and Principal Component Analysis (PCA) to reduce dimensionality and improve model efficiency. The article describes the data processing, the implementation of the algorithms, and the evaluation of model effectiveness using metrics such as the area under the ROC curve. It also discusses the practical application of the system and its role in defending against constantly evolving cyber threats. In addition, it presents a browser plug-in, aimed mainly at Chrome, that acts as a URL validator so users can check the legitimacy of a link before clicking it, and it explains in detail how the plug-in was developed with HTML, CSS, and JavaScript. Finally, the article details the architecture and technologies used, highlighting the choice of AWS for the approval (staging) and production environments, including virtual machines, cloud storage, and application observability.
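The modeling steps named in the abstract (normalization, PCA, a Random Forest classifier, and evaluation by area under the ROC curve) can be sketched as follows. This is only a minimal illustration using scikit-learn, not the authors' implementation: the synthetic dataset, the feature count, the number of principal components, and every hyperparameter below are assumptions chosen for the example, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a phishing dataset (assumption: the real
# features and labels are not described in the abstract).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)

# Hold out 25% of the rows for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Normalize, reduce dimensionality with PCA, then classify.
# n_components and n_estimators are illustrative values only.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

# Score the held-out rows with the predicted probability of the
# positive class and compute the area under the ROC curve.
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC: {auc:.3f}")
```

Wrapping the steps in a single pipeline means the scaler and the PCA projection are fitted on the training split only, which avoids leaking information from the test rows into the evaluation.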

Published

2023-12-15

Versions