DOI: 10.56472/25835238/IRJEMS-V2I1P144
Vetrivelan Tamilmani, Venkata Deepak Namburi, Aniruddha Arjun Singh Singh, Vaibhav Maniar, Rami Reddy Kothamaram, Dinesh Rajendran. "Real-Time Identification of Phishing Websites Using Advanced Machine Learning Methods," International Research Journal of Economics and Management Studies, Vol. 2, No. 1, pp. 345-355, 2023.
Phishing is a form of social engineering that exploits the trust users place in online services to obtain sensitive information, including financial data and login credentials. Fraudulent emails that impersonate legitimate businesses and agencies redirect users to fake websites, where they are prompted to submit confidential information, such as financial details and login credentials for social media systems. This research presents an approach for identifying fraudulent websites using two ensemble learning models, XGBoost and AdaBoost. The models are trained on the Kaggle Phishing Website Detector dataset, which contains more than 11,000 websites described by 30 parameters and a class label. To ensure high-quality input for model training, the data undergo thorough preparation, including data cleaning, normalization, label encoding, and feature extraction. The two models are compared using the following metrics: F1-score (97.77%), AUC-ROC (97.21%), recall (98.58%), accuracy (98.17%), and precision (97.21%). XGBoost outperforms AdaBoost across the reported metrics, while AdaBoost achieves only 95.73%. The robustness of the proposed models is further tested through ROC and confusion matrix analysis. A comparison of the ensemble approaches with standard classifiers, such as Naïve Bayes, SVM, and Neural Networks, shows that the ensemble methods are more effective. According to the results, XGBoost and AdaBoost are the most effective options for detecting phishing websites in the real world, as they are accurate, scalable, and dependable.
Keywords: Phishing Website Detection, Ensemble Learning, Machine Learning, Cybersecurity, Website Security, Real-Time Detection.
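The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score, and AUC-ROC) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `ConfusionMatrix`, the helper functions, and the eight toy labels and scores are hypothetical and chosen only for illustration, and the values it produces are unrelated to the results reported in the paper. The sketch assumes a binary labelling in which 1 marks a phishing website and 0 a legitimate one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, positive = phishing)."""
    tp: int  # phishing sites flagged as phishing
    fp: int  # legitimate sites flagged as phishing
    fn: int  # phishing sites missed
    tn: int  # legitimate sites passed as legitimate

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Tally the four confusion-matrix cells from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): true labels and hypothetical classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

cm = confusion_from_labels(y_true, y_pred)
print(f"accuracy={cm.accuracy():.4f} precision={cm.precision():.4f} "
      f"recall={cm.recall():.4f} f1={cm.f1():.4f} auc={auc_roc(y_true, scores):.4f}")
```

Because AUC-ROC is computed from ranked scores rather than thresholded labels, it can differ from accuracy even on the same predictions, which is one reason the two are typically reported side by side.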