Date of Project
4-5-2025
Document Type
Honors Thesis
School Name
College of Arts and Sciences
Department
Computer Science
Major Advisor
Dr. Sayani Sarkar
Second Advisor
Dr. Andrew Karem
Abstract
Adversarial attacks pose a significant threat to the reliability of machine-learning-based spam detection systems in social media. This undergraduate thesis, "From Adversarial Attacks to Robust Classifiers: A Study in Social Media Spam Detection – Black Box & White Box," systematically examines the impact of both black-box and white-box adversarial attacks on a range of spam classifiers, including Logistic Regression, Decision Trees, Random Forests, K-Nearest Neighbors, Bagging, Gradient Boosting, and Support Vector Machines. Leveraging a novel dataset derived from Twitter spam messages and enhanced with adversarial perturbations such as synonym replacement and character-level modifications, this study evaluates classifier performance under realistic attack scenarios. Exploratory data analysis reveals how adversarial manipulation alters linguistic features and classification outcomes. Experimental results demonstrate that adversarial attacks can significantly degrade model accuracy, with white-box attacks generally proving more effective than black-box attacks. The thesis further discusses defense strategies and model robustness, offering practical insights for developing more resilient spam detection systems in the face of evolving adversarial threats. This work contributes to the fields of natural language processing, cybersecurity, and social media analytics by highlighting the urgent need for robust classifiers capable of withstanding adversarial manipulation.
Recommended Citation
Penaloza Rumie, Jonathan Jose, "From Adversarial Attacks to Robust Classifiers - A Study in Social Media Spam Detection - Black Box & White Box" (2025). Undergraduate Theses. 188.
https://scholarworks.bellarmine.edu/ugrad_theses/188
Included in
Computational Linguistics Commons, Cybersecurity Commons, Data Science Commons, Digital Communications and Networking Commons, Programming Languages and Compilers Commons, Social Media Commons, Systems and Communications Commons