Undergraduate Theses

From Adversarial Attacks to Robust Classifiers - A Study in Social Media Spam Detection - Black Box & White Box

Jonathan Jose Penaloza Rumie, Bellarmine University

Date of Project

4-5-2025

Document Type

Honors Thesis

School Name

College of Arts and Sciences

Department

Computer Science

Major Advisor

Dr. Sayani Sarkar

Second Advisor

Dr. Andrew Karem

Abstract

Adversarial attacks pose a significant threat to the reliability of machine learning-based spam detection systems in social media. This undergraduate thesis, "From Adversarial Attacks to Robust Classifiers: A Study in Social Media Spam Detection – Black Box & White Box," systematically examines the impact of both black-box and white-box adversarial attacks on a range of spam classifiers, including Logistic Regression, Decision Trees, Random Forests, K-Nearest Neighbors, Bagging, Gradient Boosting, and Support Vector Machines. Leveraging a novel dataset derived from Twitter spam messages and enhanced with adversarial perturbations such as synonym replacement and character-level modifications, this study evaluates classifier performance under realistic attack scenarios. Exploratory data analysis reveals how adversarial manipulation alters linguistic features and classification outcomes. Experimental results demonstrate that adversarial attacks can significantly degrade model accuracy, with white-box attacks generally proving more effective than black-box attacks. The thesis further discusses defense strategies and model robustness, offering practical insights for developing more resilient spam detection systems in the face of evolving adversarial threats. This work contributes to the fields of natural language processing, cybersecurity, and social media analytics by highlighting the urgent need for robust classifiers capable of withstanding adversarial manipulation.
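The two perturbation types named in the abstract, synonym replacement and character-level modification, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the `LOOKALIKES` and `SYNONYMS` tables, the function names, and the keyword-counting `toy_spam_score`, which stands in for a real classifier only to show how a perturbed message can slip past surface-level features.

```python
import random

# Hypothetical look-alike character substitutions (illustrative only).
LOOKALIKES = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

# Hypothetical toy synonym table (illustrative only).
SYNONYMS = {"free": "complimentary", "win": "earn", "prize": "reward"}


def char_perturb(text, rate=1.0, seed=0):
    """Character-level modification: swap letters for look-alike symbols.

    Each eligible character is replaced with probability `rate`.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        sub = LOOKALIKES.get(ch.lower())
        if sub is not None and rng.random() < rate:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def synonym_replace(text):
    """Synonym replacement: swap whole words using the toy table."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())


def toy_spam_score(text, keywords=("free", "win", "prize")):
    """Stand-in detector: counts exact keyword matches (not a real model)."""
    return sum(w in keywords for w in text.lower().split())


message = "win a free prize"
print(toy_spam_score(message))                   # keyword hits on the original
print(toy_spam_score(char_perturb(message)))     # hits after character swaps
print(toy_spam_score(synonym_replace(message)))  # hits after synonym swaps
```

In this toy setting both perturbations remove every keyword match while leaving the message readable to a human, which is the kind of evasion the study measures against trained classifiers.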

Related Resource

From Adversarial Attacks to Robust Classifiers - A Study in Social Media Spam Detection

Recommended Citation

Penaloza Rumie, Jonathan Jose, "From Adversarial Attacks to Robust Classifiers - A Study in Social Media Spam Detection - Black Box & White Box" (2025). Undergraduate Theses. 188.
https://scholarworks.bellarmine.edu/ugrad_theses/188

Download

Available for download on Thursday, April 30, 2026

Included in

Computational Linguistics Commons, Cybersecurity Commons, Data Science Commons, Digital Communications and Networking Commons, Programming Languages and Compilers Commons, Social Media Commons, Systems and Communications Commons
