cyberbullying, social media, machine learning, classification, feature extraction


The use of mobile devices, social media, and networking websites has increased tremendously in recent years. Despite the advantages of these platforms, such as exchanging ideas and information, socializing, and entertainment, users may encounter adverse behaviors like toxicity, bullying, extremism, and cruelty. The prevalence of such behaviors in cyberspace has grown significantly, posing a threat to individuals and communities, so there is high demand for automated cyberbullying detection systems. Machine learning algorithms have been widely used to build such systems by classifying text as cyberbullying or not. In this study, we applied popular machine learning models, namely Logistic Regression (LR), Multinomial Naive Bayes (MNB), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost), to a Twitter text dataset to detect cyberbullying related to ethnicity and religion. To convert the text into numerical feature vectors, we used feature extraction techniques such as Bag of Words and TF-IDF. Our results indicate that XGBoost and LR achieve the highest performance.
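
As a rough illustration of the feature extraction step named in the abstract, the sketch below builds Bag of Words counts and TF-IDF weights in plain Python. It is not the authors' code: the function names (`tokenize`, `bag_of_words`, `tf_idf`), the whitespace tokenizer, the unsmoothed textbook IDF formula, and the three sample sentences are all assumptions made for this example. Libraries often use smoothed or normalized variants, so their numbers may differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace. A real tweet
    # pipeline would likely also handle punctuation, URLs, and mentions.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary, then one row of raw term counts per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows


def tf_idf(rows):
    # Textbook variant: tf = count / document length, idf = ln(N / df).
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    # Every vocabulary term occurs in at least one document, so df[j] >= 1.
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    weighted = []
    for row in rows:
        total = sum(row)
        weighted.append(
            [(row[j] / total) * idf[j] if total else 0.0 for j in range(n_terms)]
        )
    return weighted


# Made-up placeholder texts, not taken from the study's dataset.
docs = ["you are great", "you are awful", "great game today"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

Each row of `counts` or `weights` is the numerical representation of one text. In the second sample, the term that appears in only one document ("awful") receives a larger TF-IDF weight than the terms shared with another document, which is the property that makes TF-IDF useful as classifier input.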


Fall 2023

Course Name

STA 5703 Data Mining 1

Instructor Name

Xie, Rui

Accessibility Status

PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker

Included in

Data Science Commons