cyberbullying detection, social media, machine learning, classification, feature extraction


The popularity of mobile devices, social media, and networking websites has increased tremendously in recent years. Most people around the world engage daily in a variety of online activities. Although users benefit from these systems by exchanging ideas and information, socializing, and finding entertainment, they may also encounter adverse behaviors such as toxicity, bullying, extremism, and cruelty. Recent statistics report that these behaviors have grown noticeably online, to the point that they can threaten individuals and even entire communities. There is therefore a pressing need for a tool that detects cyberbullying automatically. Most studies approach this as a classification problem and use machine learning algorithms to build such a tool. In this study, we applied several machine learning models, namely Logistic Regression (LR), Multinomial Naive Bayes (MNB), K-Nearest Neighbor (KNN), and Extreme Gradient Boosting (XGBoost), to a Twitter text dataset to detect cyberbullying related to ethnicity and religion. Because the data are textual, we used feature extraction techniques such as Bag of Words and TF-IDF to convert the texts into numerical features. According to the computational results, XGBoost and LR achieved the highest performance.
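
As an illustration of the feature extraction step named in the abstract, the following is a minimal sketch of Bag of Words counting and TF-IDF weighting in plain Python. It is not the study's code: the toy sentences, the whitespace tokenizer, and the helper names (`tokenize`, `bag_of_words`, `tfidf`) are hypothetical, and the weighting shown is one common smoothed variant, idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which the study may or may not have used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. Real tweets would
    # need more cleaning (URLs, mentions, punctuation, and so on).
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary and one raw term-count row per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, c in Counter(tokenize(doc)).items():
            row[index[tok]] = c
        counts.append(row)
    return vocab, counts


def tfidf(counts):
    # Smoothed inverse document frequency, then L2-normalize each row.
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for row in counts:
        vec = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec))
        weighted.append([v / norm if norm else 0.0 for v in vec])
    return weighted


# Hypothetical toy corpus, not drawn from the study's dataset.
docs = ["you are kind", "you are awful awful", "kind words matter"]
vocab, counts = bag_of_words(docs)
weights = tfidf(counts)
```

Each row of `counts` or `weights` is a numerical vector that a classifier such as LR, MNB, KNN, or XGBoost could take as input. In the TF-IDF rows, terms that occur in fewer documents receive larger weights than terms shared across many documents.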


Fall 2024

Course Name

STA 5703 Data Mining 1

Instructor Name

Rui Xie

Accessibility Status

PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker

Included in

Data Science Commons