Keywords
NLP, Natural Language Processing, Cyberbullying, Twitter, classification, TF-IDF, bag-of-words
Abstract
Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect and prevent cyberbullying on various social media platforms. In this study, we analyze the combination of Natural Language Processing (NLP) feature-extraction methods (Bag-of-Words and TF-IDF) with popular machine learning algorithms (Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Extreme Gradient Boosting (XGBoost)) to detect cyberbullying on Twitter. The NLP methods were used to extract features from tweets and convert them to numerical vectors, and these features were then analyzed with the machine learning algorithms. When the models' performance and accuracy were compared, the Extreme Gradient Boosting (XGBoost) model emerged as the best-performing classifier, irrespective of whether it used features from bag-of-words or TF-IDF.
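To illustrate how text can be converted to numerical vectors, the following is a minimal sketch of bag-of-words and TF-IDF feature extraction in plain Python. It is not the study's implementation: the example messages, the whitespace tokenizer, the unsmoothed IDF formula, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy messages; these are NOT from the study's dataset.
tweets = [
    "you are so dumb",
    "have a great day",
    "you are great",
]

def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()

docs = [tokenize(t) for t in tweets]

# Vocabulary: every distinct token, in sorted order, defines one vector column.
vocab = sorted({word for doc in docs for word in doc})

def bag_of_words(doc):
    # Raw count of each vocabulary word in the document.
    counts = Counter(doc)
    return [counts[word] for word in vocab]

def tf_idf(doc):
    # Term frequency times inverse document frequency.
    # Assumption: plain idf = log(N / df) with no smoothing; libraries
    # often use smoothed variants that give different values.
    counts = Counter(doc)
    n_docs = len(docs)
    vector = []
    for word in vocab:
        tf = counts[word] / len(doc)
        df = sum(1 for d in docs if word in d)
        vector.append(tf * math.log(n_docs / df))
    return vector

bow_matrix = [bag_of_words(doc) for doc in docs]
tfidf_matrix = [tf_idf(doc) for doc in docs]
```

Each row of `bow_matrix` or `tfidf_matrix` is a fixed-length numerical vector that could be passed to a classifier such as those named in the abstract. Words that appear in fewer messages receive a larger TF-IDF weight than words shared across many messages.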
Semester
Spring 2024
Course Name
STA 5703 Data Mining 1
Instructor Name
Xie, Rui
College
College of Sciences
STARS Citation
Fiagbe, Roland, "Machine Learning Approaches for Cyberbullying Detection" (2024). Data Science and Data Mining. 18.
https://stars.library.ucf.edu/data-science-mining/18
Accessibility Status
PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker
Included in
Analysis Commons, Applied Statistics Commons, Data Science Commons, Probability Commons, Statistical Methodology Commons, Statistical Models Commons, Statistical Theory Commons