Keywords
Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, Extreme Gradient Boosting, Bag of Words, Term Frequency-Inverse Document Frequency
Abstract
This study compares popular machine learning techniques, namely Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, and Extreme Gradient Boosting, for classifying tweets into three categories: cyberbullying based on religion, cyberbullying based on ethnicity, or no cyberbullying. First, various data-cleaning approaches are applied to the tweet data. Once the data is clean, word embedding techniques, namely Bag of Words and Term Frequency-Inverse Document Frequency, convert the words into numerical vectors. Finally, a model is fitted for each combination of these word embedding techniques and machine learning algorithms.
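The cleaning and vectorization steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the cleaning rules (lowercasing, dropping URLs, @mentions, and non-letter characters), the hypothetical helper names `clean`, `bag_of_words`, and `tf_idf`, the toy tweets, and the unsmoothed weighting tf × log(N / df), which differs from the smoothed variants used by some libraries.

```python
import math
import re
from collections import Counter


def clean(tweet):
    """Lowercase, drop URLs, @mentions and non-letters, then split into tokens.

    These cleaning rules are illustrative assumptions, not the study's.
    """
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    tweet = re.sub(r"[^a-z\s]", " ", tweet)
    return tweet.split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight count vectors as (count / document length) * log(N / df)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for v in vectors:
        total = sum(v) or 1  # guard against documents left empty by cleaning
        weighted.append([(c / total) * w for c, w in zip(v, idf)])
    return weighted


# Toy tweets, invented for this sketch.
tweets = [
    "Check this out http://example.com @user great day!",
    "Great day, great game",
    "Bad game today",
]
tokens = [clean(t) for t in tweets]
vocab, bow = bag_of_words(tokens)
tfidf = tf_idf(bow)
```

Either `bow` or `tfidf` could then serve as the feature matrix for any of the four classifiers, giving one fitted model per vectorizer and classifier pair.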
Course Name
STA 5703 Data Mining 1
Instructor Name
Dr. Rui Xie
College
College of Sciences
STARS Citation
Dhakal, Pradip, "Cyberbullying Detection on Twitter Data Using Machine Learning Classifiers" (2024). Data Science and Data Mining. 23.
https://stars.library.ucf.edu/data-science-mining/23