Keywords

Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, Extreme Gradient Boosting, Bag of Words, Term Frequency-Inverse Document Frequency

Abstract

This study compares four popular machine learning techniques, Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, and Extreme Gradient Boosting, for classifying tweets into three categories: cyberbullying based on religion, cyberbullying based on ethnicity, or no cyberbullying. First, several data-cleaning steps are applied to the tweet data. Once the data is clean, two text representation techniques, Bag of Words and Term Frequency-Inverse Document Frequency (TF-IDF), are used to convert the tweets into numerical vectors. Finally, a model is fitted for each combination of these representations and the machine learning algorithms.
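
The pipeline the abstract describes (clean the tweets, vectorize them with Bag of Words or TF-IDF, then fit a classifier) can be sketched roughly as follows. This is a minimal standard-library illustration, not the author's implementation: the toy tweets, the placeholder tokens `faithslur` and `ethnicslur`, the label strings, the particular cleaning rules, and every function name are assumptions made for the example. Only Multinomial Naive Bayes is shown; the other three classifiers would be fitted on the same vectors, and in practice a library such as scikit-learn or XGBoost would normally be used instead of hand-written code.

```python
import math
import re
from collections import Counter

def clean_tweet(text):
    """Lowercase, drop URLs and @mentions, keep letters only, split on spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

def build_vocab(token_lists):
    """Map each distinct training token to a column index."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(tokens, vocab):
    """Count vector over the vocabulary; unseen tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def fit_idf(count_vectors):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(count_vectors)
    idf = []
    for j in range(len(count_vectors[0])):
        df = sum(1 for vec in count_vectors if vec[j] > 0)
        idf.append(math.log((1 + n) / (1 + df)) + 1.0)
    return idf

def tfidf(vec, idf):
    """Weight raw counts by IDF (no length normalization in this sketch)."""
    return [tf * w for tf, w in zip(vec, idf)]

class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n_terms = len(X[0])
        self.classes = sorted(set(y))
        class_counts = Counter(y)
        totals = {c: [0.0] * n_terms for c in self.classes}
        for vec, label in zip(X, y):
            for j, val in enumerate(vec):
                totals[label][j] += val
        self.log_prior = {c: math.log(class_counts[c] / len(y)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c]) + self.alpha * n_terms
            self.log_lik[c] = [math.log((t + self.alpha) / denom) for t in totals[c]]
        return self

    def predict(self, X):
        preds = []
        for vec in X:
            scores = {c: self.log_prior[c] + sum(v * w for v, w in zip(vec, self.log_lik[c]))
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds

# Toy data with placeholder tokens (assumed for illustration, not from the study).
TRAIN = [
    ("@user your faithslur beliefs are faithslur http://t.co/x", "religion"),
    ("faithslur people should leave", "religion"),
    ("ethnicslur go back home", "ethnicity"),
    ("typical ethnicslur behaviour again", "ethnicity"),
    ("lovely weather for a walk today", "not_cyberbullying"),
    ("watching the game with friends tonight", "not_cyberbullying"),
]
TEST = ["faithslur beliefs everywhere", "ethnicslur go home", "a walk with friends"]

def run_pipeline(weighting):
    """Clean, vectorize with 'bow' or 'tfidf', fit the classifier, predict TEST."""
    train_tokens = [clean_tweet(text) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    vocab = build_vocab(train_tokens)
    X_train = [bag_of_words(toks, vocab) for toks in train_tokens]
    X_test = [bag_of_words(clean_tweet(text), vocab) for text in TEST]
    if weighting == "tfidf":
        idf = fit_idf(X_train)  # IDF is learned from training counts only
        X_train = [tfidf(vec, idf) for vec in X_train]
        X_test = [tfidf(vec, idf) for vec in X_test]
    return MultinomialNB().fit(X_train, labels).predict(X_test)

for weighting in ("bow", "tfidf"):
    print(weighting, run_pipeline(weighting))
```

The vocabulary and the IDF weights are learned from the training tweets alone and then reused for the test tweets, which keeps the comparison between representations free of information leaked from the test set.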

Course Name

STA 5703 Data Mining 1

Instructor Name

Dr. Rui Xie

College

College of Sciences

STARS Citation

Dhakal, Pradip, "Cyberbullying Detection on Twitter Data Using Machine Learning Classifiers" (2024). Data Science and Data Mining. 23.
https://stars.library.ucf.edu/data-science-mining/23

Included in

Data Science Commons
