Low-resource and label-efficient machine learning methods are the family of statistical and machine learning techniques that can achieve high performance without a substantial amount of labeled data. These methods include both unsupervised learning techniques, such as LDA, and supervised methods, such as active learning, each providing different benefits. This dissertation is devoted to the design and analysis of unsupervised and supervised techniques that address three problems: (1) unsupervised narrative summary extraction for social media content, (2) social media text classification with Active Learning (AL), and (3) the restrictions and benefits of using Curriculum Learning (CL) for social media text classification. For the first problem, we present a framework that identifies viral topics over time and provides a narrative summary for each identified topic in an unsupervised manner. The framework can provide this information at varying time resolutions. For the second problem, we present a strategy that samples data based on the local structures in the embedding space of a large pretrained language model. Samples that do not belong to a dominant set are selected for annotation, because they are less similar to the rest of the data points and are therefore more challenging for the model. This criterion reduces the need for large annotated datasets. For the third problem, we use similar notions of data difficulty to study the impact of training models on a curriculum that presents easy samples first. This is the opposite of active learning, which prioritizes the challenging samples. However, instead of learning from a small amount of data and disregarding a substantial amount of information, gradual training that starts from easy samples leads the model along a trajectory to a better local minimum. Our study includes both heuristic-based and model-derived curricula.
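The selection criterion for the second problem can be illustrated with a minimal sketch. The abstract does not specify how dominant sets are computed, so the sketch assumes the standard replicator-dynamics formulation on a non-negative cosine affinity matrix. The function names (`cosine_affinity`, `dominant_set`, `select_for_annotation`), the membership cutoff, and the tie-breaking by mean similarity are all hypothetical choices made for illustration, not the dissertation's implementation.

```python
import numpy as np

def cosine_affinity(embeddings):
    # Pairwise cosine similarity, clipped to be non-negative, with a zero diagonal.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    affinity = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)
    return affinity

def dominant_set(affinity, iters=1000, tol=1e-8, cutoff=1e-4):
    # Replicator dynamics started from the uniform distribution (assumed setup).
    # Samples whose final weight exceeds `cutoff` are treated as members.
    n = affinity.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        ax = affinity @ x
        denom = x @ ax
        if denom <= 0:
            break
        new_x = x * ax / denom
        converged = np.abs(new_x - x).sum() < tol
        x = new_x
        if converged:
            break
    return x > cutoff

def select_for_annotation(embeddings, budget):
    # Pick samples outside the dominant set, least similar to the rest first.
    affinity = cosine_affinity(embeddings)
    outside = np.flatnonzero(~dominant_set(affinity))
    mean_sim = affinity[outside].mean(axis=1)
    return outside[np.argsort(mean_sim)][:budget]
```

In practice the embeddings would come from a pretrained language model, and the procedure could be repeated on the remaining data to peel off further dominant sets; both are left out of this sketch.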
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Open Access)
Amanzadeh Oghaz, Toktam, "Low-Resource Machine Learning Techniques for the Analysis of Online Social Media Textual Data" (2022). Electronic Theses and Dissertations, 2020-. 1734.