Abstract

Low-resource and label-efficient machine learning methods are statistical and machine learning techniques that achieve high performance without requiring a substantial amount of labeled data. They include both unsupervised learning techniques, such as Latent Dirichlet Allocation (LDA), and supervised methods, such as active learning, each providing different benefits. This dissertation is devoted to the design and analysis of unsupervised and supervised techniques that address three problems: (1) unsupervised narrative summary extraction for social media content, (2) social media text classification with Active Learning (AL), and (3) the restrictions and benefits of using Curriculum Learning (CL) for social media text classification. For the first problem, we present a framework that identifies viral topics over time and provides a narrative summary of the identified topics in an unsupervised manner, at varying time resolutions. For the second problem, we present a strategy that samples data based on the local structures in the embedding space of a large pretrained language model. Samples that do not belong to a dominant set are selected for annotation, because they are less similar to the rest of the data points and are therefore more challenging for the model. This criterion reduces the need for large annotated datasets. For the third problem, we use similar notions of data difficulty to study the effect of training models with a curriculum that presents easy samples first. This is the opposite of the idea behind active learning: rather than learning from a small number of samples and disregarding a substantial amount of information, gradual training that starts from easy samples follows a trajectory to a better local minimum. Our study includes both heuristic-based and model-derived curricula.

Graduation Date

2022

Semester

Fall

Advisor

Garibay, Ivan

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Identifier

CFE0009820; DP0027761

URL

https://purls.library.ucf.edu/go/DP0027761

Language

English

Release Date

June 2023

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)
