Keywords
Automated Feature Engineering; Reinforcement Learning; Deep Sequential Learning
Abstract
Unlike humans, AI systems are brittle. They often struggle in novel situations and are highly sensitive to small perturbations, which can lead to catastrophically poor performance. These systems comprise two main components: the model and the data. In recent decades, research has focused primarily on models, emphasizing advanced structures and algorithms to enhance AI performance. However, the data-centric side consumes most of the time and resources of human experts and greatly influences AI systems, while the gains from model-centric work are reaching a plateau. I therefore shifted my research focus toward data-centric AI, with the goal of identifying the ideal feature space that makes data AI-ready. To this end, this dissertation introduces two research perspectives and their corresponding frameworks: 1) the decision-making perspective and 2) the generative AI perspective. The decision-making perspective formulates feature selection and feature generation as Markov decision processes. Within this perspective, reinforcement learning is used to develop practical frameworks because it is well suited to optimizing such processes. Specifically, for feature selection, a single agent iteratively decides whether to select each individual feature. For feature generation, a cascading reinforced agent structure is proposed to select candidate features and operations for generating new features. The generative AI perspective assumes that the knowledge in discrete feature learning records can be embedded into a continuous space, which facilitates the search for an optimal feature space, inspired by the successes of generative AI techniques. Accordingly, a unified framework is proposed to optimize both tasks in four key steps: data collection, continuous space construction, enhanced embedding search, and feature space reconstruction.
The effectiveness of both perspectives underscores the potential for building foundation models in the data-centric AI domain.
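As a rough illustration of the decision-making perspective for feature selection, the sketch below shows a single agent that visits features one at a time and learns whether to keep or drop each. It is a toy under stated assumptions, not the dissertation's framework: the `RELEVANCE` values, the `COST` penalty, the `score` function, and the tabular epsilon-greedy update are all hypothetical stand-ins for a downstream-task reward and a learned policy.

```python
import random

# Hypothetical per-feature usefulness; stands in for a downstream-task signal.
RELEVANCE = [0.9, 0.1, 0.7, -0.3, 0.0, 0.5]
COST = 0.2  # assumed penalty for every feature that is kept


def score(mask):
    """Toy utility of a feature subset (an assumption, not the thesis reward)."""
    return sum(r - COST for r, keep in zip(RELEVANCE, mask) if keep)


def train(episodes=500, alpha=0.1, epsilon=0.2, seed=0):
    """Learn keep/drop values with a single epsilon-greedy agent."""
    rng = random.Random(seed)
    n = len(RELEVANCE)
    # q[i][a]: estimated value of action a (0 = drop, 1 = keep) for feature i.
    q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(episodes):
        mask = [0] * n
        for i in range(n):  # one decision per feature, in order
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 1 if q[i][1] > q[i][0] else 0  # exploit
            before = score(mask)
            mask[i] = action
            reward = score(mask) - before  # marginal gain of this decision
            q[i][action] += alpha * (reward - q[i][action])
    return q


def greedy_subset(q):
    """Indices of the features the learned values say to keep."""
    return [i for i, (drop, keep) in enumerate(q) if keep > drop]


if __name__ == "__main__":
    print(greedy_subset(train()))  # prints [0, 2, 5]
```

In this toy the reward for each decision is the marginal change in the subset score, so the agent ends up keeping exactly the features whose assumed relevance exceeds the per-feature cost. A realistic setup would replace `score` with the validation performance of a model trained on the selected features.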
Completion Date
2024
Semester
Spring
Committee Chair
Fu, Yanjie
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer and Information Sciences
Format
application/pdf
Language
English
Rights
In copyright
Release Date
November 2024
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
Campus Location
Orlando (Main) Campus
STARS Citation
Wang, Dongjie, "Data-Centric AI: Taming AI-ready Feature Space from Decision-Making to Generative-AI Perspectives" (2024). Graduate Thesis and Dissertation 2023-2024. 462.
https://stars.library.ucf.edu/etd2023/462
Accessibility Status
Meets minimum standards for ETDs/HUTs