Keywords

Automated Feature Engineering; Reinforcement Learning; Deep Sequential Learning

Abstract

Unlike humans, AI systems are brittle: they often struggle when faced with novel situations and are highly sensitive to small perturbations, which can lead to catastrophically poor performance. An AI system comprises two main components: the model and the data. In recent decades, research has focused primarily on the model, developing more advanced architectures and algorithms to improve AI performance. However, the data-centric side consumes most of the time and resources of human experts and strongly influences AI systems, while the gains from model-centric improvements are reaching a plateau. I therefore shifted my research focus toward data-centric AI, with the goal of identifying an optimal feature space that makes data AI-ready. To this end, this dissertation introduces two research perspectives and corresponding frameworks: 1) a decision-making perspective and 2) a generative AI perspective. The decision-making perspective formulates feature selection and feature generation as Markov decision processes and, because reinforcement learning is well suited to optimizing such processes, uses it to build practical frameworks. Specifically, for feature selection, a single agent iteratively decides whether to select each individual feature. For feature generation, a cascading structure of reinforced agents is proposed to select candidate features and operations for generating new features. The generative AI perspective assumes that the knowledge contained in discrete feature learning records can be effectively embedded into a continuous space. Inspired by the successes of generative AI techniques, this continuous space makes it possible to search for an optimal feature space. A unified framework is therefore proposed to optimize both feature selection and feature generation in four key steps: data collection, continuous space construction, enhanced embedding search, and feature space reconstruction.
The effectiveness of both perspectives underscores the potential for building foundation models in the data-centric AI domain.
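The abstract frames feature selection as a Markov decision process in which a single agent decides on features one at a time. As a rough illustration of that formulation only, here is a minimal sketch under loudly stated assumptions. It uses plain tabular Q-learning, not the dissertation's actual agent design. `toy_utility` is a hypothetical scoring function that stands in for the downstream model performance a real framework would measure. The helper names `train_selector`, `greedy_subset`, and `padded` are likewise illustrative and not taken from the dissertation.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = drop the current feature, 1 = select it


def toy_utility(mask):
    """HYPOTHETICAL stand-in for downstream model performance on a subset.

    Assumes features 0 and 2 are informative (+1.0 each) and charges 0.3
    per selected feature to penalize large subsets.
    """
    informative = {0, 2}
    gain = sum(1.0 for i, keep in enumerate(mask) if keep and i in informative)
    return gain - 0.3 * sum(mask)


def padded(mask, n_features):
    # Features the agent has not decided on yet are treated as dropped.
    return mask + (0,) * (n_features - len(mask))


def train_selector(n_features, utility, episodes=2000, alpha=0.5,
                   gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning over a feature-by-feature selection MDP (assumed).

    State:  tuple of select/drop decisions made so far.
    Action: select (1) or drop (0) the next feature.
    Reward: change in utility caused by that single decision.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        mask = ()
        for _ in range(n_features):
            state = mask
            # Epsilon-greedy exploration; ties in max() resolve to "drop".
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_mask = mask + (action,)
            reward = (utility(padded(next_mask, n_features))
                      - utility(padded(mask, n_features)))
            if len(next_mask) == n_features:  # terminal: all features decided
                target = reward
            else:
                target = reward + gamma * max(q[(next_mask, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            mask = next_mask
    return q


def greedy_subset(q, n_features):
    """Roll out the learned greedy policy; return selected feature indices."""
    mask = ()
    for _ in range(n_features):
        mask += (max(ACTIONS, key=lambda a: q[(mask, a)]),)
    return [i for i, keep in enumerate(mask) if keep]


q_table = train_selector(n_features=4, utility=toy_utility)
print(greedy_subset(q_table, 4))  # expected: [0, 2]
```

Rewarding each decision with the marginal change in utility makes the per-step rewards sum, up to a constant, to the utility of the final subset. Maximizing the return is therefore the same as maximizing subset utility. Under the assumed utility, the best subset is {0, 2}, with utility 2.0 - 0.6 = 1.4. A realistic version would replace `toy_utility` with a cross-validated model score and replace the table with a learned value function, because the number of states grows exponentially with the number of features.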

Completion Date

2024

Semester

Spring

Committee Chair

Fu, Yanjie

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer and Information Sciences

Format

application/pdf

Language

English

Rights

In copyright

Release Date

November 2024

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Campus Location

Orlando (Main) Campus

STARS Citation

Wang, Dongjie, "Data-Centric AI: Taming AI-ready Feature Space from Decision-Making to Generative-AI Perspectives" (2024). Graduate Thesis and Dissertation 2023-2024. 462.
https://stars.library.ucf.edu/etd2023/462

Accessibility Status

Meets minimum standards for ETDs/HUTs

Graduate Thesis and Dissertation 2023-2024

Data-Centric AI: Taming AI-ready Feature Space from Decision-Making to Generative-AI Perspectives