Abstract

Feature engineering is one of the most important components of data mining and machine learning. A key question in data mining is: how should a low-dimensional geometric structure be extracted and reconstructed from high-dimensional data? To address this question, researchers have proposed feature selection, PCA, sparsity regularization, factorization, embedding, and deep learning. However, existing techniques fall short of full automation, global optimality, and explicit explainability. Can I address the automation, optimality, and explainability challenges in data geometry reconstruction? A low-dimensional data geometry structure is crucial for SciML methods (e.g., GP models), and the accuracy of these methods depends on how well one can learn the data geometry structure from data or physics-based models. This dissertation targets the problem of automatically identifying an optimal and explicit low-dimensional data geometry from high-dimensional data. I will propose a novel, principled, self-optimizing data geometry reconstruction framework by viewing feature generation and selection through the lens of Reinforcement Learning (RL). I will show that reconstructing a low-dimensional data geometry (i.e., a feature space) can be accomplished by an interactive nested feature generation and selection framework, in which feature generation creates new meaningful and explicit features, feature selection removes redundant features to reduce dimensionality, and an optimized sequence of generation and selection steps yields an optimized feature space for a downstream machine learning task. Finally, I will highlight that the search for such an optimized sequential structure can be generalized as an advanced cascading reinforcement learning system.
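The alternating generate-then-select loop described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's method: the cascading RL agents are replaced here by uniform random choices, and the downstream-task reward is replaced by absolute Pearson correlation with a target. All names (`OPS`, `generate`, `select`, `reconstruct`) and the parameters `steps`, `k`, and `seed` are hypothetical and chosen only for illustration.

```python
import math
import random

def score(xs, ys):
    """Absolute Pearson correlation; 0.0 for constant or non-finite inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) * (x - mx) for x in xs)
    vy = sum((y - my) * (y - my) for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    r = cov / math.sqrt(vx * vy)
    return abs(r) if math.isfinite(r) else 0.0

# Hypothetical operation set. Feature names record the formula, so every
# generated feature stays explicit and traceable to the original inputs.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def generate(features, rng):
    """Generation step: combine one random pair of features with a random op.
    (Assumption: a learned policy would make these choices instead.)"""
    (na, a), (nb, b) = rng.sample(sorted(features.items()), 2)
    op = rng.choice(sorted(OPS))
    new = dict(features)
    new[f"({na}{op}{nb})"] = [OPS[op](x, y) for x, y in zip(a, b)]
    return new

def select(features, target, k):
    """Selection step: keep the k features that score highest against target."""
    ranked = sorted(features, key=lambda n: score(features[n], target), reverse=True)
    return {n: features[n] for n in ranked[:k]}

def reconstruct(features, target, steps=20, k=3, seed=0):
    """Alternate generation and selection; needs at least two input features."""
    rng = random.Random(seed)
    for _ in range(steps):
        features = select(generate(features, rng), target, k)
    return features

if __name__ == "__main__":
    data_rng = random.Random(1)
    x1 = [data_rng.uniform(0.1, 1.0) for _ in range(30)]
    x2 = [data_rng.uniform(0.1, 1.0) for _ in range(30)]
    y = [a * b for a, b in zip(x1, x2)]  # target depends on an interaction term
    space = reconstruct({"x1": x1, "x2": x2}, y)
    for name, values in space.items():
        print(name, round(score(values, y), 3))
```

Because selection always retains the top-scoring feature, the best score in the feature space never decreases from one step to the next. A reinforcement learning formulation would replace the random choices in `generate` with policies trained on the downstream task's performance.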

Notes

If this is your thesis or dissertation and you want to learn how to access it, or want more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2022

Semester

Spring

Advisor

Fu, Yanjie

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0009014; DP0026347

URL

https://purls.library.ucf.edu/go/DP0026347

Language

English

Release Date

May 2022

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Restricted to the UCF community until May 2022; it will then be open access.
