Keywords

Ethical AI Decision-Making; Reinforcement Learning from Human Feedback (RLHF); AI Ethics and Human Values; Robust; Deep Reinforcement Learning; Artificial Intelligence

Abstract

Reinforcement learning from human feedback (RLHF) has made great strides toward enabling AI systems to learn decision-making from external human advice. In general, this machine learning technique produces agents that learn to optimize toward a goal through interactions with the environment, guided by feedback in the form of a quantifiable reward. In this thesis, we seek to bring together AI robustness, ethical decision-making, and RLHF. Because human values cannot be quantified directly, human feedback serves as an essential bridge in the learning process, allowing AI models to better reflect ethical principles rather than merely replicate human behavior. By exploring the potential of RLHF in AI-human interactions, acknowledging that human behavior is too dynamic to be captured by simplistic models, and emphasizing the need for ethically framed AI systems, this thesis constructs a deep reinforcement learning framework that is both robust and well aligned with human ethical standards. Using a methodology that incorporates simulated ethical dilemmas and evaluates AI decisions against established ethical frameworks, this work aims to contribute to the understanding and application of RLHF in creating AI systems that embody robustness and ethical integrity.

Thesis Completion Year

2024

Thesis Completion Semester

Fall

Thesis Chair

Wang, Yue

College

College of Engineering and Computer Science

Department

Electrical and Computer Engineering

Thesis Discipline

Computer Engineering

Language

English

Access Status

Open Access

Length of Campus Access

None

Campus Location

Orlando (Main) Campus

Rights Statement

In Copyright