Keywords
Ethical AI Decision-Making; Reinforcement Learning from Human Feedback (RLHF); AI Ethics and Human Values; Robust; Deep Reinforcement Learning; Artificial Intelligence
Abstract
Reinforcement learning from human feedback (RLHF) has made great strides toward giving AI decision-making systems the ability to learn from external human advice. In general, this machine learning technique produces agents that learn to pursue and optimize some goal through interactions with the environment and feedback expressed as a quantifiable reward. In this project, we seek to merge the intricate realms of AI robustness, ethical decision-making, and RLHF. Because there is no way to truly quantify human values, human feedback is an essential bridge in the learning process, allowing AI models to better reflect ethical principles rather than merely replicate human behavior. By exploring the transformative potential of RLHF in AI-human interactions, acknowledging the dynamic nature of human behavior beyond simplistic models, and emphasizing the necessity for ethically framed AI systems, this thesis constructs a deep reinforcement learning framework that is not only robust but also well aligned with human ethical standards. Through a methodology that incorporates simulated ethical dilemmas and evaluates AI decisions against established ethical frameworks, the thesis aims to contribute significantly to the understanding and application of RLHF in creating AI systems that embody robustness and ethical integrity.
Thesis Completion Year
2024
Thesis Completion Semester
Fall
Thesis Chair
Wang, Yue
College
College of Engineering and Computer Science
Department
Electrical and Computer Engineering
Thesis Discipline
Computer Engineering
Language
English
Access Status
Open Access
Length of Campus Access
None
Campus Location
Orlando (Main) Campus
STARS Citation
Plasencia, Marco M., "Reinforcement Learning From Human Feedback For Ethically Robust AI Decision-Making" (2024). Honors Undergraduate Theses. 212.
https://stars.library.ucf.edu/hut2024/212