CFE-CMStatistics 2024
A1255
Title: Dual active learning for reinforcement learning from human feedback
Authors: Wei Sun - Purdue University (United States) [presenting]
Abstract: Aligning large language models (LLMs) with human preferences is critical to recent advances in generative artificial intelligence. Reinforcement learning from human feedback (RLHF) is widely applied to achieve this objective. A key step in RLHF is learning the reward function from human feedback. However, collecting human feedback is costly and time-consuming, so an effective strategy for gathering it within a fixed sample budget is essential. Additionally, different teachers exhibit different levels of rationality across types of contexts, making it critical to query the most informative teachers for their preferences. Motivated by the idea of D-optimal design, a dual active reward learning policy is introduced for the simultaneous selection of contexts and teachers.
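To illustrate the D-optimal flavor of such a selection rule, the sketch below greedily picks (context, teacher) query pairs that maximize the log-determinant of a regularized information matrix. This is only a minimal illustration under assumed names (candidate_features, select_queries) and an assumed featurization of context-teacher pairs; it is not the authors' actual policy.

```python
# Illustrative sketch: greedy D-optimality-based selection of (context, teacher)
# pairs for preference queries. Names and featurization are hypothetical.
import numpy as np

def select_queries(candidate_features: np.ndarray, budget: int, ridge: float = 1e-3):
    """Greedily choose `budget` candidates maximizing log det of the design matrix.

    candidate_features: (n_candidates, d) array, one feature vector per
        (context, teacher) pair, e.g. a context embedding scaled by an
        assumed teacher-rationality weight.
    """
    n, d = candidate_features.shape
    info = ridge * np.eye(d)          # regularized information matrix
    chosen = []
    remaining = set(range(n))
    for _ in range(budget):
        best_idx, best_gain = None, -np.inf
        for i in remaining:
            x = candidate_features[i]
            # Log-det gain from adding x x^T (matrix determinant lemma):
            # log det(info + x x^T) - log det(info) = log(1 + x^T info^{-1} x)
            gain = np.log1p(x @ np.linalg.solve(info, x))
            if gain > best_gain:
                best_idx, best_gain = i, gain
        chosen.append(best_idx)
        x = candidate_features[best_idx]
        info += np.outer(x, x)        # update information matrix with chosen pair
        remaining.remove(best_idx)
    return chosen
```

The greedy log-det criterion is a standard surrogate for D-optimality: each step favors the query whose feature direction is least covered by the information already collected, which is one way to trade off informative contexts against informative teachers within the sample budget.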