RLHF Created: 2025-05-07
Updated: 2026-04-12


RLHF: Reinforcement Learning from Human Feedback