
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
This paper investigates two major drawbacks in the reward learning phase of RLHF: reward overfitting and reward overoptimization, which often arise because the standard cross-entropy loss handles imbalanced preference datasets poorly. To address these issues, the paper introduces a novel algorithm, Iterative Data Smoothing (IDS), which mitigates both problems by iteratively replacing hard comparison labels with softer, model-predicted labels during training. Theoretical analysis and empirical results in both multi-armed bandit and neural network settings show that IDS outperforms standard Maximum Likelihood Estimation (MLE), offering a more robust approach to reward training.
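To make the label-smoothing idea concrete, here is a minimal sketch of how iteratively smoothed preference labels could be used when training a pairwise reward model. The Bradley-Terry formulation, the toy MLP architecture, the smoothing coefficient `beta`, and all function and variable names are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward network mapping response features to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def train_ids(chosen, rejected, epochs=50, beta=0.7, lr=1e-3):
    """Train a reward model with iteratively smoothed preference labels.

    chosen, rejected: (N, dim) feature tensors for the preferred and
    dispreferred responses in each comparison pair (hypothetical inputs).
    """
    model = RewardModel(chosen.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Start from the hard labels: "chosen" beats "rejected" with probability 1.
    soft_labels = torch.ones(chosen.shape[0])

    for _ in range(epochs):
        # Bradley-Terry probability that "chosen" is preferred.
        probs = torch.sigmoid(model(chosen) - model(rejected))

        # Cross-entropy against the current (smoothed) labels.
        loss = nn.functional.binary_cross_entropy(probs, soft_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Smoothing step: blend the labels toward the model's own predictions,
        # so rarely compared or noisy pairs stop pulling the model toward
        # extreme rewards.
        with torch.no_grad():
            preds = torch.sigmoid(model(chosen) - model(rejected))
            soft_labels = beta * soft_labels + (1 - beta) * preds

    return model
```

The key contrast with plain MLE is the final step of each epoch: instead of fitting hard 0/1 comparison labels throughout training, the targets themselves are updated toward the model's current predictions, which is the smoothing behavior the abstract describes.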
Information
- Frequency: Updated Weekly
- Published: October 9, 2025 at 4:35 PM UTC
- Length: 17 min
- Rating: Clean