This paper introduces "Value Flows," a reinforcement learning algorithm that uses flow-based models to estimate the full distribution of future returns rather than flattening it into a single scalar value, as traditional methods do. The richer learning signal also yields estimates of aleatoric uncertainty (return variance), which the method uses to prioritize learning on uncertain transitions. The abstract and text detail how a new flow-matching objective is formulated to satisfy the distributional Bellman equation, while accompanying images illustrate the idea with a violin plot of return distributions and screenshots of a robotic manipulation task used for evaluation. Experiments show that Value Flows outperforms prior offline and offline-to-online RL methods across a range of tasks, achieving a 1.3× improvement in success rates and a lower distributional discrepancy.
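Since the summary centers on a flow-matching objective trained toward distributional Bellman targets, a minimal sketch of conditional flow matching on scalar returns may make the idea concrete. This is not the paper's implementation: the `velocity_net` function, its signature, and the bootstrapped `target_returns` (e.g., reward plus a discounted sample from a target return-distribution model) are assumptions made for illustration.

```python
import torch

def flow_matching_loss(velocity_net, states, actions, target_returns):
    """Sketch of a conditional flow-matching loss on scalar returns.

    velocity_net(z_t, t, states, actions) -> predicted velocity (hypothetical signature).
    target_returns: bootstrapped return samples standing in for the
    distributional Bellman target, e.g. r + gamma * G' with G' drawn
    from a target return-distribution model.
    """
    batch = states.shape[0]
    z0 = torch.randn(batch, 1)              # noise sample at flow time t = 0
    z1 = target_returns.reshape(batch, 1)   # target return sample at t = 1
    t = torch.rand(batch, 1)                # random interpolation time
    z_t = (1 - t) * z0 + t * z1             # point on the straight-line path
    target_velocity = z1 - z0               # velocity of that path
    pred_velocity = velocity_net(z_t, t, states, actions)
    return ((pred_velocity - target_velocity) ** 2).mean()
```

At inference, samples of the return distribution would be drawn by integrating the learned velocity field from Gaussian noise at t = 0 to t = 1, and the spread of those samples gives the per-transition return variance used for prioritization.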
Information
- Show
- Frequency: weekly
- Published: October 14, 2025, 21:24 UTC
- Length: 16 minutes
- Rating: suitable for children