This paper introduces "Value Flows," a reinforcement learning algorithm that uses flow-based models to estimate the full distribution of future returns rather than flattening it into a single scalar value, as traditional methods do. The richer learning signal also yields estimates of aleatoric uncertainty (return variance), which the method uses to prioritize learning on uncertain transitions. The abstract and text detail how a new flow-matching objective is formulated to satisfy the distributional Bellman equation, while accompanying images illustrate the idea with a violin plot of return distributions and screenshots of a robotic manipulation task used for evaluation. Experiments show that Value Flows outperforms prior offline and offline-to-online RL methods across a range of tasks, achieving a 1.3× improvement in success rates and a lower distributional discrepancy.
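Since the summary centers on a flow-matching objective trained toward distributional Bellman targets, a minimal sketch of conditional flow matching on scalar returns may make the idea concrete. This is not the paper's implementation: the `velocity_net` function, its signature, and the bootstrapped `target_returns` (e.g., reward plus a discounted sample from a target return-distribution model) are assumptions made for illustration.

```python
import torch

def flow_matching_loss(velocity_net, states, actions, target_returns):
    """Sketch of a conditional flow-matching loss on scalar returns.

    velocity_net(z_t, t, states, actions) -> predicted velocity (hypothetical signature).
    target_returns: bootstrapped return samples standing in for the
    distributional Bellman target, e.g. r + gamma * G' with G' drawn
    from a target return-distribution model.
    """
    batch = states.shape[0]
    z0 = torch.randn(batch, 1)              # noise sample at flow time t = 0
    z1 = target_returns.reshape(batch, 1)   # target return sample at t = 1
    t = torch.rand(batch, 1)                # random interpolation time
    z_t = (1 - t) * z0 + t * z1             # point on the straight-line path
    target_velocity = z1 - z0               # velocity of that path
    pred_velocity = velocity_net(z_t, t, states, actions)
    return ((pred_velocity - target_velocity) ** 2).mean()
```

At inference, samples of the return distribution would be drawn by integrating the learned velocity field from Gaussian noise at t = 0 to t = 1, and the spread of those samples gives the per-transition return variance used for prioritization.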
Information
- Show
- Frequency: weekly
- Published: October 14, 2025, 21:24 UTC
- Length: 16 minutes
- Rating: suitable for children