HuggingFace Daily AI Paper Digest

2025.10.02 | MCTS breaks through the RLVR bottleneck; GEM, an open-source training gym for agents

The 15 papers in this episode:

[00:19] 🧠 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

[01:20] 🤖 GEM: A Gym for Agentic LLMs

[01:57] 🧠 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

[02:36] 🎒 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

[03:06] 🎬 Code2Video: A Code-centric Paradigm for Educational Video Generation

[03:41] ⚙ PIPer: On-Device Environment Setup via Online Reinforcement Learning

[04:11] 🗜 ACON: Optimizing Context Compression for Long-horizon LLM Agents

[04:52] 🔍 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

[05:22] ⚖ BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

[06:01] ⚡ Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

[06:42] 🚀 BroRL: Scaling Reinforcement Learning via Broadened Exploration

[07:25] 📊 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

[08:02] 🎯 On Predictability of Reinforcement Learning Dynamics for Large Language Models

[08:31] 🖥 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

[09:17] 🧠 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned

【Follow Us】

You can also find us on the platform below for more information beyond the podcast:

Xiaohongshu (RED): AI速递