本期的 15 篇论文如下:
[00:19] 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey(面向大语言模型的智能体强化学习全景:一项综述)
[00:40] 🚀 SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning(SimpleTIR:面向多轮工具集成推理的端到端强化学习)
[01:12] 🤖 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning(UI-TARS-2技术报告:通过多轮强化学习推进GUI代理)
[01:41] 🎥 ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding(ELV-Halluc:长视频理解中的语义聚合幻觉基准测试)
[02:12] 🔄 LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model(LLaVA-Critic-R1:你的评论模型其实是一个强大的策略模型)
[02:43] 🔧 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use(VerlTool:迈向整体性代理强化学习与工具使用)
[03:11] 📄 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion(POINTS-Reader:无蒸馏适配的视觉-语言模型用于文档转换)
[03:33] 🩺 Baichuan-M2: Scaling Medical Capability with Large Verifier System(百川-M2:通过大规模验证系统扩展医疗能力)
[03:57] 🎥 Kwai Keye-VL 1.5 Technical Report(快手 Keye-VL 1.5 技术报告)
[04:20] 🤖 Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR(通过监督学习框架实现隐式Actor-Critic耦合用于RLVR)
[04:45] 🧠 Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic(推理向量:通过任务算术传递思维链能力)
[05:11] 🔄 Jointly Reinforcing Diversity and Quality in Language Model Generations(在语言模型生成中联合强化多样性与质量)
[05:42] 🚀 DCPO: Dynamic Clipping Policy Optimization(DCPO: 动态裁剪策略优化)
[06:04] 🚀 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning(OpenVision 2:用于多模态学习的生成式预训练视觉编码器系列)
[06:27] 🎬 GenCompositor: Generative Video Compositing with Diffusion Transformer(GenCompositor:基于扩散变换器的生成式视频合成)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
المعلومات
- البرنامج
- معدل البثيتم التحديث يوميًا
- تاريخ النشر٣ سبتمبر ٢٠٢٥ في ١١:٠٠ م UTC
- مدة الحلقة٧ من الدقائق
- التقييمملائم