本期的 15 篇论文如下:
[00:20] 🎬 LongLive: Real-time Interactive Long Video Generation(LongLive:实时交互式长视频生成框架)
[00:56] 🎯 Quantile Advantage Estimation for Entropy-Safe Reasoning(用于熵安全推理的分位数优势估计)
[01:34] 📄 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing(MinerU2.5:面向高效高分辨率文档解析的解耦视觉-语言模型)
[02:11] 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning(EPO:面向LLM智能体强化学习的熵正则策略优化)
[03:08] 🧠 Variational Reasoning for Language Models(语言模型的变分推理框架)
[03:37] 💬 Language Models Can Learn from Verbal Feedback Without Scalar Rewards(无需标量奖励,语言模型也能从语言反馈中学习)
[04:32] 🔍 ReviewScore: Misinformed Peer Review Detection with Large Language Models(ReviewScore:用大模型揪出“跑偏”的同行评审)
[05:12] 🎯 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning(CapRL:用强化学习激发稠密图像描述潜能)
[05:49] 🪄 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning(MesaTask:面向任务驱动的桌面场景生成与3D空间推理)
[06:32] 🎯 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping(零方差提示不浪费:基于熵引导优势塑造的LLM强化学习新范式)
[07:14] 🗣 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing(VoiceAssistant-Eval:横跨听、说、看的AI助手基准测评)
[07:58] 🧭 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios(UltraHorizon:在长周期场景中评估智能体能力的基准)
[08:29] 🖼 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer(LucidFlux:无需文字描述的大规模扩散Transformer通用图像修复)
[09:16] 🌐 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning(WebGen-Agent:借助多级反馈与步骤级强化学习提升交互式网页生成)
[09:49] 🔄 SPARK: Synergistic Policy And Reward Co-Evolving Framework(SPARK:策略与奖励协同演化的强化学习框架)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
Informacje
- Program
- CzęstotliwośćUakt. codziennie
- Opublikowano29 września 2025 23:00 UTC
- Czas trwania11 min
- KlasyfikacjaDla wszystkich