HuggingFace Daily AI Papers Digest

2025.09.29 | Real-time long video that generates as you chat; quantile baselines keep reasoning entropy stable

The 15 papers in this episode:

[00:20] 🎬 LongLive: Real-time Interactive Long Video Generation

[00:56] 🎯 Quantile Advantage Estimation for Entropy-Safe Reasoning

[01:34] 📄 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

[02:11] 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

[03:08] 🧠 Variational Reasoning for Language Models

[03:37] 💬 Language Models Can Learn from Verbal Feedback Without Scalar Rewards

[04:32] 🔍 ReviewScore: Misinformed Peer Review Detection with Large Language Models

[05:12] 🎯 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

[05:49] 🪄 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

[06:32] 🎯 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

[07:14] 🗣 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

[07:58] 🧭 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios

[08:29] 🖼 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

[09:16] 🌐 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

[09:49] 🔄 SPARK: Synergistic Policy And Reward Co-Evolving Framework

[Follow Us]

You can also find us on the following platforms for more beyond the podcast:

Xiaohongshu: AI速递