The 15 papers in this episode are:
[00:27] 🎭 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
[01:18] 🤖 SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
[02:02] 🗣 EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
[02:37] 🎭 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
[03:11] 🧭 Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
[03:57] 🎨 FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
[04:34] 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[05:14] 🔄 Can Understanding and Generation Truly Benefit Together -- or Just Coexist?
[05:46] 📹 SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
[06:16] 📊 Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
[06:55] 🕵 Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
[07:35] 🖼 2D Gaussian Splatting with Semantic Alignment for Image Inpainting
[08:10] 📏 LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
[08:45] 🤖 OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
[09:31] 🎯 The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
Information
- Show
- Frequency: Updated daily
- Published: September 12, 2025, 23:00 UTC
- Duration: 11 min
- Rating: All audiences