2025.08.19 | Ovis2.5 advances multimodal models; ComoRAG improves long-narrative reasoning

This episode covers the following 15 papers:

[00:20] ✨ Ovis2.5 Technical Report

[00:51] 🧠 ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

[01:14] 🎥 4DNeX: Feed-Forward 4D Generative Modeling Made Easy

[01:38] ✨ Next Visual Granularity Generation

[01:57] ⚡ Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

[02:30] 🤔 Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

[03:00] 🎮 HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

[03:26] ❗ When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

[03:56] 🎮 Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

[04:21] 💡 Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

[04:47] 🌐 G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

[05:15] ✨ S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

[05:49] 👂 Representing Speech Through Autoregressive Prediction of Cochlear Tokens

[06:09] 💡 Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

[06:40] 🎬 Precise Action-to-Video Generation Through Visual Action Prompts

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递
