HuggingFace Daily AI Paper Digest

2025.08.19 | Ovis2.5 improves multimodal capabilities; ComoRAG optimizes long-narrative reasoning

The 15 papers in this episode are:

[00:20] ✨ Ovis2.5 Technical Report

[00:51] 🧠 ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

[01:14] 🎥 4DNeX: Feed-Forward 4D Generative Modeling Made Easy

[01:38] ✨ Next Visual Granularity Generation

[01:57] ⚡ Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

[02:30] 🤔 Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

[03:00] 🎮 HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

[03:26] ❗ When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

[03:56] 🎮 Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

[04:21] 💡 Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

[04:47] 🌐 G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

[05:15] ✨ S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

[05:49] 👂 Representing Speech Through Autoregressive Prediction of Cochlear Tokens

[06:09] 💡 Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

[06:40] 🎬 Precise Action-to-Video Generation Through Visual Action Prompts

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu: AI速递