The 15 papers in this episode are:
[00:21] 🖼 Ovis-U1 Technical Report
[00:58] 🎬 VMoBA: Mixture-of-Block Attention for Video Diffusion Models
[01:36] ✍ Calligrapher: Freestyle Text Image Customization
[02:21] 🖼 Listener-Rewarded Thinking in VLMs for Image Preferences
[03:04] 🧠 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
[03:46] 📸 Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
[04:29] 🧬 Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
[05:09] 🤔 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
[05:58] 💾 MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
[06:38] 🚀 SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
[07:23] 🏙 UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
[08:01] 🧠 MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
[08:38] 🧰 Teaching a Language Model to Speak the Language of Tools
[09:16] ✂ VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
[10:01] 🤖 RoboScape: Physics-informed Embodied World Model
[Follow Us]
You can also find us on the following platform for more information beyond the podcast content:
Xiaohongshu: AI速递
Information
- Frequency: Updated daily
- Published: 1 July 2025 at 23:00 UTC
- Length: 11 min
- Rating: Clean