The 15 papers in this episode are:
[00:24] 📜 Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
[00:58] 🚀 Reinforcement Learning on Pre-Training Data
[01:37] 👁 Do You Need Proprioceptive States in Visuomotor Policies?
[02:36] 🚀 MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
[03:24] 🎯 MAPO: Mixed Advantage Policy Optimization (addressing the advantage allocation problem in GRPO)
[04:06] 🚀 Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
[04:44] 🎯 VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
[05:31] 🌌 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
[06:08] 🧩 What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
[06:41] 🗣 Large Language Models Discriminate Against Speakers of German Dialects
[07:32] 📊 OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
[08:19] 🪄 HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis
[09:07] 🛠 CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
[09:41] 🛰 Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
[10:28] 🌍 VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
[Follow us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
Information
- Show
- Frequency: Updated daily
- Published: 24 September 2025 at 23:00 UTC
- Length: 12 min
- Rating: Clean