HuggingFace Daily AI Paper Digest

2025.11.05 | Vector sketches put coding to the test; drawing before reasoning shores up vision

The 15 papers in this episode:

[00:21] 🖼 VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

[01:12] 🧠 When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

[01:48] ⚖ When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs

[02:36] 🪙 Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

[03:11] 🧠 Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

[03:49] 👁 Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

[04:33] 🎨 LTD-Bench: Evaluating Large Language Models by Letting Them Draw (having LLMs draw to assess their spatial reasoning)

[05:15] 🤖 TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

[06:01] 🗜 Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models

[06:46] 🏆 CodeClash: Benchmarking Goal-Oriented Software Engineering

[07:29] 🎭 VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

[08:03] 🧠 BRAINS: A Retrieval-Augmented System for Alzheimer's Detection and Monitoring

[08:42] 📊 ChartM³: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

[09:45] 📊 TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

[10:17] 🤖 iFlyBot-VLA Technical Report (a new framework for large-scale vision-language-action models)

[Follow Us]

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (RedNote): AI速递