2025.06.17 | MiniMax-M1提升推理性能;多模态模型认知测试创新。

HuggingFace 每日AI论文速递

本期的 15 篇论文如下:

[00:22] 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention(MiniMax-M1:利用闪电注意力高效扩展测试时计算)

[01:00] 🔬 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning(科学家的首次考试:通过感知、理解和推理来探索多模态大型语言模型的认知能力)

[01:47] 🧐 DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents(DeepResearch Bench:一个面向深度研究Agent的综合性评测基准)

[02:28] 🧠 DoTA-RAG: Dynamic of Thought Aggregation RAG(思想动态聚合RAG:一种用于大规模网络知识索引的检索增强生成系统)

[03:08] 🧠 Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning(Ego-R1:用于超长第一视角视频推理的工具链式思考)

[03:52] 💡 Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency(等等,我们不需要“等等”!移除思考Token提升推理效率)

[04:28] 🤖 TaskCraft: Automated Generation of Agentic Tasks(任务工坊:自动化生成自主Agent任务)

[05:04] 🤯 Discrete Diffusion in Large Language and Multimodal Models: A Survey(大型语言和多模态模型中的离散扩散:一项综述)

[05:42] 🪞 Test3R: Learning to Reconstruct 3D at Test Time(Test3R:测试时学习三维重建)

[06:25] 🖼 VGR: Visual Grounded Reasoning(VGR:视觉基础推理)

[07:06] 🤖 PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization(PersonaFeedback:一个大规模的人工标注的个性化基准)

[07:50] 🤖 From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding(从真实到合成:通过属性化基础生成数百万条多样化且复杂的用户指令)

[08:32] 🤖 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models(BridgeVLA: 基于输入-输出对齐的视觉-语言模型高效3D操作学习)

[09:11] 🧠 Language Surgery in Multilingual Large Language Models(多语言大型语言模型中的语言手术)

[09:44] 🤖 AI Agent Behavioral Science(人工智能体行为科学)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递

To listen to explicit episodes, sign in.

Stay up to date with this show

Sign in or sign up to follow shows, save episodes and get the latest updates.

Select a country or region

Africa, Middle East, and India

Asia Pacific

Europe

Latin America and the Caribbean

The United States and Canada