HuggingFace 每日AI论文速递

2025.10.08 | TaTToo用外挂代码干翻大模型;4B小模型32步逼近闭源巨头

本期的 15 篇论文如下:

[00:24] 📊 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning(TaTToo:面向表格推理测试时扩展的“工具落地思维”过程奖励模型)

[00:57] 🔍 Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs(Fathom-DeepResearch:解锁小模型长程信息检索与综合的钥匙)

[01:39] 🚀 Fast-dLLM v2: Efficient Block-Diffusion LLM(Fast-dLLM v2:高效的块扩散大语言模型)

[02:30] 🧑 CoDA: Coding LM via Diffusion Adaptation(CoDA:基于扩散适配的轻量级代码生成模型)

[03:01] 🧩 Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning(规模化代码辅助思维链与指令以增强模型推理)

[03:52] ⚖ ASPO: Asymmetric Importance Sampling Policy Optimization(ASPO:非对称重要性采样策略优化)

[04:34] 🔗 Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context(混合机制:语言模型如何在上下文中检索绑定实体)

[05:15] 🧠 AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems(AInstein:评估AI生成科研方案可行性的研究框架)

[05:51] 🪂 Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?(拒绝断崖:安全对齐在推理中为何崩塌)

[06:35] 🌍 HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video(HoloScene:单视频生成可交互3D仿真世界)

[07:22] ⚡ TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation(TensorBLEU:面向逐句训练评估的向量化GPU加速BLEU分数实现)

[08:09] 🎯 Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization(边缘自适应DPO:利用奖励模型实现偏好优化的粒度控制)

[09:00] 🩺 Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation(基于多模态大语言模型的离散扩散模型实现统一医学多模态生成)

[09:46] 🧠 MixReasoning: Switching Modes to Think(混合推理:动态切换思考模式)

[10:20] ⚡ LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation(LightCache:面向视频生成的内存高效、无需训练的加速方法)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递