HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

duan

5.0 (2)
Technology
Daily

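Spend 10 minutes a day getting up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.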

6 hours ago

2025.09.09 | REER improves reasoning performance; WebExplorer trains agents

This episode covers 15 papers:
[00:21] 💡 Reverse-Engineered Reasoning for Open-Ended Generation
[00:47] 🌐 WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
[01:17] 🚀 Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
[01:38] 🤔 Does DINOv3 Set a New Medical Vision Standard?
[02:06] 🛠 Reinforced Visual Perception with Tools
[02:26] 🤖 Reinforcement Learning Foundations for Deep Research Systems: A Survey
[02:55] 👁 Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
[03:28] 🎥 UniVerse-1: Unified Audio-Video Generation via Stitching of Experts
[03:50] 🤔 Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
[04:12] 🤔 Interleaving Reasoning for Better Text-to-Image Generation
[04:37] 🤖 Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents
[05:05] ⚙ Guided Decoding and Its Critical Role in Retrieval-Augmented Generation
[05:36] 🚀 Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
[06:04] 🛡 R²AI: Towards Resistant and Resilient AI in an Evolving World
[06:30] 🌍 Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian
[Follow us] You can also find us on the following platform for more than what the podcast covers. Xiaohongshu: AI速递

7 min
1 day ago

2025.09.08 | Language model hallucinations originate in pretraining; LLMs improve at graphics programming

This episode covers 12 papers:
[00:24] 🤔 Why Language Models Hallucinate
[00:47] 🎨 Symbolic Graphics Programming with Large Language Models
[01:17] ⚡ Set Block Decoding is a Language Model Inference Accelerator
[01:43] 🎼 WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
[02:14] 🌍 LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
[02:42] 💡 LuxDiT: Lighting Estimation with Video Diffusion Transformer
[03:15] 📷 WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
[03:44] 📉 On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
[04:07] 🔍 MedVista3D: Vision-Language Modeling for Reducing Diagnostic Errors in 3D CT Disease Detection, Understanding and Reporting
[04:43] 🦾 U-ARM: Ultra low-cost general teleoperation interface for robot manipulation
[05:16] 🔍 Behavioral Fingerprinting of Large Language Models
[05:45] 🚀 Bootstrapping Task Spaces for Self-Improvement

6 min
3 days ago

[Weekend Special] Hottest AI papers of the 2nd week of September | Survey of agentic RL for LLMs; AI code security benchmark

This episode covers 5 papers:
[00:35] TOP1 (🔥139) | 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
[01:52] TOP2 (🔥133) | 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
[02:57] TOP3 (🔥127) | 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
[04:15] TOP4 (🔥103) | 🧠 R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
[05:11] TOP5 (🔥101) | 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

7 min
4 days ago

2025.09.05 | LLMs are weak at semantic understanding; image editing models improve geometry estimation

This episode covers 13 papers:
[00:22] 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
[00:47] 📐 From Editor to Dense Geometry Estimator
[01:08] 🧠 Towards a Unified View of Large Language Model Post-Training
[01:39] 🔄 Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
[02:05] 🔬 DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks
[02:26] 🚀 Transition Models: Rethinking the Generative Learning Objective
[02:54] 🔍 NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings
[03:24] ⚡ Few-step Flow for 3D Generation via Marginal-Data Transport Distillation
[03:53] 🎬 Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding
[04:19] 🎭 Durian: Dual Reference-guided Portrait Animation with Attribute Transfer
[04:47] 📐 Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings
[05:24] 🧠 Delta Activations: A Representation for Finetuned Large Language Models
[06:01] ⚠ False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

7 min
5 days ago

2025.09.04 | Efficient robot task planning; improved data reasoning

This episode covers 5 papers:
[00:24] 🤖 Robix: A Unified Model for Robot Interaction, Reasoning and Planning
[00:54] 🔍 Open Data Synthesis For Deep Research
[01:30] 🧠 LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
[02:00] 🧩 MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
[02:32] 🧠 Planning with Reasoning using Vision Language World Model

4 min
6 days ago

2025.09.03 | Agentic RL boosts LLM autonomy; SimpleTIR tackles multi-turn tool reasoning

This episode covers 15 papers:
[00:19] 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
[00:40] 🚀 SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
[01:12] 🤖 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
[01:41] 🎥 ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding
[02:12] 🔄 LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
[02:43] 🔧 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
[03:11] 📄 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
[03:33] 🩺 Baichuan-M2: Scaling Medical Capability with Large Verifier System
[03:57] 🎥 Kwai Keye-VL 1.5 Technical Report
[04:20] 🤖 Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
[04:45] 🧠 Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
[05:11] 🔄 Jointly Reinforcing Diversity and Quality in Language Model Generations
[05:42] 🚀 DCPO: Dynamic Clipping Policy Optimization
[06:04] 🚀 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
[06:27] 🎬 GenCompositor: Generative Video Compositing with Diffusion Transformer

7 min
Sep 2

2025.09.02 | PVPO optimizes reasoning performance; T2R-bench exposes model weaknesses

This episode covers 6 papers:
[00:23] 🧠 PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
[00:49] 📊 T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
[01:18] 🔍 No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
[01:44] 📊 UI-Level Evaluation of ALLaM 34B: Measuring an Arabic-Centric LLM via HUMAIN Chat
[02:11] 🧠 From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
[02:36] 🔄 How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

3 min
Sep 1

2025.09.01 | R-4B optimizes thinking efficiency; EO-1 improves robot control

This episode covers 15 papers:
[00:24] 🧠 R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
[00:59] 🤖 EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
[01:29] 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
[01:57] 🎥 Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation
[02:26] 🗣 TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
[02:58] 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
[03:28] 🤖 UItron: Foundational GUI Agent with Advanced Perception and Planning
[03:50] 🎮 Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
[04:20] 🔄 TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
[04:45] 💻 Efficient Code Embeddings from Code Generation Models
[05:10] ⏸ Morae: Proactively Pausing UI Agents for User Choices
[05:37] 🔍 AHELM: A Holistic Evaluation of Audio-Language Models
[06:05] 🤖 HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation
[06:34] 🔄 Model-Task Alignment Drives Distinct RL Outcomes
[07:08] 👁 Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

8 min

See all (379)

out of 5

2 ratings

Support!!

Feb 16

Fergie.W

I hope this show keeps going.

Created by: duan
Years active: 2024 - 2025
Episodes: 379
Rating: Clean
Podcast website: HuggingFace 每日AI论文速递
