HuggingFace Daily AI Paper Express (HuggingFace 每日AI论文速递)

duan

TECHNOLOGY
UPDATED DAILY

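In 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 The podcast is available on Xiaoyuzhou and Apple Podcasts: search for 【HuggingFace 每日AI论文速递】. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.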

22 HR AGO

[Weekend Special] Hottest AI Papers of Week 2 of September | Survey of agentic RL for LLMs; AI code security benchmark

The 5 papers in this episode:
[00:35] TOP1(🔥139) | 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
[01:52] TOP2(🔥133) | 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
[02:57] TOP3(🔥127) | 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
[04:15] TOP4(🔥103) | 🧠 R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
[05:11] TOP5(🔥101) | 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
[Follow us] You can also find us on the following platform for more than the podcast itself: Xiaohongshu: AI速递

7 min
1 DAY AGO

2025.09.05 | LLMs are weak at semantic understanding; image editing models improve geometry estimation

The 13 papers in this episode:
[00:22] 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
[00:47] 📐 From Editor to Dense Geometry Estimator
[01:08] 🧠 Towards a Unified View of Large Language Model Post-Training
[01:39] 🔄 Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
[02:05] 🔬 DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks
[02:26] 🚀 Transition Models: Rethinking the Generative Learning Objective
[02:54] 🔍 NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings
[03:24] ⚡ Few-step Flow for 3D Generation via Marginal-Data Transport Distillation
[03:53] 🎬 Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding
[04:19] 🎭 Durian: Dual Reference-guided Portrait Animation with Attribute Transfer
[04:47] 📐 Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings
[05:24] 🧠 Delta Activations: A Representation for Finetuned Large Language Models
[06:01] ⚠ False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

7 min
2 DAYS AGO

2025.09.04 | Efficient robot task planning; improved data reasoning capability

The 5 papers in this episode:
[00:24] 🤖 Robix: A Unified Model for Robot Interaction, Reasoning and Planning
[00:54] 🔍 Open Data Synthesis For Deep Research
[01:30] 🧠 LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
[02:00] 🧩 MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
[02:32] 🧠 Planning with Reasoning using Vision Language World Model

4 min
3 DAYS AGO

2025.09.03 | Agentic RL boosts LLM autonomy; SimpleTIR tackles multi-turn tool-integrated reasoning

The 15 papers in this episode:
[00:19] 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
[00:40] 🚀 SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
[01:12] 🤖 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
[01:41] 🎥 ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding
[02:12] 🔄 LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
[02:43] 🔧 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
[03:11] 📄 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
[03:33] 🩺 Baichuan-M2: Scaling Medical Capability with Large Verifier System
[03:57] 🎥 Kwai Keye-VL 1.5 Technical Report
[04:20] 🤖 Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
[04:45] 🧠 Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
[05:11] 🔄 Jointly Reinforcing Diversity and Quality in Language Model Generations
[05:42] 🚀 DCPO: Dynamic Clipping Policy Optimization
[06:04] 🚀 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
[06:27] 🎬 GenCompositor: Generative Video Compositing with Diffusion Transformer

7 min
4 DAYS AGO

2025.09.02 | PVPO optimizes reasoning performance; T2R-bench exposes model weaknesses

The 6 papers in this episode:
[00:23] 🧠 PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
[00:49] 📊 T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
[01:18] 🔍 No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
[01:44] 📊 UI-Level Evaluation of ALLaM 34B: Measuring an Arabic-Centric LLM via HUMAIN Chat
[02:11] 🧠 From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
[02:36] 🔄 How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

3 min
5 DAYS AGO

2025.09.01 | R-4B optimizes thinking efficiency; EO-1 improves robot control

The 15 papers in this episode:
[00:24] 🧠 R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
[00:59] 🤖 EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
[01:29] 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
[01:57] 🎥 Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation
[02:26] 🗣 TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
[02:58] 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
[03:28] 🤖 UItron: Foundational GUI Agent with Advanced Perception and Planning
[03:50] 🎮 Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
[04:20] 🔄 TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
[04:45] 💻 Efficient Code Embeddings from Code Generation Models
[05:10] ⏸ Morae: Proactively Pausing UI Agents for User Choices
[05:37] 🔍 AHELM: A Holistic Evaluation of Audio-Language Models
[06:05] 🤖 HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation
[06:34] 🔄 Model-Task Alignment Drives Distinct RL Outcomes
[07:08] 👁 Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

8 min
6 DAYS AGO

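[Month-End Special] Hottest AI Papers of August | Scientific AI models narrow the performance gap; image models tackle text rendering and editing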

The 10 papers in this episode:
[00:30] TOP1(🔥242) | 🧪 Intern-S1: A Scientific Multimodal Foundation Model
[01:36] TOP2(🔥239) | 🎨 Qwen-Image Technical Report
[02:46] TOP3(🔥227) | 🤔 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
[04:14] TOP4(🔥220) | 🚀 DINOv3 (a new milestone for vision foundation models)
[05:25] TOP5(🔥168) | 🚀 GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
[06:25] TOP6(🔥166) | ✨ On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
[07:29] TOP7(🔥164) | 🚀 InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
[08:45] TOP8(🔥156) | 🤖 VeriGUI: Verifiable Long-Chain GUI Dataset
[09:53] TOP9(🔥142) | 📚 We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
[11:26] TOP10(🔥139) | 🚀 NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

13 min
30 AUG

[Weekend Special] Hottest AI Papers of Week 5 of August | Multimodal models gain efficiency; self-play strategy improves diversity

The 5 papers in this episode:
[00:36] TOP1(🔥161) | 🚀 InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
[01:25] TOP2(🔥114) | 📈 Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
[02:23] TOP3(🔥108) | 🚀 AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
[03:51] TOP4(🔥94) | 🗣 VibeVoice Technical Report
[05:17] TOP5(🔥78) | 🔍 Beyond Transcription: Mechanistic Interpretability in ASR

7 min


Creator: duan
Years Active: 2024 - 2025
Episodes: 377
Rating: Clean
