HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

5.0 out of 5 (2 Ratings)
Technology
Updated Daily

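Ten minutes a day to get up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.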

10h ago

[Weekend Special] Hottest AI Papers of Week 1 of August | Kimi K3 open-sources frontier intelligence; AskChem brings claim-level chemistry retrieval

[Sponsor] OpenClaw 快报 (OpenClaw Briefing): five minutes a day to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43
[Contents] The 5 papers in this episode:
[00:44] TOP1(🔥431) | 🧠 Kimi K3: Open Frontier Intelligence
[03:17] TOP2(🔥292) | 🧪 AskChem: Claim-Centered Infrastructure for Chemistry Literature Synthesis
[05:53] TOP3(🔥281) | 🤖 Qwen-UI-Agent Technical Report: Toward Next-Generation Real-World Centric Foundation GUI Agents
[08:46] TOP4(🔥256) | 🧠 Metis: Memory Foundation Model
[11:40] TOP5(🔥192) | 🤖 Progress Reward Modeling for Robotic Learning: A Comprehensive Survey
[Follow us] You can also find us on the following platform for more than the podcast itself: Xiaohongshu: AI速递. The transcript for this episode is available on Xiaoyuzhou.
3d ago

2026.07.30 | TurboVLA achieves real-time control on a consumer GPU; CoRT's fine-grained credit assignment improves instruction following.

[Contents] The 15 papers in this episode:
[00:33] ⚡ TurboVLA: Real-Time Vision-Language-Action Model at 32 Hz on an RTX 4090 with 1 GB VRAM
[01:36] 🎯 CoRT: Counterfactual Replay for Token-Level Rubric-Guided Policy Optimization
[02:31] 🤖 HumanCLAW: Can Vision-Language Models Act Through a Body?
[03:41] 🧬 DecoEvo: Score-Decoupled Co-Evolution of Solver and Rubric-Generator Skills in Text Space
[04:39] 🧠 CLBench-V: Evaluating Multimodal Context Learning from Grounding to Knowledge Acquisition
[05:32] 🧩 CAST: Game Solvers as Turn-Level Teachers for LLM Agents
[06:28] 🧠 SkillRise: Agentic Reinforcement Learning for Cross-Task Skill Evolution
[07:18] 🎮 StatePlay: State-Aware Game World Models for Mechanics-Consistent Generation
[08:05] 📊 OmegaUse-OfficeVal: Benchmarking LLM Agents on Long-Horizon Office-Suite Tasks with Economic Grounding
[09:08] 🤖 Can AI agents conduct open-ended AI research? Early evidence from two case studies
[10:04] 🛡 GPT-Red: Automated Red Teaming via Self-Play at Scale
[11:04] 🎬 Explicit Layer Modeling for Video Object Insertion and Layer Decomposition
[12:01] 🕵 StealthBench: Measuring Operational Stealth in Autonomous Offensive-Security Agents
[12:54] 🧠 Memory for Large Language Models
[13:48] 📜 Grading the Narrators: An Isnad-Rijal Framework for Claim-Level Provenance in Multi-Agent Knowledge Systems
4d ago

2026.07.29 | Policies trained on high-fidelity data approach real-robot performance; relevance dynamically guides search for precise, efficient retrieval

[Contents] The 15 papers in this episode:
[00:31] 🤖 HiFi-UMI: Learning Deployable Manipulation Policies from High-Fidelity UMI Data Alone
[01:31] 🔍 A New Role for Relevance: Guiding Corpus Interaction in Agentic Search
[02:20] 🎨 ReDesign: Recovering Editable Design Structures from Images via Agentic Decomposition
[03:07] 🧠 Keep It InMind: Benchmarking the Implicit-Association Blind Spot in Agent Memory
[03:56] 🏃 Pass the Baton: Trajectory-Relayed On-Policy Distillation
[04:47] ⚡ Mage-VL: An Efficient Codec-Native Streaming Multimodal Foundation Model
[05:49] 🧩 CodeNib: A Multi-View Data System for Serving Repository Context to Coding Agents
[06:48] 🌍 Wonder: Video World Model Done Better
[07:51] 👁 PerceptionBench: Evaluating Atomic Visual Perception in Multimodal Large Language Models
[08:44] 🔍 Novel Claim or Déjà Vu? Rethinking "Contamination-Free" Dynamic Evaluation for Multimodal Automated Fact-Checking
[09:40] 🛡 Shieldstral
[10:41] 🔀 MODUS: Decoder-Only Any-to-Any Modeling of Diverse Modalities
[11:41] ⚡ Parallel Decoding Distillation for Fast Image and Video Generation
[12:23] 🎬 Visual prompt engineering for video models
[13:16] 🎯 OmniDelta: Skill-Driven Budget Allocation for Token Compression in OmniLLMs
5d ago

2026.07.28 | Kimi K3 open-source model leads on performance; JarvisHub canvas framework reinvents creative collaboration

[Contents] The 15 papers in this episode:
[00:33] 🧠 Kimi K3: Open Frontier Intelligence
[01:22] 🎨 JarvisHub: An Open Harness for Canvas-Native Multimodal Creative Agents
[02:20] 🤖 From Proprietary to Open-Source: Bridging the Distribution Gap via Multi-Agent Protocol Distillation in Agentic Search
[03:18] 🤖 Progress Reward Modeling for Robotic Learning: A Comprehensive Survey
[04:17] 🤖 StateAct: Program State, before Pixels, for Long-Horizon Computer-Use Agents
[05:23] 🧠 Rethinking Classifier-Free Guidance in On-Policy Diffusion Distillation
[06:27] 🗼 Data Pyramid for Embodied Manipulation
[07:33] ⚡ Sol-Attn: Accelerating Video Generation Inference via On-the-Fly Attention Sparsification
[08:37] 🎬 OmniVAE: An Audio-Video VAE with Cross-Modal Alignment for Joint Generation
[09:30] 🧠 The Physics of Multi-Turn Long-Horizon Planning: From Pre-training to Post-training via Single- and Multi-Teacher On-Policy Agentic Distillation
[10:26] 👗 Oxygen-TryOn: Fashion-Native Foundation Model for Any-item Virtual Try-On
[11:17] 🦎 Chamaileon: Cross-Context Binder Design with Contextualized Modeling and Mixed Sampling
[12:11] 🔮 dRAE: Representation Autoencoder with Hyper-Spherical Codes
[13:09] 🧩 DecoupleMix: Decoupled Ratio Search and Convex Allocation for Scalable VLM Data Recipes
[14:08] 🏥 ClinFusion: A Vision-Centric Multimodal LLM System for Holistic Medical Understanding
6d ago

2026.07.27 | Skill self-play pushes the frontier of model capability; agentic context management optimizes cost and reasoning.

[Contents] The 14 papers in this episode:
[00:32] 🔄 Skill Self-Play: Pushing the Frontier of LLM Capability with Co-Evolving Skills
[01:22] 🧠 Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems
[02:18] 🤖 Molt: A Scalable PyTorch-Native Training Framework for Agentic Reinforcement Learning
[03:07] 🧪 DataPrep-Bench: Benchmarking LLMs as Training Data Preparators
[04:04] 📊 Scaling Native Multimodal Pre-Training From Scratch
[04:54] 🎯 Three-Body Scattering for Generative Modeling
[05:48] 🌐 LAMAR: An Open Language-Aware Multilingual Alignment Reranker
[06:46] 🧠 Multi-Head Latent Control: A Unified Interface for LLM Agent Decision Making
[07:43] 🎛 Spectral Prior for Reducing Exposure Bias in Diffusion Models
[08:34] 🧠 IDEAgent: Agentic Quality-Diversity Search for Research Idea Generation
[09:27] 🎯 SceneActBench: Can Agents Act on the 3D Scenes They See?
[10:37] 🔄 Closing the Loop: Training-Free Revisit Consistency for Autoregressive Generative Rendering
[11:31] 🔊 Multimodal Speaker Verification as a Threat to Speaker Anonymization
[12:26] 🧠 VisCo: Leveraging Large Language Models as Intrinsic Encoders for Visual Token Compression
6d ago

[Weekend Special] Hottest AI Papers of Week 4 of July | Infinite interactive worlds on a single GPU; 3D grounding makes robots smarter

[Contents] The 5 papers in this episode:
[00:47] TOP1(🔥297) | 🌍 ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU
[02:39] TOP2(🔥197) | 🤖 RynnBrain 1.1: Towards More Capable and Generalizable Embodied Foundation Model
[04:48] TOP3(🔥165) | ⏱ TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs
[06:35] TOP4(🔥144) | 🔍 RAGU: A Multi-Step GraphRAG Engine with a Compact Domain-Adapted LLM
[09:10] TOP5(🔥141) | 🔍 AREX: Towards a Recursively Self-Improving Agent for Deep Research
Jul 24

2026.07.24 | AREX: verification-driven recursive improvement; ReferTrack: refer first, then track

[Contents] The 15 papers in this episode:
[00:32] 🔍 AREX: Towards a Recursively Self-Improving Agent for Deep Research
[01:34] 🤖 ReferTrack: Referring Then Tracking for Embodied Visual Tracking
[02:20] 📚 K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs
[03:13] 🖼 Visual Contrastive Self-Distillation
[04:02] 🗺 Show, Don't Tell: Evaluating Spatial Cognition in Generative Pixels Rather Than LLM Text
[04:51] 🎨 Color Pass-Through via Camera-Display Coupling
[05:45] 🛠 Tencent WorkBuddy Bench: A Multi-Domain Coding-Agent Benchmark with Contamination-Resistant Task Construction
[06:43] 🧭 LLMs Get Lost in Evolving User Intent
[07:33] 🎥 Self-Supervised Learning of Structured Dynamics from Videos
[08:28] 🧠 Sample-Efficient Learning from Agent Experience
[09:28] 🌀 Recurrent Sinusoidal INRs for Efficient High-Fidelity Representation
[10:27] 🌍 Streaming Multi-Agent Autoregressive Diffusion Model with World State Registers
[11:28] 🤖 Robostral Navigate
[12:18] 🎭 Predictive Divergence Masks for LLM RL
[13:04] 🎬 SANA-Video 2.0: Hybrid Linear Attention with Attention Residuals for Efficient Video Generation
Jul 22

2026.07.22 | Real-time game rendering sped up 56x; AI data pipeline costs cut by 70%.

[Contents] The 15 papers in this episode:
[00:34] 🎮 Generative World Renderer at the Speed of Play
[01:27] 🔀 DataFlow-Harness: A Grounded Code-Agent Platform for Constructing Editable LLM Data Pipelines
[02:16] 🔍 Text Template Tokens Are Implicit Semantic Registers in Diffusion Transformers
[03:04] ⚡ Mage-Flow: An Efficient Native-Resolution Foundation Model for Image Generation and Editing
[03:52] 🌍 AlayaWorld: Interactive Long-Horizon World Modeling -- Full Technical Report
[04:49] 🤖 Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning
[05:52] 🌍 ABot-World-0: Infinite Interactive World Rollout on a Single Desktop GPU
[06:49] 📊 SciForma: Structure-Faithful Generation of Scientific Diagrams
[07:45] 🔍 AgentDebugX: An Open-Source Toolkit for Failure Observability, Attribution, and Recovery in LLM Agents
[08:41] ⚡ HPD-Parsing: Hierarchical Parallel Document Parsing
[09:38] 📊 Two-Level Meta-Rubrics for Evaluating Open-Ended Generation: GAMUT, a Benchmark for Factual Completeness
[10:33] 🧠 ISO: An RLVR-Native Optimization Stack
[11:41] 🎙 Transcription Policy as a Latent Variable: Activating Controllable Verbatim ASR with Word-Level Timing
[12:43] 🎓 EduPanel: A Three-Agent LLM Judge for Teaching Videos -- Reliability, Complementarity, and Human Trust Calibration
[13:29] 🎬 Masked Visual Actions for Unified World Modeling



Support!!

02/16/2025

Fergie.W

I hope this show keeps going.

Creator: duan
Years Active: 2024 - 2026
Episodes: 663
Rating: Clean
Show Website: HuggingFace 每日AI论文速递
