HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Express)

duan

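In 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.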

  1. 16H AGO

    2026.02.16 | Feature activations fill data gaps; region distillation hides the zoom

    【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

    【Contents】 The 15 papers in this episode:
    [00:30] 🧠 Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
    [01:19] 🔍 Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
    [02:03] 🏥 MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
    [02:43] 🎯 OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
    [03:29] 🔬 What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
    [04:18] 🤖 RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models
    [05:05] 🤖 ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
    [05:53] 🎬 Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
    [06:55] 🤝 Intelligent AI Delegation
    [07:49] 📍 GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
    [08:39] ⚙ BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
    [09:37] ⚡ FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
    [10:14] 🔍 On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
    [11:03] ⚡ DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
    [11:48] ⚡ CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  2. 3D AGO

    2026.02.13 | Self-evolving AI struggles to stay safe; unified tokens for large audio models

    【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

    【Contents】 The 15 papers in this episode:
    [00:31] ⚠ The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
    [01:24] 🎵 MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
    [02:28] 🧠 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
    [03:05] 🤖 GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
    [03:56] ⚖ LawThinker: A Deep Research Legal Agent in Dynamic Environments
    [04:33] 🔍 Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
    [05:16] 🎨 Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
    [06:01] 🚀 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
    [06:55] 🧩 Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
    [07:38] 🧠 Thinking with Drafting: Optical Decompression via Logical Reconstruction
    [08:17] 🗳 dVoting: Fast Voting for dLLMs
    [09:09] 🤖 RISE: Self-Improving Robot Policy with Compositional World Model
    [09:54] 🤖 χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
    [10:48] 🤖 EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
    [11:45] 🔍 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  3. 4D AGO

    2026.02.12 | Sparse MoE rivals GPT-5; GENIUS tests fluid intelligence

    【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

    【Contents】 The 15 papers in this episode:
    [00:28] ⚡ Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
    [01:06] 🧠 GENIUS: Generative Fluid Intelligence Evaluation Suite
    [01:46] 🤖 PhyCritic: Multimodal Critic Models for Physical AI
    [02:18] ⚙ ASA: Training-Free Representation Engineering for Tool-Calling Agents
    [02:59] 🧠 When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
    [03:38] 🧮 Towards Autonomous Mathematics Research
    [04:15] 🎬 TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
    [05:12] 🧠 G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
    [06:02] ⚙ FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
    [06:44] 🧑 DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
    [07:28] 🚀 ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
    [08:27] 📈 Online Causal Kalman Filtering for Stable and Effective Policy Optimization
    [09:24] 🧠 Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
    [10:06] 🗣 Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
    [10:47] 🔄 Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    12 min
  4. 5D AGO

    2026.02.11 | OPUS selects data aligned with the update; Code2World previews GUIs through code

    【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

    【Contents】 The 15 papers in this episode:
    [00:33] 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
    [01:17] 💻 Code2World: A GUI World Model via Renderable Code Generation
    [02:05] 🤖 UI-Venus-1.5 Technical Report
    [02:58] 🧠 Chain of Mindset: Reasoning with Adaptive Cognitive Modes
    [03:52] 🧠 SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
    [04:29] 🔬 P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
    [05:24] 🤖 Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
    [05:58] 🔍 Prism: Spectral-Aware Block-Sparse Attention
    [06:41] ⚡ DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
    [07:23] 🎬 Olaf-World: Orienting Latent Actions for Video World Modeling
    [08:18] 🎨 Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
    [09:09] 🍌 Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
    [09:50] 🎯 SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
    [10:37] 🤖 BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
    [11:31] 🎬 TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  5. 6D AGO

    2026.02.10 | ReAlign bridges the image-text gap with zero training; MOVA generates video and audio in sync

    【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

    【Contents】 The 15 papers in this episode:
    [00:34] 🔀 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
    [01:23] 🎬 MOVA: Towards Scalable and Synchronized Video-Audio Generation
    [02:03] 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
    [02:51] 🤖 Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
    [03:24] 🎯 Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
    [04:22] ⚡ LLaDA2.1: Speeding Up Text Diffusion via Token Editing
    [05:02] 📱 GEBench: Benchmarking Image Generation Models as GUI Environments
    [05:52] 🎬 Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
    [06:42] 🧠 Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
    [07:20] 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
    [08:12] 📊 LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
    [08:59] 🔍 GISA: A Benchmark for General Information-Seeking Assistant
    [09:56] 🧭 WorldCompass: Reinforcement Learning for Long-Horizon World Models
    [10:35] 🧪 LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
    [11:20] 🧭 Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  6. FEB 9

    2026.02.09 | AI takes patient histories like a resident physician; learning the rules through interaction is true intelligence

    【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

    【Contents】 The 15 papers in this episode:
    [00:32] 🩺 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
    [01:17] 🧭 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
    [02:03] 📈 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
    [02:47] 🎯 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
    [03:48] ⚖ MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
    [04:33] 🤖 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
    [05:14] 🧠 Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
    [06:07] 🧮 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
    [06:46] 🎯 POINTS-GUI-G: GUI-Grounding Journey
    [07:45] 🧠 MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
    [08:29] 🧠 Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
    [09:18] 🎵 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
    [09:59] ⚡ Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
    [11:02] 🧠 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
    [11:49] 🧠 PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    13 min

Ratings & Reviews

5
out of 5
2 Ratings
