HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

duan

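Ten minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).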

  1. 4h ago

    2026.06.05 | ArcANE framework quantifies character arcs; TIDE model enables proactive insight

    [Contents] This episode covers the following 15 papers:
    [00:31] 🎭 ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
    [01:26] 🔍 TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
    [02:27] 🤖 AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
    [03:14] 🎥 VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
    [04:09] 🤖 RobotValues: Evaluating Household Robots When Human Values Conflict
    [05:01] 🌐 Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
    [05:58] 🎬 LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
    [06:49] 📸 Personal AI Agent for Camera Roll VQA
    [07:36] 🧠 Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
    [08:27] ⚖ Complexity-Balanced Diffusion Splitting
    [09:28] 🤖 Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
    [10:33] 🔬 Unsupervised Skill Discovery for Agentic Data Analysis
    [11:25] 🔍 LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
    [12:17] 🎯 Towards One-to-Many Temporal Grounding
    [13:16] 💰 The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    [Sponsor] OpenClaw 快报 (OpenClaw Briefing): five minutes a day of OpenClaw 快报 to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    14 min
  2. 1d ago

    2026.06.04 | A unified omnimodal framework; real-time proactive audio interaction

    [Contents] This episode covers the following 15 papers:
    [00:31] 🌌 Cosmos 3: Omnimodal World Models for Physical AI
    [01:36] 🎧 Audio Interaction Model
    [02:31] 🔍 Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
    [03:30] 🔍 Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
    [04:25] 🧭 OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
    [05:27] ⚡ Qwen-Image-Flash: Beyond Objective Design
    [06:18] 🧠 M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks
    [07:13] 🎥 Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
    [08:14] 🧠 ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
    [09:08] 🧪 Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
    [10:15] ⚡ Streaming Communication in Multi-Agent Reasoning
    [11:08] 🎯 Self-Distilled Policy Gradient
    [12:13] 🧠 MemTrain: Self-Supervised Context Memory Training
    [13:05] 🧩 Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
    [14:11] 🤖 MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    [Sponsor] OpenClaw 快报 (OpenClaw Briefing): five minutes a day of OpenClaw 快报 to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    15 min
  3. 2d ago

    2026.06.03 | Trust regions teach small models; Humanoid-GPT tracks motion

    [Contents] This episode covers the following 15 papers:
    [00:31] 🎯 Trust Region On-Policy Distillation
    [01:17] 🤖 Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
    [02:07] 🧠 A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
    [03:06] 🧠 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
    [03:57] 🏥 AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
    [05:09] 🖼 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
    [06:12] 😴 Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
    [07:09] 🧩 TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
    [08:07] 💬 $Ψ$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
    [09:08] 🧩 Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
    [10:05] 🎯 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
    [11:09] 📄 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
    [12:14] 🗺 PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
    [13:16] 🔍 Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
    [14:05] 🎵 MERIT: Learning Disentangled Music Representations for Audio Similarity

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    [Sponsor] OpenClaw 快报 (OpenClaw Briefing): five minutes a day of OpenClaw 快报 to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    15 min
  4. 3d ago

    2026.06.02 | Multi-agent framework generates editable figures; parameter-efficient fine-tuning supports a million personalized models

    [Contents] This episode covers the following 15 papers:
    [00:33] 🎨 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
    [01:39] 🧩 On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
    [02:35] 🧪 A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
    [03:25] 🌐 K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
    [04:21] ⚡ Draft-OPD: On-Policy Distillation for Speculative Draft Models
    [05:10] 🎓 VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
    [06:18] 📡 X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
    [07:13] 🎬 VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
    [07:59] 🤖 SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories
    [08:54] 🧠 Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
    [09:51] 🧠 NITP: Next Implicit Token Prediction for LLM Pre-training
    [10:50] 👀 Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
    [11:46] 🎬 LVSA: Training-Free Sparse Attention for Long Video Diffusion
    [12:38] 🛑 ESPO: Early-Stopping Proximal Policy Optimization
    [13:37] 🎤 StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    [Sponsor] OpenClaw 快报 (OpenClaw Briefing): five minutes a day of OpenClaw 快报 to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    15 min
  5. 4d ago

    2026.06.01 | Knowledge distillation forges skills; Representation Forcing breaks the bottleneck

    [Contents] This episode covers the following 15 papers:
    [00:30] 🧠 COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation
    [01:17] 🧠 Representation Forcing for Bottleneck-Free Unified Multimodal Models
    [02:07] 🎙 SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue
    [02:58] 🔍 LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
    [03:59] 🎧 Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
    [04:48] 🖼 GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
    [05:39] 🎤 Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
    [06:46] 🛋 Function2Scene: 3D Indoor Scene Layout from Functional Specifications
    [07:36] 🎥 SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer
    [08:29] 🧠 Task-Focused Memorization for Multimodal Agents
    [09:30] 🤖 Exploring Autonomous Agentic Data Engineering for Model Specialization
    [10:15] 🎓 Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation
    [11:10] 🧩 dMoE: dLLMs with Learnable Block Experts
    [12:12] 🛠 Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents
    [13:07] 🛡 From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    [Sponsor] OpenClaw 快报 (OpenClaw Briefing): five minutes a day of OpenClaw 快报 to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    15 min
  6. 4d ago

    [Month-End Special] May's Hottest AI Papers | Multi-agent world modeling; open-source robot VLA model

    [Contents] This episode covers the following 10 papers:
    [00:45] TOP1 (🔥407) | 🌍 Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
    [03:09] TOP2 (🔥347) | 🤖 MolmoAct2: Action Reasoning Models for Real-world Deployment
    [05:30] TOP3 (🔥269) | 🔍 CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
    [07:51] TOP4 (🔥231) | 🧠 Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
    [10:04] TOP5 (🔥219) | 🏗 MinT: Managed Infrastructure for Training and Serving Millions of LLMs
    [11:59] TOP6 (🔥217) | 🧠 Heterogeneous Scientific Foundation Model Collaboration
    [14:17] TOP7 (🔥210) | 🤖 Code as Agent Harness
    [16:26] TOP8 (🔥210) | 🧠 SkillOpt: Executive Strategy for Self-Evolving Agent Skills
    [18:39] TOP9 (🔥204) | 🎯 DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
    [20:25] TOP10 (🔥195) | 🧠 Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    23 min
  7. May 29

    2026.05.29 | AgentDoG 1.5 delivers millisecond-level safety protection; Qwen-VLA unifies action modeling across tasks

    [Contents] This episode covers the following 14 papers:
    [00:25] 🛡 AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
    [01:06] 🤖 Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
    [02:02] 🌐 OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
    [02:52] 🎨 CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation
    [03:47] 🎬 minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
    [04:39] 🎥 YoCausal: How Far is Video Generation from World Model? A Causality Perspective
    [05:42] 🎨 GenClaw: Code-Driven Agentic Image Generation
    [06:40] ⚡ EarlyTom: Early Token Compression Completes Fast Video Understanding
    [07:37] 🎯 UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering
    [08:25] 🧠 How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
    [09:20] 🔗 LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
    [10:24] 🔍 LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
    [11:16] 🧠 Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
    [12:17] 🔍 Xetrieval: Mechanistically Explaining Dense Retrieval

    [Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu (小红书): AI速递

    View this episode's transcript on Xiaoyuzhou (小宇宙).

    14 min

Ratings & Reviews

5 out of 5 (2 Ratings)

About

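Ten minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).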
