HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

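A 10-minute daily rundown of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.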

  1. 21 hours ago

    2025.08.12 | ReasonRank strengthens reasoning in passage ranking; WideSearch benchmarks agentic broad info-seeking

    The 15 papers in this episode:

    - [00:18] 🧠 ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
    - [00:41] 🔍 WideSearch: Benchmarking Agentic Broad Info-Seeking
    - [01:01] ✨ Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
    - [01:26] 🧠 Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
    - [01:59] 💬 UserBench: An Interactive Gym Environment for User-Centric Agents
    - [02:22] 💡 SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
    - [02:50] 🌱 A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
    - [03:15] 🔬 BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
    - [03:45] 🤖 MolmoAct: Action Reasoning Models that can Reason in Space
    - [04:11] 🤖 OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks
    - [04:38] 💡 Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
    - [05:05] ⏳ Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
    - [05:29] 🗺 Reinforcement Learning in Vision: A Survey
    - [05:59] 🔍 Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
    - [06:23] 🖌 Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control

    【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    7 min
  2. 1 day ago

    2025.08.11 | GLM-4.5 unifies agentic, reasoning, and coding abilities; Voost delivers high-fidelity virtual try-on and try-off

    The 11 papers in this episode:

    - [00:20] 🚀 GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
    - [00:47] 👕 Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off
    - [01:11] 🎯 InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
    - [01:34] 🧠 Memp: Exploring Agent Procedural Memory
    - [02:03] ✂ Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
    - [02:29] 🪄 GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing
    - [02:50] 📚 Adapting Vision-Language Models Without Labels: A Comprehensive Survey
    - [03:15] 🌍 MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs
    - [03:37] 🧱 MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
    - [04:02] 🎯 UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
    - [04:30] ✨ LightSwitch: Multi-view Relighting with Material-guided Diffusion

    5 min
  3. 4 days ago

    2025.08.08 | Dynamic fine-tuning improves reasoning; self-evolution from zero data strengthens reasoning

    The 15 papers in this episode:

    - [00:16] ✨ On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
    - [00:41] 🌱 R-Zero: Self-Evolving Reasoning LLM from Zero Data
    - [01:00] 🤖 Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
    - [01:27] 🤔 DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning
    - [01:49] 📊 Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity
    - [02:12] 🤔 Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
    - [02:40] 🔍 Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability
    - [03:08] 💡 Are Today's LLMs Ready to Explain Well-Being Concepts?
    - [03:30] 🚀 CoAct-1: Computer-using Agents with Coding as Actions
    - [03:57] 🚀 InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
    - [04:18] 💬 Evaluating, Synthesizing, and Enhancing for Customer Support Conversation
    - [04:41] 💡 Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
    - [05:02] 🤯 MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
    - [05:22] 🎤 Marco-Voice Technical Report
    - [05:47] 🎨 StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

    7 min
  4. 5 days ago

    2025.08.07 | VeriGUI improves agent capabilities; CoT reasoning is really pattern matching

    The 13 papers in this episode:

    - [00:20] 🤖 VeriGUI: Verifiable Long-Chain GUI Dataset
    - [00:40] 🤔 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
    - [00:59] 💰 Efficient Agents: Building Effective Agents While Reducing Cost
    - [01:21] 🌱 SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
    - [01:47] ⚡ Agent Lightning: Train ANY AI Agents with Reinforcement Learning
    - [02:09] 🧠 CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
    - [02:35] 🤖 Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
    - [03:00] 🤝 Sotopia-RL: Reward Design for Social Intelligence
    - [03:26] 💻 LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
    - [03:52] 🧠 Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
    - [04:16] ✨ HPSv3: Towards Wide-Spectrum Human Preference Score
    - [04:38] 🪄 Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
    - [05:00] ⚡ LeanK: Learnable K Cache Channel Pruning for Efficient Decoding

    6 min
  5. 6 days ago

    2025.08.06 | A diffusion language model with high-speed inference; a compact visual generation model

    The 13 papers in this episode:

    - [00:17] 🚀 Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
    - [00:39] 🎨 Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
    - [01:05] 🎥 LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
    - [01:27] 🔍 CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
    - [01:51] 🚀 CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
    - [02:13] 🔍 Tool-integrated Reinforcement Learning for Repo Deep Search
    - [02:36] 👥 Multi-human Interactive Talking Dataset
    - [03:04] 🧠 Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
    - [03:39] 🧭 LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
    - [04:08] 🧩 LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer
    - [04:37] 📊 ChartCap: Mitigating Hallucination of Dense Chart Captioning
    - [05:03] 🛡 AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization
    - [05:35] 🔍 TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs

    6 min
  6. August 6

    2025.08.05 | Innovations in image text rendering and editing; context-aware retrieval improves story comprehension

    The 15 papers in this episode:

    - [00:18] 🎨 Qwen-Image Technical Report
    - [00:39] 🔍 SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
    - [01:08] 🧬 CellForge: Agentic Design of Virtual Cell Models
    - [01:36] 🧠 Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
    - [02:05] 🛡 Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
    - [02:32] 🤖 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
    - [03:04] 🚀 VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
    - [03:31] ✂ A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
    - [03:57] 🔒 Personalized Safety Alignment for Text-to-Image Diffusion Models
    - [04:16] 🌐 Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
    - [04:46] 🧠 RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
    - [05:10] 🎨 Artificial Intelligence and Misinformation in Art: Can Vision Language Models Judge the Hand or the Machine Behind the Canvas?
    - [05:47] 🔄 Exploitation Is All You Need... for Exploration
    - [06:15] 🔒 Cyber-Zero: Training Cybersecurity Agents without Runtime
    - [06:41] 🧠 AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

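    7 min
  7. August 5

    2025.08.04 | Variable-length denoising for diffusion language models, efficient and resource-saving; PixNerd image diffusion, efficient and high-quality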

    The 11 papers in this episode:

    - [00:22] 🔄 Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
    - [00:44] 🎨 PixNerd: Pixel Neural Field Diffusion
    - [01:11] 💡 SWE-Exp: Experience-Driven Software Issue Resolution
    - [01:38] 🔍 Multimodal Referring Segmentation: A Survey
    - [01:59] 🧠 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
    - [02:40] 🤖 SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
    - [03:05] ⚖ Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple Judges
    - [03:33] 🤯 Investigating Hallucination in Conversations for Low Resource Languages
    - [04:00] 🧭 IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
    - [04:30] 🎧 SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
    - [04:55] 🎮 Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings

    6 min


