HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

duan

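In about 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu.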

  1. 23 HR AGO

    2025.11.13 | Genshin Impact data forges a 7B generalist AI; training-free trajectories become an instant video remote control

    The 9 papers in this episode:
    [00:19] 🌍 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
    [00:54] 🎬 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
    [01:31] ⚡ TiDAR: Think in Diffusion, Talk in Autoregression
    [02:15] 🔄 LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
    [02:51] 🤖 WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
    [03:33] 🖥 WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
    [04:19] 🎯 Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
    [04:55] 🤖 Agentic Refactoring: An Empirical Study of AI Coding Agents
    [05:31] 🛡 Stemming Hallucination in Language Models Using a Licensing Oracle
    【Follow us】 You can also find us on the following platform for more content beyond the podcast: Xiaohongshu: AI速递

    6 min
  2. 1 DAY AGO

    2025.11.12 | A 1.5B small model overtakes a 671B large model; multi-agent quality checks for chatbots

    The 9 papers in this episode:
    [00:24] 🧠 Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
    [00:59] 🤝 Adaptive Multi-Agent Response Refinement in Conversational Systems
    [01:30] 🧩 Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
    [02:17] ⚡ KLASS: KL-Guided Fast Inference in Masked Diffusion Models
    [02:53] 🖥 Grounding Computer Use Agents on Human Demonstrations
    [03:37] 🎥 VideoSSR: Video Self-Supervised Reinforcement Learning
    [04:19] 🚪 The Path Not Taken: RLVR Provably Learns Off the Principals
    [05:14] 🔗 BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives
    [05:56] 🤹 Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

    7 min
  3. 2 DAYS AGO

    2025.11.11 | Small context windows with frequent summaries reinvent deep research; cast a wide net first, then tackle hard problems to unlock competitive coding

    The 13 papers in this episode:
    [00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
    [01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
    [02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery
    [02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
    [03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
    [03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
    [04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
    [05:10] 🎬 MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
    [05:50] 🎨 MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
    [06:57] 🔄 RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
    [07:36] 🤖 Robot Learning from a Physical World Model
    [08:21] 🛠 NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
    [08:52] 🚀 SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

    10 min
  4. 6 DAYS AGO

    2025.11.07 | A new paradigm for video reasoning; interacting with images to drive thinking

    The 12 papers in this episode:
    [00:21] 🎬 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
    [00:58] 🧠 V-Thinker: Interactive Thinking with Images
    [01:39] 🧠 Scaling Agent Learning via Experience Synthesis
    [02:23] 🧠 Cambrian-S: Towards Spatial Supersensing in Video
    [03:06] 🖥 GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
    [03:51] 📄 NVIDIA Nemotron Nano V2 VL (an efficient vision-language model for document and long-video understanding)
    [04:28] 🎟 The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
    [05:12] 🕵 Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
    [05:48] ⚽ Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots
    [06:18] 🔍 Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
    [06:53] 🎧 How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
    [07:32] 🚀 RDMA Point-to-Point Communication for LLM Systems

    8 min
  5. 6 NOV

    2025.11.06 | Diffusion models are data-efficient; lip-synced audio and video generation

    The 9 papers in this episode:
    [00:17] 🚀 Diffusion Language Models are Super Data Learners
    [01:06] 🎬 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
    [01:42] 🧩 LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation
    [02:25] 📊 Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning
    [03:15] 📊 TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models
    [03:46] 🦾 Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
    [04:30] 🧠 MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
    [05:06] 📈 LiveTradeBench: Seeking Real-World Alpha with Large Language Models
    [05:55] 🔍 Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

    7 min
  6. 5 NOV

    2025.11.05 | Vector sketches put code to the test; draw first, then think, to fill in the visual gap

    The 15 papers in this episode:
    [00:21] 🖼 VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
    [01:12] 🧠 When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
    [01:48] ⚖ When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
    [02:36] 🪙 Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
    [03:11] 🧠 Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
    [03:49] 👁 Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
    [04:33] 🎨 LTD-Bench: Evaluating Large Language Models by Letting Them Draw (assessing spatial reasoning through drawing)
    [05:15] 🤖 TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
    [06:01] 🗜 Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
    [06:46] 🏆 CodeClash: Benchmarking Goal-Oriented Software Engineering
    [07:29] 🎭 VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
    [08:03] 🧠 BRAINS: A Retrieval-Augmented System for Alzheimer's Detection and Monitoring
    [08:42] 📊 ChartM³: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension
    [09:45] 📊 TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
    [10:17] 🤖 iFlyBot-VLA Technical Report (a new framework for large-scale vision-language-action models)

    12 min
