HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

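Ten minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu.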

  1. 22H AGO

    2026.03.09 | LLMs as vision encoders; BandPO clips smarter

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 15 papers in this episode:
    [00:34] 🐧 Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
    [01:16] 🚀 BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning
    [02:02] ⚡ Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model
    [02:43] 🚀 Progressive Residual Warmup for Language Model Pretraining
    [03:41] 🎬 WildActor: Unconstrained Identity-Preserving Video Generation
    [04:38] 🧠 RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
    [05:31] 🤔 Reasoning Models Struggle to Control their Chains of Thought
    [06:13] 🧭 HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel
    [06:59] ⚡ FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling
    [07:49] 🚀 $π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
    [08:32] 🧠 Mario: Multimodal Graph Reasoning with Large Language Models
    [09:22] 🎬 Physical Simulator In-the-Loop Video Generation
    [10:14] 🧩 Dynamic Chunking Diffusion Transformer
    [11:05] 🔄 SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration
    [11:50] 🧊 PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  2. 2D AGO

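    [Month-End Special] February's Hottest AI Papers | VBVR trains visual reasoning on a million videos; OPUS stays in sync with the optimizer to save compute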

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 10 papers in this episode:
    [00:44] TOP1 (🔥508) | 🧠 A Very Big Video Reasoning Suite
    [03:11] TOP2 (🔥343) | 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
    [05:13] TOP3 (🔥312) | 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
    [07:12] TOP4 (🔥278) | 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
    [09:36] TOP5 (🔥262) | 🧠 ERNIE 5.0 Technical Report
    [11:23] TOP6 (🔥259) | 💭 Does Your Reasoning Model Implicitly Know When to Stop Thinking?
    [13:34] TOP7 (🔥254) | 🤖 Kimi K2.5: Visual Agentic Intelligence
    [15:14] TOP8 (🔥240) | 🧠 Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
    [17:24] TOP9 (🔥216) | ⚖ VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
    [19:30] TOP10 (🔥213) | 🍌 PaperBanana: Automating Academic Illustration for AI Scientists
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    23 min
  3. 3D AGO

    2026.03.06 | MOOSE-Star breaks the training barrier for scientific discovery; DARE instantly turns LLMs into rigorous statistics assistants

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 15 papers in this episode:
    [00:32] 🚀 MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
    [01:50] 📊 DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval
    [02:39] 🧠 SkillNet: Create, Evaluate, and Connect AI Skills
    [03:28] 📱 RoboPocket: Improve Robot Policies Instantly with Your Phone
    [04:15] 🎨 HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images
    [04:59] 🔍 AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
    [05:37] 🔬 SageBwd: A Trainable Low-bit Attention
    [06:21] 🧠 Large Multimodal Models as General In-Context Classifiers
    [07:01] ⚖ MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
    [07:54] 🌍 DreamWorld: Unified World Modeling in Video Generation
    [08:34] 🎬 RealWonder: Real-Time Physical Action-Conditioned Video Generation
    [09:35] 🧠 Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline
    [10:14] 🧠 On-Policy Self-Distillation for Reasoning Compression
    [10:55] 🤖 KARL: Knowledge Agents via Reinforcement Learning
    [11:39] 🔍 Locality-Attending Vision Transformer
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  4. 4D AGO

    2026.03.05 | Helios extends long videos without limit; heterogeneous models collaborate to halve the practice needed

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 15 papers in this episode:
    [00:33] 🎬 Helios: Real Real-Time Long Video Generation Model
    [01:12] 🤝 Heterogeneous Agent Collaborative Reinforcement Learning
    [01:56] 🧠 T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
    [02:50] 🤖 Proact-VL: A Proactive VideoLLM for Real-Time AI Companions
    [03:28] 🧠 MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
    [04:20] 🤖 ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors
    [05:12] 🎥 CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
    [05:51] 🧠 Phi-4-reasoning-vision-15B Technical Report
    [06:41] 🧠 Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
    [07:20] 🔍 AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
    [08:12] 🎬 RIVER: A Real-Time Interaction Benchmark for Video LLMs
    [08:51] 🎬 InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions
    [09:43] 🧠 EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding
    [10:32] 🧠 BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
    [11:34] 🔄 SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    12 min
  5. 5D AGO

    2026.03.04 | The "alignment tax" of unified models drags down understanding; a universal point-cloud encoder covers many scenarios at once

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 15 papers in this episode:
    [00:32] 🔍 UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
    [01:40] 🧩 Utonia: Toward One Encoder for All Point Clouds
    [02:21] 🔍 BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
    [03:00] 🔍 Beyond Language Modeling: An Exploration of Multimodal Pretraining
    [03:53] 🧠 Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models
    [04:40] 🎯 How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
    [05:16] 🎬 Kling-MotionControl Technical Report
    [05:58] 🎬 Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance
    [07:01] 🤖 Qwen3-Coder-Next Technical Report
    [07:46] 🧠 PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference
    [08:30] 🔍 InfoPO: Information-Driven Policy Optimization for User-Centric Agents
    [09:29] 🔬 Surgical Post-Training: Cutting Errors, Keeping Knowledge
    [10:14] 🎛 CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance
    [10:53] 🎬 NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
    [11:58] ⚡ Spilled Energy in Large Language Models
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  6. 6D AGO

    2026.03.03 | Adaptive scaling saves compute; tokens turn into animations in seconds

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 15 papers in this episode:
    [00:30] ⚡ From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
    [01:16] 🎨 OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens
    [01:57] 🤖 OpenAutoNLU: Open Source AutoML Library for NLU
    [02:37] 🧩 MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
    [03:32] 📊 RubricBench: Aligning Model-Generated Rubrics with Human Standards
    [04:16] 🧠 CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning
    [05:04] 🔍 VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
    [06:08] 🤖 CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification
    [06:50] ⚙ SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
    [07:37] 📊 Spectral Condition for $μ$P under Width-Depth Scaling
    [08:21] 🎬 WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
    [09:08] 🧠 LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model
    [10:11] 🧠 Efficient RLVR Training via Weighted Mutual Information Data Selection
    [10:48] 🧠 Learn Hard Problems During RL with Reference Guided Fine-tuning
    [11:51] 🔬 When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    13 min
  7. MAR 2

    2026.03.02 | dLLM, a unified diffusion framework; SpatialScore helps AI understand space

    [Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
    [Contents] The 15 papers in this episode:
    [00:29] 🛠 dLLM: Simple Diffusion Language Modeling
    [01:15] 🧠 Enhancing Spatial Understanding in Image Generation via Reward Modeling
    [02:11] 🌍 Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
    [03:08] ⚡ CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
    [03:59] 🎬 Mode Seeking meets Mean Seeking for Fast Long Video Generation
    [04:44] 🧩 Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
    [05:31] ⚡ LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
    [06:21] 🔍 CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
    [07:16] ⚡ Accelerating Masked Image Generation by Learning Latent Controlled Dynamics
    [08:00] 🧠 Memory Caching: RNNs with Growing Memory
    [08:38] 📊 InfoNCE Induces Gaussian Distribution
    [09:28] 🧠 Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
    [10:28] ⚡ SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
    [11:15] 🎬 LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
    [11:53] ⚡ Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators
    [Follow Us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

    13 min

Ratings & Reviews

5 out of 5 (2 Ratings)
