HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

5.0 (2 ratings)
Technology
Updated daily

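Ten minutes a day to get up to speed on the day's trending AI papers on HuggingFace. New episodes every weekday; subscribe to follow along. 📢 Search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts to find the podcast. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu.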

8 hours ago

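2025.10.15 | Pixel-space self-supervised ViT sets a new bar on generation benchmarks; a multi-agent framework becomes a new yardstick for web-novel translation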

The 14 papers in this episode:
[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning
[02:29] 🎯 Detect Anything via Next Point Prediction
[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs
[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
[06:35] 🤖 Robot Learning: A Tutorial (from reinforcement learning to multi-task generalist models)
[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation
[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

[Follow us] For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

11 min
1 day ago

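2025.10.14 | Turning quantization error into a reward to train a 32B model on a single GPU; an audio-visual evaluation benchmark for multimodal LLMs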

The 15 papers in this episode:
[00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[01:22] 🧠 Diffusion Transformers with Representation Autoencoders
[02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
[02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
[03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
[04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
[04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
[05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
[05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
[06:51] 🧮 Making Mathematical Reasoning Adaptive
[07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
[08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
[08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
[09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
[10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

11 min
2 days ago

2025.10.13 | Desktop-interaction pretraining unlocks robots' potential; a unified model gives cameras spatial imagination

The 14 papers in this episode:
[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
[01:56] 🎨 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!
[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams
[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone
[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

10 min
3 days ago

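[Weekend Special] Hottest AI papers of October, week 2 | Tiny recursive models sweep the reasoning leaderboards; future experience lights up reward-free learning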

The 5 papers in this episode:
[00:33] TOP1 (🔥300) | 🧠 Less is More: Recursive Reasoning with Tiny Networks
[02:16] TOP2 (🔥164) | 🌱 Agent Learning via Early Experience
[04:15] TOP3 (🔥105) | 🧠 Apriel-1.5-15b-Thinker (a 15B open-source model that punches above its weight in frontier multimodal reasoning)
[06:17] TOP4 (🔥97) | 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[08:45] TOP5 (🔥88) | 🎬 Paper2Video: Automatic Video Generation from Scientific Papers

12 min
5 days ago

2025.10.10 | Agent Learning from early experience; interleaved image-text reflective reasoning chains jump to 24.9%

The 14 papers in this episode:
[00:16] 🌱 Agent Learning via Early Experience
[00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
[02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos
[03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
[03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
[04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model
[05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
[05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
[06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
[07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy
[07:54] 🚀 Training-Free Group Relative Policy Optimization
[08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
[08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

10 min
6 days ago

2025.10.09 | Ming-UniVision's unified visual tokenizer; direct KV-Cache links let LLMs talk to each other instantly

The 15 papers in this episode:
[00:21] 🔄 Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
[00:59] 🧠 Cache-to-Cache: Direct Semantic Communication Between Large Language Models
[01:32] 🌀 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
[02:07] 🧠 SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
[03:06] 🤖 RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
[04:02] 🎬 MATRIX: Mask Track Alignment for Interaction-aware Video Generation
[04:51] 🎯 Vibe Checker: Aligning Code Evaluation with Human Preference
[05:44] 🤖 Multi-Agent Tool-Integrated Policy Optimization
[06:24] 🧠 CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling
[06:59] ✂ OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
[07:52] 🧠 Artificial Hippocampus Networks for Efficient Long-Context Modeling
[08:30] 🔍 Revisiting Long-context Modeling from Context Denoising Perspective
[09:11] 🧠 Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
[09:51] 💥 Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
[10:37] ⚡ Native Hybrid Attention for Efficient Sequence Modeling

12 min
Oct 8

2025.10.08 | TaTToo beats large models with external code tools; a 4B small model nears closed-source giants in 32 steps

The 15 papers in this episode:
[00:24] 📊 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
[00:57] 🔍 Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
[01:39] 🚀 Fast-dLLM v2: Efficient Block-Diffusion LLM
[02:30] 🧑 CoDA: Coding LM via Diffusion Adaptation
[03:01] 🧩 Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
[03:52] ⚖ ASPO: Asymmetric Importance Sampling Policy Optimization
[04:34] 🔗 Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
[05:15] 🧠 AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems
[05:51] 🪂 Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
[06:35] 🌍 HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
[07:22] ⚡ TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation
[08:09] 🎯 Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
[09:00] 🩺 Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
[09:46] 🧠 MixReasoning: Switching Modes to Think
[10:20] ⚡ LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation

11 min
Oct 7

2025.10.07 | Papers turned into talks in seconds; a Video-LMM post-training breakthrough

The 15 papers in this episode:
[00:21] 🎬 Paper2Video: Automatic Video Generation from Scientific Papers
[00:55] 🎬 Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
[01:38] 🎬 VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
[02:14] 👻 Imperceptible Jailbreaking against Large Language Models
[02:56] 🌳 MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
[03:30] 🧬 Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
[04:07] 📊 Factuality Matters: When Image Generation and Editing Meet Structured Visuals
[04:59] 🔄 Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
[05:55] ⚖ Judging with Confidence: Calibrating Autoraters to Preference Distributions
[06:44] 🎯 Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
[07:27] 📏 Optimal Scaling Needs Optimal Norm
[07:51] 🔬 Code4MeV2: a Research-oriented Code-completion Platform
[08:31] 🪞 Self-Reflective Generation at Test Time
[09:15] 🔄 SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
[10:00] 👀 Watch and Learn: Learning to Use Computers from Online Videos

11 min


(out of 5 stars)

2 ratings

Support!!

Feb 16

Fergie.W

Hope you keep it going!

Creator

duan
Years active

2024 - 2025
Episodes

411
Age rating

Clean (suitable for children)
Show website

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)
