HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

5.0 (2 ratings)
Technology
Updated daily

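Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu.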

7 hours ago

2025.10.20 | RPC pruning speeds up reasoning while keeping accuracy; OmniVinci leads omni-modal understanding with less data

This episode covers the following 15 papers:
[00:20] 🧠 A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
[01:04] 🌐 OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
[01:44] 🎬 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset (a million-scale synthetic dataset)
[02:28] ✂ NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
[03:05] 🛰 Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
[03:41] ⚠ Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
[04:18] 🧬 Latent Diffusion Model without Variational Autoencoder
[04:52] 📸 LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
[05:30] 🧠 MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
[06:14] 🧠 A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
[06:56] 🗣 Language Models Model Language
[07:36] 🖼 BLIP3o-NEXT: Next Frontier of Native Image Generation
[08:30] 🌐 Paper2Web: Let's Make Your Paper Alive!
[09:12] 🔬 Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
[09:55] 🔍 Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

11 min
2 days ago

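[Weekend Special] Hottest AI papers of week 3 of October | Quantization noise becomes exploration, RL on a single GPU; frozen encoders supply semantics, a new DiT generation record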

This episode covers the following 5 papers:
[00:40] TOP1 (🔥154) | 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[02:19] TOP2 (🔥138) | 🧠 Diffusion Transformers with Representation Autoencoders
[04:54] TOP3 (🔥134) | 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
[07:55] TOP4 (🔥125) | 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[10:30] TOP5 (🔥110) | 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

14 min
3 days ago

2025.10.17 | AI glasses offer anticipatory service; video generation gains imagination

This episode covers the following 11 papers:
[00:25] 👓 AI for Service: Proactive Assistance with AI Glasses
[01:06] 🎬 ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
[01:43] 🎯 LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
[02:33] 🧩 TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
[03:35] 🧠 Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
[04:04] ⚡ Attention Is All You Need for KV Cache in Diffusion LLMs
[04:45] 🤥 When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
[05:33] 📄 PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
[06:13] 🧠 VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
[06:52] 📐 MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
[07:39] 🧠 COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

9 min
4 days ago

2025.10.16 | UniMoE unifies speech and music; attention maps illuminate LLM reasoning

This episode covers the following 15 papers:
[00:21] 🎧 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
[00:57] 🔍 Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
[01:38] ⚡ FlashWorld: High-quality 3D Scene Generation within Seconds
[02:06] 🐝 Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
[02:37] 🗣 InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
[03:24] 🌍 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
[04:00] 🧪 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
[04:43] 🚗 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
[05:21] 🔍 Generative Universal Verifier as Multimodal Meta-Reasoner
[06:07] ⚖ ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
[06:43] 🎞 Trace Anything: Representing Any Video in 4D via Trajectory Fields (a single feed-forward pass recovers a continuous spatio-temporal path for every pixel)
[07:27] 🌍 Reasoning in Space via Grounding in the World
[07:54] 🧠 The Role of Computing Resources in Publishing Foundation Model Research
[08:28] ⚖ UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
[09:05] 🤖 InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

10 min
5 days ago

2025.10.15 | Pixel-level self-supervised ViT sets new generation benchmarks; a multi-agent yardstick for evaluating web novel translation

This episode covers the following 14 papers:
[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning
[02:29] 🎯 Detect Anything via Next Point Prediction
[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs
[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
[06:35] 🤖 Robot Learning: A Tutorial (from reinforcement learning to multi-task generalist models)
[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation
[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

11 min
6 days ago

2025.10.14 | Quantization error becomes a reward, training 32B on a single GPU; an audio-visual evaluation benchmark for multimodal LLMs

This episode covers the following 15 papers:
[00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[01:22] 🧠 Diffusion Transformers with Representation Autoencoders
[02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
[02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
[03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
[04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
[04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
[05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training (hybrid training on perspective and panoramic data)
[05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
[06:51] 🧮 Making Mathematical Reasoning Adaptive
[07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data (a pre-execution safety framework)
[08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
[08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models (SVG understanding, editing, and generation)
[09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
[10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

11 min
Oct 13

2025.10.13 | Desktop-interaction pretraining unlocks robot potential; a unified model gives cameras spatial imagination

This episode covers the following 14 papers:
[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
[01:56] 🎨 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!
[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams
[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone
[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation (sample selection guided by model disagreement)
[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

10 min
Oct 12

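[Weekend Special] Hottest AI papers of week 2 of October | Tiny recursive models top the reasoning leaderboards; future experience lights up reward-free learning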

This episode covers the following 5 papers:
[00:33] TOP1 (🔥300) | 🧠 Less is More: Recursive Reasoning with Tiny Networks
[02:16] TOP2 (🔥164) | 🌱 Agent Learning via Early Experience
[04:15] TOP3 (🔥105) | 🧠 Apriel-1.5-15b-Thinker (a 15B open model aiming at frontier-level multimodal reasoning at small scale)
[06:17] TOP4 (🔥97) | 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[08:45] TOP5 (🔥88) | 🎬 Paper2Video: Automatic Video Generation from Scientific Papers (academic talk videos generated automatically from papers)

12 min


2 ratings (out of 5)

Support!!

Feb 16

Fergie.W

Hope you keep this going

Creator: duan
Years active: 2024 - 2025
Episodes: 415
Rating: All audiences
Show website: HuggingFace 每日AI论文速递

