HuggingFace 每日AI论文速递

duan

10 minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 For the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu.

  1. 12 hrs ago

    【Month-End Special】 June's Hottest AI Papers | LLMs improve themselves through self-reflection; MiniMax-M1 scales test-time compute efficiently.

    This episode covers the following 10 papers:
    [00:37] TOP1 (🔥258) | 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
    [02:51] TOP2 (🔥249) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [05:24] TOP3 (🔥240) | 🤖 Reinforcement Pre-Training
    [07:54] TOP4 (🔥165) | 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    [09:53] TOP5 (🔥134) | 🕰 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
    [12:24] TOP6 (🔥132) | 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    [14:50] TOP7 (🔥126) | 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    [16:36] TOP8 (🔥116) | 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [18:34] TOP9 (🔥108) | 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
    [21:05] TOP10 (🔥107) | 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    24 min
  2. 22 hrs ago

    2025.07.04 | WebSailor boosts LLM reasoning; LangScene-X refines 3D scene reconstruction.

    This episode covers the following 15 papers:
    [00:22] 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
    [00:59] 🖼 LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
    [01:44] 🧬 IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
    [02:35] 👂 Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
    [03:17] 🤝 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
    [04:00] 🖼 Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
    [04:38] 🧠 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
    [05:12] 🧠 Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
    [05:47] 💡 Fast and Simplex: 2-Simplicial Attention in Triton
    [06:33] 🧐 Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
    [07:16] 🧩 Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
    [08:12] 🤖 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
    [08:51] 💡 Energy-Based Transformers are Scalable Learners and Thinkers
    [09:33] ⚙ AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
    [10:16] 🚀 ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    12 min
  3. 2 days ago

    2025.07.02 | Advances in multimodal reasoning; improved bidirectional embeddings

    This episode covers the following 12 papers:
    [00:23] 💡 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
    [01:00] 🖼 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
    [01:35] 🔬 SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
    [02:19] 🤔 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
    [02:59] 🎬 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
    [03:37] 🤖 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
    [04:19] 🧠 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
    [04:53] 🧠 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
    [05:30] 💡 Data Efficacy for Language Model Training
    [06:05] 🎬 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
    [06:40] 🖼 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
    [07:28] 🛡 Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    9 min
  4. 3 days ago

    2025.07.01 | Leading multimodal generation; more efficient video diffusion

    This episode covers the following 15 papers:
    [00:21] 🖼 Ovis-U1 Technical Report
    [00:58] 🎬 VMoBA: Mixture-of-Block Attention for Video Diffusion Models
    [01:36] ✍ Calligrapher: Freestyle Text Image Customization
    [02:21] 🖼 Listener-Rewarded Thinking in VLMs for Image Preferences
    [03:04] 🧠 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
    [03:46] 📸 Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
    [04:29] 🧬 Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
    [05:09] 🤔 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
    [05:58] 💾 MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
    [06:38] 🚀 SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
    [07:23] 🏙 UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
    [08:01] 🧠 MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
    [08:38] 🧰 Teaching a Language Model to Speak the Language of Tools
    [09:16] ✂ VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
    [10:01] 🤖 RoboScape: Physics-informed Embodied World Model
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    11 min
  5. 4 days ago

    2025.06.30 | 3D-grounded visual editing; video token compression

    This episode covers the following 14 papers:
    [00:26] 🎨 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
    [00:59] ✂ LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
    [01:42] 🖼 XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
    [02:24] 🎬 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
    [03:05] 🖼 From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios
    [03:44] 🖼 MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
    [04:24] 🧮 Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
    [05:06] 🗺 Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
    [05:52] 🤖 Ark: An Open-source Python-based Framework for Robot Learning
    [06:36] 🎨 Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls
    [07:20] 🏎 The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
    [08:01] 🧠 Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training
    [08:45] 🧮 Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning
    [09:39] 👁 RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    11 min
  6. Jun 28

    2025.06.27 | Reinforcement learning improves search efficiency; memory augmentation generates realistic driving scenes.

    This episode covers the following 15 papers:
    [00:25] 🔍 MMSearch-R1: Incentivizing LMMs to Search
    [00:59] 🚗 MADrive: Memory-Augmented Driving Scene Modeling
    [01:43] 🤖 WorldVLA: Towards Autoregressive Action World Model
    [02:23] 💡 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
    [03:14] 🤖 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
    [04:00] 🚗 SAM4D: Segment Anything in Camera and LiDAR Streams
    [04:40] 🎨 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
    [05:16] 🤖 Whole-Body Conditioned Egocentric Video Prediction
    [05:53] 🧠 Arch-Router: Aligning LLM Routing with Human Preferences
    [06:35] 🎨 FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
    [07:12] 🌐 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
    [07:55] 🧬 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
    [08:35] 🤖 HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
    [09:18] 🦘 Learning to Skip the Middle Layers of Transformers
    [09:57] 🎵 MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    11 min

Ratings and Reviews

5
out of 5
2 ratings
