HuggingFace 每日AI论文速递

duan

10 minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 For the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu.

  1. 12 hrs ago

    【Month-End Special】 June's Hottest AI Papers | LLMs improve themselves through self-reflection; MiniMax-M1 scales test-time compute efficiently.

    This episode covers the following 10 papers:
    [00:37] TOP1 (🔥258) | 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
    [02:51] TOP2 (🔥249) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [05:24] TOP3 (🔥240) | 🤖 Reinforcement Pre-Training
    [07:54] TOP4 (🔥165) | 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    [09:53] TOP5 (🔥134) | 🕰 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
    [12:24] TOP6 (🔥132) | 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    [14:50] TOP7 (🔥126) | 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    [16:36] TOP8 (🔥116) | 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [18:34] TOP9 (🔥108) | 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
    [21:05] TOP10 (🔥107) | 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    24 min
  2. 22 hrs ago

    2025.07.04 | WebSailor boosts LLM reasoning; LangScene-X refines 3D scene reconstruction.

    This episode covers the following 15 papers:
    [00:22] 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
    [00:59] 🖼 LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
    [01:44] 🧬 IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
    [02:35] 👂 Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
    [03:17] 🤝 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
    [04:00] 🖼 Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
    [04:38] 🧠 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
    [05:12] 🧠 Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
    [05:47] 💡 Fast and Simplex: 2-Simplicial Attention in Triton
    [06:33] 🧐 Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
    [07:16] 🧩 Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
    [08:12] 🤖 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
    [08:51] 💡 Energy-Based Transformers are Scalable Learners and Thinkers
    [09:33] ⚙ AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
    [10:16] 🚀 ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    12 min
  3. 2 days ago

    2025.07.02 | Advances in multimodal reasoning; improved bidirectional embeddings

    This episode covers the following 12 papers:
    [00:23] 💡 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
    [01:00] 🖼 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
    [01:35] 🔬 SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
    [02:19] 🤔 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
    [02:59] 🎬 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
    [03:37] 🤖 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
    [04:19] 🧠 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
    [04:53] 🧠 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
    [05:30] 💡 Data Efficacy for Language Model Training
    [06:05] 🎬 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
    [06:40] 🖼 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
    [07:28] 🛡 Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    9 min
  4. 3 days ago

    2025.07.01 | Leading multimodal generation; more efficient video diffusion

    This episode covers the following 15 papers:
    [00:21] 🖼 Ovis-U1 Technical Report
    [00:58] 🎬 VMoBA: Mixture-of-Block Attention for Video Diffusion Models
    [01:36] ✍ Calligrapher: Freestyle Text Image Customization
    [02:21] 🖼 Listener-Rewarded Thinking in VLMs for Image Preferences
    [03:04] 🧠 SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
    [03:46] 📸 Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
    [04:29] 🧬 Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
    [05:09] 🤔 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?
    [05:58] 💾 MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
    [06:38] 🚀 SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
    [07:23] 🏙 UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
    [08:01] 🧠 MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
    [08:38] 🧰 Teaching a Language Model to Speak the Language of Tools
    [09:16] ✂ VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
    [10:01] 🤖 RoboScape: Physics-informed Embodied World Model
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    11 min
  5. 4 days ago

    2025.06.30 | 3D-grounded visual editing; video token compression

    This episode covers the following 14 papers:
    [00:26] 🎨 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
    [00:59] ✂ LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
    [01:42] 🖼 XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
    [02:24] 🎬 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
    [03:05] 🖼 From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios
    [03:44] 🖼 MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
    [04:24] 🧮 Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
    [05:06] 🗺 Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
    [05:52] 🤖 Ark: An Open-source Python-based Framework for Robot Learning
    [06:36] 🎨 Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls
    [07:20] 🏎 The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
    [08:01] 🧠 Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training
    [08:45] 🧮 Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning
    [09:39] 👁 RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    11 min
  6. Jun 28

    2025.06.27 | Reinforcement learning improves search efficiency; memory augmentation generates realistic driving scenes.

    This episode covers the following 15 papers:
    [00:25] 🔍 MMSearch-R1: Incentivizing LMMs to Search
    [00:59] 🚗 MADrive: Memory-Augmented Driving Scene Modeling
    [01:43] 🤖 WorldVLA: Towards Autoregressive Action World Model
    [02:23] 💡 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
    [03:14] 🤖 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
    [04:00] 🚗 SAM4D: Segment Anything in Camera and LiDAR Streams
    [04:40] 🎨 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
    [05:16] 🤖 Whole-Body Conditioned Egocentric Video Prediction
    [05:53] 🧠 Arch-Router: Aligning LLM Routing with Human Preferences
    [06:35] 🎨 FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
    [07:12] 🌐 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
    [07:55] 🧬 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
    [08:35] 🤖 HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
    [09:18] 🦘 Learning to Skip the Middle Layers of Transformers
    [09:57] 🎵 MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递

    11 min

Ratings and Reviews

5
out of 5
2 ratings
