HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every workday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on 小宇宙 (Xiaoyuzhou) and Apple Podcasts. 🖼 An illustrated version is also available: search for and follow 【AI速递】 on 小红书 (Xiaohongshu).

  1. 11 HOURS AGO

    2025.10.02 | MCTS breaks the RLVR bottleneck; GEM, an open-source gym for agentic LLMs

    This episode covers the following 15 papers:
    [00:19] 🧠 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
    [01:20] 🤖 GEM: A Gym for Agentic LLMs
    [01:57] 🧠 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
    [02:36] 🎒 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
    [03:06] 🎬 Code2Video: A Code-centric Paradigm for Educational Video Generation
    [03:41] ⚙ PIPer: On-Device Environment Setup via Online Reinforcement Learning
    [04:11] 🗜 ACON: Optimizing Context Compression for Long-horizon LLM Agents
    [04:52] 🔍 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
    [05:22] ⚖ BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
    [06:01] ⚡ Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
    [06:42] 🚀 BroRL: Scaling Reinforcement Learning via Broadened Exploration
    [07:25] 📊 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
    [08:02] 🎯 On Predictability of Reinforcement Learning Dynamics for Large Language Models
    [08:31] 🖥 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
    [09:17] 🧠 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  2. 1 DAY AGO

    [Month-End Special] September's hottest AI papers | Collective RL experience sharing cuts costs; SAPO lets even older machines train large models

    This episode covers the following 10 papers:
    [00:29] TOP1(🔥640) | 🤝 Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
    [02:49] TOP2(🔥341) | 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
    [04:59] TOP3(🔥218) | 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
    [07:07] TOP4(🔥212) | 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
    [09:17] TOP5(🔥207) | 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
    [11:19] TOP6(🔥183) | 🤔 Why Language Models Hallucinate
    [13:06] TOP7(🔥174) | 🧠 A Survey of Reinforcement Learning for Large Reasoning Models
    [15:32] TOP8(🔥160) | 🎬 LongLive: Real-time Interactive Long Video Generation
    [18:13] TOP9(🔥145) | 💡 Reverse-Engineered Reasoning for Open-Ended Generation
    [20:27] TOP10(🔥140) | 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    23 min
  3. 1 DAY AGO

    2025.10.01 | Self-play training without labels; an in-depth evaluation of MCP agents

    This episode covers the following 15 papers:
    [00:20] 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
    [00:59] 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
    [01:36] 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
    [02:10] 🤥 TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
    [02:55] 🌊 OceanGym: A Benchmark Environment for Underwater Embodied Agents
    [03:41] ⚡ DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
    [04:14] 🔍 Who's Your Judge? On the Detectability of LLM-Generated Judgments
    [04:59] ✂ Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
    [05:45] 👁 Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
    [06:24] 🧠 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
    [07:09] 🧪 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
    [07:42] ⚡ dParallel: Learnable Parallel Decoding for dLLMs
    [08:28] 🎯 IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
    [09:15] 🎬 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
    [10:12] 🐬 Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  4. 2 DAYS AGO

    2025.09.30 | SLA sparse attention cuts compute; StableToken delivers noise robustness without retraining the model

    This episode covers the following 15 papers:
    [00:22] ⚡ SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
    [01:05] 🗣 StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
    [01:54] 🎮 Multiplayer Nash Preference Optimization
    [02:57] 🔗 RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
    [03:44] 🎨 OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
    [04:28] 🧠 Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
    [05:05] 🧩 Visual Jigsaw Post-Training Improves MLLMs
    [05:37] 🎬 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
    [06:15] 🔬 Democratizing AI scientists using ToolUniverse
    [06:59] 🧠 When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
    [07:31] 📊 GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
    [08:04] 🖼 EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
    [08:54] 🚀 SparseD: Sparse Attention for Diffusion Language Models
    [09:40] 🎛 EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
    [10:32] 🧠 Towards Personalized Deep Research: Benchmarks and Evaluations
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    12 min
  5. 3 DAYS AGO

    2025.09.29 | Real-time long video that streams while you chat; a quantile baseline keeps reasoning entropy under control

    This episode covers the following 15 papers:
    [00:20] 🎬 LongLive: Real-time Interactive Long Video Generation
    [00:56] 🎯 Quantile Advantage Estimation for Entropy-Safe Reasoning
    [01:34] 📄 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
    [02:11] 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
    [03:08] 🧠 Variational Reasoning for Language Models
    [03:37] 💬 Language Models Can Learn from Verbal Feedback Without Scalar Rewards
    [04:32] 🔍 ReviewScore: Misinformed Peer Review Detection with Large Language Models
    [05:12] 🎯 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
    [05:49] 🪄 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
    [06:32] 🎯 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
    [07:14] 🗣 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
    [07:58] 🧭 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
    [08:29] 🖼 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
    [09:16] 🌐 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
    [09:49] 🔄 SPARK: Synergistic Policy And Reward Co-Evolving Framework
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  6. 6 DAYS AGO

    2025.09.26 | SciReasoner, an eight-task all-rounder; MMR1 forges open-source multimodal reasoning from ambiguous samples

    This episode covers the following 15 papers:
    [00:20] 🔬 SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
    [01:00] 🧠 MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
    [01:41] 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
    [02:26] 🌳 Tree Search for LLM Agent Reinforcement Learning
    [03:06] 🖼 Seedream 4.0: Toward Next-generation Multimodal Image Generation
    [03:40] 🎯 Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
    [04:29] 🤖 AutoIntent: AutoML for Text Classification
    [05:10] ⚖ TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
    [05:43] 🎢 CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
    [06:30] 🖼 Does FLUX Already Know How to Perform Physically Plausible Image Composition?
    [07:31] ✂ CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling
    [08:26] 🧠 Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
    [09:12] 🎮 V-GameGym: Visual Game Generation for Code Large Language Models
    [09:49] 🗣 Interactive Recommendation Agent with Active User Commands
    [10:22] 🔍 BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  7. 25 SEP

    2025.09.25 | Video models as zero-shot all-rounders; implicit chain-of-thought saves tokens and boosts efficiency

    This episode covers the following 10 papers:
    [00:22] 🎥 Video models are zero-shot learners and reasoners
    [01:09] 🧠 SIM-CoT: Supervised Implicit Chain-of-Thought
    [01:55] 🪶 EmbeddingGemma: Powerful and Lightweight Text Representations
    [02:29] 🗣 Advancing Speech Understanding in Speech-Aware Language Models with GRPO
    [03:06] 🌍 LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines
    [03:52] 🎬 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
    [04:29] 🌀 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
    [05:19] 🎬 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
    [05:58] 📄 Logics-Parsing Technical Report (end-to-end document parsing with reinforcement-learning-enhanced LLMs)
    [06:44] 🤖 On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    8 min
