HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every workday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on 小宇宙 (Xiaoyuzhou) and Apple Podcasts. 🖼 An illustrated version is also available: search for and follow 【AI速递】 on 小红书 (Xiaohongshu).

  1. 11 HOURS AGO

    2025.10.02 | MCTS breaks the RLVR bottleneck; GEM, an open-source gym for agentic LLMs

    This episode covers the following 15 papers:
    [00:19] 🧠 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
    [01:20] 🤖 GEM: A Gym for Agentic LLMs
    [01:57] 🧠 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
    [02:36] 🎒 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
    [03:06] 🎬 Code2Video: A Code-centric Paradigm for Educational Video Generation
    [03:41] ⚙ PIPer: On-Device Environment Setup via Online Reinforcement Learning
    [04:11] 🗜 ACON: Optimizing Context Compression for Long-horizon LLM Agents
    [04:52] 🔍 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
    [05:22] ⚖ BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
    [06:01] ⚡ Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
    [06:42] 🚀 BroRL: Scaling Reinforcement Learning via Broadened Exploration
    [07:25] 📊 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
    [08:02] 🎯 On Predictability of Reinforcement Learning Dynamics for Large Language Models
    [08:31] 🖥 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
    [09:17] 🧠 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  2. 1 DAY AGO

    [Month-End Special] September's hottest AI papers | Collective RL experience sharing cuts costs; SAPO lets even older machines train large models

    This episode covers the following 10 papers:
    [00:29] TOP1(🔥640) | 🤝 Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
    [02:49] TOP2(🔥341) | 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
    [04:59] TOP3(🔥218) | 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
    [07:07] TOP4(🔥212) | 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
    [09:17] TOP5(🔥207) | 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
    [11:19] TOP6(🔥183) | 🤔 Why Language Models Hallucinate
    [13:06] TOP7(🔥174) | 🧠 A Survey of Reinforcement Learning for Large Reasoning Models
    [15:32] TOP8(🔥160) | 🎬 LongLive: Real-time Interactive Long Video Generation
    [18:13] TOP9(🔥145) | 💡 Reverse-Engineered Reasoning for Open-Ended Generation
    [20:27] TOP10(🔥140) | 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    23 min
  3. 1 DAY AGO

    2025.10.01 | Self-play training without labels; an in-depth evaluation of MCP agents

    This episode covers the following 15 papers:
    [00:20] 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
    [00:59] 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
    [01:36] 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
    [02:10] 🤥 TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
    [02:55] 🌊 OceanGym: A Benchmark Environment for Underwater Embodied Agents
    [03:41] ⚡ DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
    [04:14] 🔍 Who's Your Judge? On the Detectability of LLM-Generated Judgments
    [04:59] ✂ Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
    [05:45] 👁 Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
    [06:24] 🧠 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
    [07:09] 🧪 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
    [07:42] ⚡ dParallel: Learnable Parallel Decoding for dLLMs
    [08:28] 🎯 IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
    [09:15] 🎬 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
    [10:12] 🐬 Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  4. 2 DAYS AGO

    2025.09.30 | SLA sparse attention cuts compute; StableToken delivers noise robustness without retraining the model

    This episode covers the following 15 papers:
    [00:22] ⚡ SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
    [01:05] 🗣 StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
    [01:54] 🎮 Multiplayer Nash Preference Optimization
    [02:57] 🔗 RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
    [03:44] 🎨 OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
    [04:28] 🧠 Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
    [05:05] 🧩 Visual Jigsaw Post-Training Improves MLLMs
    [05:37] 🎬 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
    [06:15] 🔬 Democratizing AI scientists using ToolUniverse
    [06:59] 🧠 When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
    [07:31] 📊 GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
    [08:04] 🖼 EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
    [08:54] 🚀 SparseD: Sparse Attention for Diffusion Language Models
    [09:40] 🎛 EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
    [10:32] 🧠 Towards Personalized Deep Research: Benchmarks and Evaluations
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    12 min
  5. 3 DAYS AGO

    2025.09.29 | Real-time long video that streams while you chat; a quantile baseline keeps reasoning entropy under control

    This episode covers the following 15 papers:
    [00:20] 🎬 LongLive: Real-time Interactive Long Video Generation
    [00:56] 🎯 Quantile Advantage Estimation for Entropy-Safe Reasoning
    [01:34] 📄 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
    [02:11] 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
    [03:08] 🧠 Variational Reasoning for Language Models
    [03:37] 💬 Language Models Can Learn from Verbal Feedback Without Scalar Rewards
    [04:32] 🔍 ReviewScore: Misinformed Peer Review Detection with Large Language Models
    [05:12] 🎯 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
    [05:49] 🪄 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
    [06:32] 🎯 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
    [07:14] 🗣 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
    [07:58] 🧭 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
    [08:29] 🖼 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
    [09:16] 🌐 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
    [09:49] 🔄 SPARK: Synergistic Policy And Reward Co-Evolving Framework
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  6. 6 DAYS AGO

    2025.09.26 | SciReasoner, an eight-task all-rounder; MMR1 forges open-source multimodal reasoning from ambiguous samples

    This episode covers the following 15 papers:
    [00:20] 🔬 SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
    [01:00] 🧠 MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
    [01:41] 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
    [02:26] 🌳 Tree Search for LLM Agent Reinforcement Learning
    [03:06] 🖼 Seedream 4.0: Toward Next-generation Multimodal Image Generation
    [03:40] 🎯 Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
    [04:29] 🤖 AutoIntent: AutoML for Text Classification
    [05:10] ⚖ TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
    [05:43] 🎢 CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
    [06:30] 🖼 Does FLUX Already Know How to Perform Physically Plausible Image Composition?
    [07:31] ✂ CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling
    [08:26] 🧠 Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
    [09:12] 🎮 V-GameGym: Visual Game Generation for Code Large Language Models
    [09:49] 🗣 Interactive Recommendation Agent with Active User Commands
    [10:22] 🔍 BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    11 min
  7. 25 SEP

    2025.09.25 | Video models as zero-shot all-rounders; implicit chain-of-thought saves tokens and boosts efficiency

    This episode covers the following 10 papers:
    [00:22] 🎥 Video models are zero-shot learners and reasoners
    [01:09] 🧠 SIM-CoT: Supervised Implicit Chain-of-Thought
    [01:55] 🪶 EmbeddingGemma: Powerful and Lightweight Text Representations
    [02:29] 🗣 Advancing Speech Understanding in Speech-Aware Language Models with GRPO
    [03:06] 🌍 LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines
    [03:52] 🎬 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
    [04:29] 🌀 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
    [05:19] 🎬 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
    [05:58] 📄 Logics-Parsing Technical Report (end-to-end document parsing with reinforcement-learning-enhanced LLMs)
    [06:44] 🤖 On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub
    [Follow us] You can also find us on the platform below for more beyond the podcast. 小红书 (Xiaohongshu): AI速递

    8 min
