HuggingFace Daily AI Paper Digest

2025.10.10 | Agent Learning via Early Experience; interleaved image-text reflection chains jump to 24.9%

The 14 papers in this episode:

[00:16] 🌱 Agent Learning via Early Experience

[00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

[01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning

[02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos

[03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

[03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

[04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model

[05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

[05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

[06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

[07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy

[07:54] 🚀 Training-Free Group Relative Policy Optimization

[08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

[08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu (小红书): AI速递