The 13 papers in this episode are:
[00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
[01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
[02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery
[02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
[03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
[03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
[04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
[05:10] 🎬 MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
[05:50] 🎨 MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
[06:57] 🔄 RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
[07:36] 🤖 Robot Learning from a Physical World Model
[08:21] 🛠 NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
[08:52] 🚀 SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递
Information
- Frequency: Daily
- Published: 11 November 2025, 23:00 UTC
- Length: 10 min
- Rating: Clean
