The 14 papers in this episode:
[00:22] 🧠 Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
[00:50] 🔍 Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
[01:15] 👁 Visual Representation Alignment for Multimodal Large Language Models
[01:54] 🔄 Reconstruction Alignment Improves Unified Multimodal Models
[02:19] 🔄 UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
[02:46] 🧠 Curia: A Multi-Modal Foundation Model for Radiology
[03:06] 🔮 F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
[03:33] 🧠 Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
[03:56] 🔄 Language Self-Play For Data-Free Training
[04:22] 🔍 Causal Attention with Lookahead Keys
[04:43] 🎨 Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
[05:07] ✅ SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
[05:30] 🚀 Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
[06:01] 📈 $ΔL$ Normalization: Rethink Loss Aggregation in RLVR
【Follow Us】
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu: AI速递