The 15 papers in this episode:
[00:24] 📊 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
[00:57] 🔍 Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
[01:39] 🚀 Fast-dLLM v2: Efficient Block-Diffusion LLM
[02:30] 🧑 CoDA: Coding LM via Diffusion Adaptation
[03:01] 🧩 Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
[03:52] ⚖ ASPO: Asymmetric Importance Sampling Policy Optimization
[04:34] 🔗 Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
[05:15] 🧠 AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems
[05:51] 🪂 Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
[06:35] 🌍 HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
[07:22] ⚡ TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation
[08:09] 🎯 Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
[09:00] 🩺 Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
[09:46] 🧠 MixReasoning: Switching Modes to Think
[10:20] ⚡ LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
【Follow Us】
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu: AI速递