HuggingFace Daily AI Paper Digest

2025.07.23 | The TIM model breaks through LLM context limits; Step-Audio 2 improves multimodal spoken dialogue.

The 15 papers in this episode:

[00:24] ♾ Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

[01:05] 🔊 Step-Audio 2 Technical Report

[01:41] 🚀 MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

[02:23] ⚡ Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

[03:17] 🧠 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

[03:56] 🧩 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

[04:36] 🤔 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

[05:03] 🤖 Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory

[05:56] ✨ HOComp: Interaction-Aware Human-Object Composition

[06:54] 🧐 RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

[07:36] 🚀 Task-Specific Zero-shot Quantization-Aware Training for Object Detection

[08:06] 🔍 SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

[08:35] ⚠ Does More Inference-Time Compute Really Help Robustness?

[09:16] 🧭 Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

[10:02] 🧠 ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递