2025.06.06 | An intelligent assistant speeds up ComfyUI development; one-step video restoration improves efficiency.

The 15 papers in this episode are:

[00:24] 🤖 ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development

[00:59] 🎬 SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

[01:39] 🤖 RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

[02:26] 🚄 Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts

[03:08] 🧠 Video World Models with Long-term Spatial Memory

[03:46] 🌐 Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

[04:32] ⚛ VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

[05:17] 📚 Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

[05:55] 🔢 AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

[06:38] 🌌 Aligning Latent Spaces with Flow Priors

[07:22] 📚 The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

[08:15] 🧠 Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations

[09:06] 🧠 StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs

[09:48] 🚀 Inference-Time Hyper-Scaling with KV Cache Compression

[10:30] 👁 SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu (小红书): AI速递
