2025.06.19 | The Sekai dataset advances video generation; ProtoReasoning strengthens LLM generalization.

The 15 papers in this episode:

[00:22] 🌍 Sekai: A Video Dataset towards World Exploration

[01:02] 💡 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

[01:43] 💡 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

[02:24] 🗣 BUT System for the MLC-SLM Challenge

[03:10] 🤖 Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

[03:57] 💡 Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

[04:43] 🔬 SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

[05:26] 🚀 Truncated Proximal Policy Optimization

[06:04] 🖼 PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers

[06:37] 🖼 CoMemo: LVLMs Need Image Context with Image Memory

[07:21] 🤖 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

[08:01] 🧠 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

[08:45] 🛡 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

[09:34] 🏞 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies

[10:09] 🤝 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

[Follow us]

You can also find us on the following platform for more beyond the podcast:

Xiaohongshu (小红书): AI速递
