The 14 papers in this episode are:
[00:19] 🧠 LightMem: Lightweight and Efficient Memory-Augmented Generation
[00:55] 🌀 World-in-World: World Models in a Closed-Loop World
[01:44] 🖼 UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
[02:29] 🧪 Chem-R: Learning to Reason as a Chemist
[03:10] 🎬 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
[03:52] 🔍 Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
[04:49] 🎬 IF-VidCap: Can Video Caption Models Follow Instructions?
[05:35] 🚀 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
[06:21] 🎬 MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
[07:12] 🧠 ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
[07:43] 🎬 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
[08:18] 🎯 ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
[09:29] 🎬 UltraGen: High-Resolution Video Generation with Hierarchical Attention
[10:15] 🔄 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast content:
Xiaohongshu: AI速递
Information
- Show
- Frequency: Daily
- Published: October 22, 2025, 11:00 p.m. UTC
- Duration: 11 min
- Rating: Clean