HuggingFace Daily AI Paper Digest

2025.10.22 | LightMem compresses memory 1000× with a 12× speedup; a closed-loop world model fine-tuned on 80K samples overtakes the giants

The 14 papers in this episode:

[00:19] 🧠 LightMem: Lightweight and Efficient Memory-Augmented Generation

[00:55] 🌀 World-in-World: World Models in a Closed-Loop World

[01:44] 🖼 UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

[02:29] 🧪 Chem-R: Learning to Reason as a Chemist

[03:10] 🎬 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation

[03:52] 🔍 Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

[04:49] 🎬 IF-VidCap: Can Video Caption Models Follow Instructions?

[05:35] 🚀 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

[06:21] 🎬 MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

[07:12] 🧠 ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning

[07:43] 🎬 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models

[08:18] 🎯 ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

[09:29] 🎬 UltraGen: High-Resolution Video Generation with Hierarchical Attention

[10:15] 🔄 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

【Follow Us】

You can also find us on the platforms below, where we share more beyond the podcast itself.

Xiaohongshu (RED): AI速递