HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

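Ten minutes a day to get up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).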

  1. 8 HOURS AGO

    2025.10.22 | LightMem compresses memory a thousandfold with a 12x speedup; a closed-loop world model fine-tuned on 80K samples overtakes the giants

    The 14 papers in this episode:
    [00:19] 🧠 LightMem: Lightweight and Efficient Memory-Augmented Generation
    [00:55] 🌀 World-in-World: World Models in a Closed-Loop World
    [01:44] 🖼 UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
    [02:29] 🧪 Chem-R: Learning to Reason as a Chemist
    [03:10] 🎬 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
    [03:52] 🔍 Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
    [04:49] 🎬 IF-VidCap: Can Video Caption Models Follow Instructions?
    [05:35] 🚀 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
    [06:21] 🎬 MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
    [07:12] 🧠 ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
    [07:43] 🎬 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
    [08:18] 🎯 ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
    [09:29] 🎬 UltraGen: High-Resolution Video Generation with Hierarchical Attention
    [10:15] 🔄 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
    Follow us: you can also find us on the following platform for more beyond the podcast itself.
    Xiaohongshu (小红书): AI速递

    11min
  2. 1 DAY AGO

    2025.10.21 | Models do not understand light, shadow, and refraction; small models can write reports too

    The 13 papers in this episode:
    [00:21] 🪞 PICABench: How Far Are We from Physically Realistic Image Editing?
    [01:04] 🤖 DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
    [01:50] 🗜 Glyph: Scaling Context Windows via Visual-Text Compression
    [02:23] 🔍 Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
    [03:10] 🔗 When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
    [04:09] 🎯 Annotation-Efficient Universal Honesty Alignment
    [04:49] 🖌 Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
    [05:46] 👁 RL makes MLLMs see better than SFT
    [06:33] 🚀 Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
    [07:09] 🎨 ConsistEdit: Highly Consistent and Precise Training-free Visual Editing
    [07:56] 🔄 Deep Self-Evolving Reasoning
    [08:22] 🧠 Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI
    [09:07] 🔮 Chronos-2: From Univariate to Universal Forecasting

    10min
  3. 2 DAYS AGO

    2025.10.20 | RPC pruning speeds up reasoning while preserving accuracy; OmniVinci tops cross-modal tasks with little data

    The 15 papers in this episode:
    [00:20] 🧠 A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
    [01:04] 🌐 OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
    [01:44] 🎬 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
    [02:28] ✂ NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
    [03:05] 🛰 Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
    [03:41] ⚠ Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
    [04:18] 🧬 Latent Diffusion Model without Variational Autoencoder
    [04:52] 📸 LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
    [05:30] 🧠 MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
    [06:14] 🧠 A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
    [06:56] 🗣 Language Models Model Language
    [07:36] 🖼 BLIP3o-NEXT: Next Frontier of Native Image Generation
    [08:30] 🌐 Paper2Web: Let's Make Your Paper Alive!
    [09:12] 🔬 Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
    [09:55] 🔍 Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

    11min
  4. 5 DAYS AGO

    2025.10.17 | AI glasses deliver anticipatory service; video generation gains imagination

    The 11 papers in this episode:
    [00:25] 👓 AI for Service: Proactive Assistance with AI Glasses
    [01:06] 🎬 ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
    [01:43] 🎯 LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
    [02:33] 🧩 TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
    [03:35] 🧠 Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
    [04:04] ⚡ Attention Is All You Need for KV Cache in Diffusion LLMs
    [04:45] 🤥 When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
    [05:33] 📄 PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
    [06:13] 🧠 VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
    [06:52] 📐 MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
    [07:39] 🧠 COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

    9min
  5. 6 DAYS AGO

    2025.10.16 | UniMoE unifies speech and music; attention maps illuminate LLM reasoning

    The 15 papers in this episode:
    [00:21] 🎧 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
    [00:57] 🔍 Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
    [01:38] ⚡ FlashWorld: High-quality 3D Scene Generation within Seconds
    [02:06] 🐝 Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
    [02:37] 🗣 InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
    [03:24] 🌍 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
    [04:00] 🧪 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
    [04:43] 🚗 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
    [05:21] 🔍 Generative Universal Verifier as Multimodal Meta-Reasoner
    [06:07] ⚖ ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
    [06:43] 🎞 Trace Anything: Representing Any Video in 4D via Trajectory Fields
    [07:27] 🌍 Reasoning in Space via Grounding in the World
    [07:54] 🧠 The Role of Computing Resources in Publishing Foundation Model Research
    [08:28] ⚖ UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
    [09:05] 🤖 InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

    10min
  6. OCT 15

    2025.10.15 | Pixel-level self-supervised ViT sets new generation benchmarks; a new multi-agent yardstick for evaluating web-novel translation

    The 14 papers in this episode:
    [00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
    [00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
    [01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning
    [02:29] 🎯 Detect Anything via Next Point Prediction
    [03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
    [03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
    [04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs
    [05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
    [05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
    [06:35] 🤖 Robot Learning: A Tutorial
    [07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
    [08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
    [09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation
    [09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

    11min
  7. OCT 14

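    2025.10.14 | Quantization error turns into reward, training a 32B model on a single GPU; an audio-visual evaluation benchmark for multimodal LLMs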

    The 15 papers in this episode:
    [00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
    [01:22] 🧠 Diffusion Transformers with Representation Autoencoders
    [02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
    [02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
    [03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
    [04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
    [04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
    [05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
    [05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
    [06:51] 🧮 Making Mathematical Reasoning Adaptive
    [07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
    [08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
    [08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
    [09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
    [10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

    11min

Ratings and Reviews

5 out of 5
2 ratings

