HuggingFace 每日AI论文速递

duan

每天10分钟,带您快速了解当日HuggingFace热门AI论文内容。每个工作日更新,欢迎订阅。 📢播客节目在小宇宙、Apple Podcast平台搜索【HuggingFace 每日AI论文速递】 🖼另外还有图文版,可在小红书搜索并关注【AI速递】

  1. 42分钟前

    2025.10.27 | DeepAgent一步推理+ToolPO;视频即提示DiT秒控百种语义

    本期的 15 篇论文如下: [00:27] 🧠 DeepAgent: A General Reasoning Agent with Scalable Toolsets(DeepAgent:具备可扩展工具集的通用推理智能体) [01:01] 🎬 Video-As-Prompt: Unified Semantic Control for Video Generation(视频即提示:统一语义控制的视频生成新范式) [01:35] 🔧 From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model(从去噪到精修:视觉-语言扩散模型的纠错式生成框架) [02:14] 🧩 Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation(逐段采样、分块优化:面向文本到图像生成的块级GRPO方法) [02:51] 🧠 A Definition of AGI(AGI的量化定义) [03:23] 🧩 Sparser Block-Sparse Attention via Token Permutation(基于Token置换的稀疏块稀疏注意力机制) [04:14] 🧭 UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning(UI-Ins:以“指令即推理”多视角增强GUI定位) [04:57] 🧠 Reasoning with Sampling: Your Base Model is Smarter Than You Think(基于采样的推理:你的基础模型比你想象的更聪明) [05:30] 🧠 RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging(RECALL:基于表示对齐的层级模型融合缓解大模型灾难性遗忘) [06:08] 📐 Visual Diffusion Models are Geometric Solvers(视觉扩散模型是几何求解器) [06:56] 🌍 WorldGrow: Generating Infinite 3D World(无限3D世界生成:WorldGrow) [07:35] 🎬 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling(RAPO++:面向文生视频的跨阶段提示优化——数据对齐与测试时缩放) [08:14] 🔗 Model Merging with Functional Dual Anchors(基于功能双锚点的模型融合方法) [08:49] 🧭 Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs(揭示VideoLLM隐藏信息通路:视频语言模型内部流动图谱) [09:34] 📊 Document Understanding, Measurement, and Manipulation Using Category Theory(基于范畴论的文档理解、度量与操控) 【关注我们】 您还可以在以下平台找到我们,获得播客内容以外更多信息 小红书: AI速递

    10 分钟
  2. 3天前

    2025.10.24 | AdaSPEC挑40% token提速两成;AutoPage 10美分生成交互网页

    本期的 15 篇论文如下: [00:23] 🎯 AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders(AdaSPEC:面向高效推测解码的选择性知识蒸馏) [00:57] 🤖 Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1(低成本人机协作论文一键成页:低于0.1美元) [01:35] 🔍 Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence(Open-o3视频:显式时空证据支撑的开放域视频推理) [02:06] 🎬 HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives(HoloCine:端到端生成多镜头长时电影级叙事视频) [02:52] 🌀 Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall(绕过离散扩散采样墙的确定性捷径) [03:33] 💎 Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values(每个问题都有它的价值:显式人类价值驱动的强化学习) [04:06] ⚖ The Massive Legal Embedding Benchmark (MLEB)(大规模法律嵌入评测基准(MLEB)) [04:48] 🔍 DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion(DyPE:面向超高分辨率扩散模型的动态位置外推方法) [05:33] 🕵 Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence(柯南:像侦探一样在多尺度视觉证据上渐进式推理) [06:12] 🤖 Search Self-play: Pushing the Frontier of Agent Capability without Supervision(搜索自博弈:无需监督即可拓展智能体能力边界) [06:56] 🎭 Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations(探究大音频语言模型在说话人情绪变化下的安全漏洞) [07:42] 🖼 LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas(LayerComposer:基于空间感知分层画布的交互式个性化文生图) [08:10] 🎧 SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models(SAKE:面向大型音频-语言模型听觉属性知识编辑的探索) [08:51] 🖼 ARGenSeg: Image Segmentation with Autoregressive Image Generation Model(ARGenSeg:基于自回归图像生成的图像分割) [09:39] 🧩 Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets(Seed3D 1.0:从单张图像生成高保真、可仿真的3D资产) 【关注我们】 您还可以在以下平台找到我们,获得播客内容以外更多信息 小红书: AI速递

    11 分钟
  3. 4天前

    2025.10.23 | 线性注意力显存降十倍;动态裁剪PPO稳提分

    本期的 15 篇论文如下: [00:19] 🧠 Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning(每一种注意力都重要:面向长上下文推理的高效混合架构) [00:59] ⚖ BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping(BAPO:通过自适应裁剪的平衡策略优化稳定LLM离策略强化学习) [01:40] 🧠 LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts(LoongRL:面向长文本高级推理的强化学习方法) [02:18] 🌍 GigaBrain-0: A World Model-Powered Vision-Language-Action Model(GigaBrain-0:基于世界模型的通才视觉-语言-动作大模型) [02:49] 🔄 Language Models are Injective and Hence Invertible(语言模型是单射的,因此可逆) [03:25] 📹 VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos(VideoAgentTrek:利用无标注视频预训练计算机操作智能体) [04:01] 📲 DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents(DaMo:面向手机智能体的多模态大模型微调数据配比优化器) [04:55] 🚀 Unified Reinforcement and Imitation Learning for Vision-Language Models(统一强化与模仿学习的视觉-语言模型) [05:28] 🖼 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing(Pico-Banana-400K:面向文本引导图像编辑的大规模高质量数据集) [06:17] 📊 FinSight: Towards Real-World Financial Deep Research(FinSight:迈向真实场景的金融深度研究) [07:06] 🧠 Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues(他们是恋人还是朋友?评估大语言模型在英韩对话中的社会推理能力) [07:43] 🌍 OmniNWM: Omniscient Driving Navigation World Models(OmniNWM:全景驾驶导航全知世界模型) [08:28] 🕳 Attention Sinks in Diffusion Language Models(扩散语言模型中的注意力沉陷现象) [09:04] 📄 olmOCR 2: Unit Test Rewards for Document OCR(olmOCR 2:基于单元测试奖励的文档OCR系统) [09:42] 🧠 KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints(KORE:通过知识导向增强与约束为大模型持续注入知识) 【关注我们】 您还可以在以下平台找到我们,获得播客内容以外更多信息 小红书: AI速递

    11 分钟
  4. 5天前

    2025.10.22 | LightMem压缩记忆千倍提速12倍;闭环世界模型微调8万数据反超巨兽

    本期的 14 篇论文如下: [00:19] 🧠 LightMem: Lightweight and Efficient Memory-Augmented Generation(LightMem:轻量高效的记忆增强生成框架) [00:55] 🌀 World-in-World: World Models in a Closed-Loop World(世界中的世界:闭环环境下的世界模型) [01:44] 🖼 UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation(UniGenBench++:面向文本到图像生成的统一语义评测基准) [02:29] 🧪 Chem-R: Learning to Reason as a Chemist(Chem-R:像化学家一样学习推理) [03:10] 🎬 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation(MoGA:面向端到端长视频生成的分组混合注意力机制) [03:52] 🔍 Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs(任意区域皆可掌握:面向多模态大模型的精准上下文像素级理解) [04:49] 🎬 IF-VidCap: Can Video Caption Models Follow Instructions?(IF-VidCap:视频字幕模型能听懂指令吗?) [05:35] 🚀 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model(万亿参数思维模型的强化学习扩展之路) [06:21] 🎬 MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues(MT-Video-Bench:面向多轮对话评估多模态大模型视频理解能力的综合基准) [07:12] 🧠 ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning(ssToken:面向大模型微调的自调制语义感知Token筛选方法) [07:43] 🎬 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models(MUG-V 10B:面向大视频生成模型的高效训练流水线) [08:18] 🎯 ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder(ProCLIP:基于大语言模型嵌入器的渐进式视觉-语言对齐方法) [09:29] 🎬 UltraGen: High-Resolution Video Generation with Hierarchical Attention(UltraGen:基于分层注意力的原生高分辨率视频生成) [10:15] 🔄 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence(DSI-Bench:动态空间智能评测基准) 【关注我们】 您还可以在以下平台找到我们,获得播客内容以外更多信息 小红书: AI速递

    11 分钟
  5. 6天前

    2025.10.21 | 模型不懂光影折射;小模型也能写报告

    本期的 13 篇论文如下: [00:21] 🪞 PICABench: How Far Are We from Physically Realistic Image Editing?(PICABench:我们离物理真实的图像编辑还有多远?) [01:04] 🤖 DeepAnalyze: Agentic Large Language Models for Autonomous Data Science(DeepAnalyze:面向自主数据科学的智能体大模型) [01:50] 🗜 Glyph: Scaling Context Windows via Visual-Text Compression(Glyph:通过视觉-文本压缩扩展上下文窗口长度) [02:23] 🔍 Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation(面向通用检索增强生成的混合模态检索研究) [03:10] 🔗 When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling(何时集成:定位Token级位置实现稳定高效的大模型集成) [04:09] 🎯 Annotation-Efficient Universal Honesty Alignment(注释高效型通用诚实对齐) [04:49] 🖌 Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback(Uniworld-V2:借助扩散负感知微调与MLLM隐式反馈强化图像编辑) [05:46] 👁 RL makes MLLMs see better than SFT(强化学习让多模态大模型看得比监督微调更清楚) [06:33] 🚀 Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling(视觉自回归模型在推理时扩展上击败扩散模型) [07:09] 🎨 ConsistEdit: Highly Consistent and Precise Training-free Visual Editing(ConsistEdit:面向MM-DiT的高一致免训练视觉编辑) [07:56] 🔄 Deep Self-Evolving Reasoning(深度自演化推理) [08:22] 🧠 Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI(超越流水线:模型原生智能体AI范式转移综述) [09:07] 🔮 Chronos-2: From Univariate to Universal Forecasting(Chronos-2:从单变量到通用预测) 【关注我们】 您还可以在以下平台找到我们,获得播客内容以外更多信息 小红书: AI速递

    10 分钟
  6. 10月20日

    2025.10.20 | RPC剪枝提速保准;OmniVinci小数据跨模态称王

    本期的 15 篇论文如下: [00:20] 🧠 A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning(大模型推理中内部概率与自洽性桥接的理论研究) [01:04] 🌐 OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM(OmniVinci:面向全模态理解大模型的架构与数据增强) [01:44] 🎬 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset(用百万级合成数据集放大指令式视频编辑) [02:28] ✂ NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks(NANO3D:无需训练与掩码的高效3D编辑新方法) [03:05] 🛰 Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery(Skyfall-GS:仅凭卫星影像合成沉浸式3D城市场景) [03:41] ⚠ Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs(情境学习中的突发错位:狭窄示例可让大模型广泛失准) [04:18] 🧬 Latent Diffusion Model without Variational Autoencoder(无需变分自编码器的潜在扩散模型) [04:52] 📸 LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal(LightsOut:基于扩散的延展补全提升镜头眩光去除) [05:30] 🧠 MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning(MorphoBench:随模型推理能力自适应难度的评测基准) [06:14] 🧠 A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning(A²FM:面向工具感知混合推理的自适应智能体基础模型) [06:56] 🗣 Language Models Model Language(语言模型即语言本身) [07:36] 🖼 BLIP3o-NEXT: Next Frontier of Native Image Generation(BLIP3o-NEXT:原生图像生成的下一个前沿) [08:30] 🌐 Paper2Web: Let's Make Your Paper Alive!(Paper2Web:让你的论文“活”起来!) [09:12] 🔬 Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition(面向科学发现的基础模型:从范式增强到范式跃迁) [09:55] 🔍 Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents(探索以进化:通过主动在线探索扩展深度研究智能体的聚合逻辑) 【关注我们】 您还可以在以下平台找到我们,获得播客内容以外更多信息 小红书: AI速递

    11 分钟

评分及评论

5
共 5 分
2 个评分

关于

每天10分钟,带您快速了解当日HuggingFace热门AI论文内容。每个工作日更新,欢迎订阅。 📢播客节目在小宇宙、Apple Podcast平台搜索【HuggingFace 每日AI论文速递】 🖼另外还有图文版,可在小红书搜索并关注【AI速递】

你可能还喜欢