HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

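In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.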

  1. 15 hours ago

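    2025.10.14 | Quantization error turned into reward, training a 32B model on a single GPU; an audio-visual evaluation benchmark for multimodal LLMs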

    The 15 papers in this episode:
    [00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
    [01:22] 🧠 Diffusion Transformers with Representation Autoencoders
    [02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
    [02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
    [03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
    [04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
    [04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
    [05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
    [05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
    [06:51] 🧮 Making Mathematical Reasoning Adaptive
    [07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
    [08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
    [08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
    [09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
    [10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
    Follow us: you can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    11 min
  2. 1 day ago

    2025.10.13 | Desktop-interaction pretraining unlocks robot potential; a unified model gives cameras spatial imagination

    The 14 papers in this episode:
    [00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
    [01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
    [01:56] 🎨 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
    [02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
    [03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!
    [03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
    [04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
    [04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
    [05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams
    [06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone
    [06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
    [07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
    [08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
    [08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

    10 min
  3. 4 days ago

    2025.10.10 | Agent learning from early experience; interleaved image-text reflective chains jump to 24.9%

    The 14 papers in this episode:
    [00:16] 🌱 Agent Learning via Early Experience
    [00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
    [01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
    [02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos
    [03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
    [03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
    [04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model
    [05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
    [05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
    [06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
    [07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy
    [07:54] 🚀 Training-Free Group Relative Policy Optimization
    [08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
    [08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

    10 min
  4. 5 days ago

    2025.10.09 | Ming-UniVision unifies the visual tokenizer; direct KV-cache links let LLMs communicate instantly

    The 15 papers in this episode:
    [00:21] 🔄 Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
    [00:59] 🧠 Cache-to-Cache: Direct Semantic Communication Between Large Language Models
    [01:32] 🌀 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
    [02:07] 🧠 SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
    [03:06] 🤖 RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
    [04:02] 🎬 MATRIX: Mask Track Alignment for Interaction-aware Video Generation
    [04:51] 🎯 Vibe Checker: Aligning Code Evaluation with Human Preference
    [05:44] 🤖 Multi-Agent Tool-Integrated Policy Optimization
    [06:24] 🧠 CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling
    [06:59] ✂ OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
    [07:52] 🧠 Artificial Hippocampus Networks for Efficient Long-Context Modeling
    [08:30] 🔍 Revisiting Long-context Modeling from Context Denoising Perspective
    [09:11] 🧠 Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
    [09:51] 💥 Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
    [10:37] ⚡ Native Hybrid Attention for Efficient Sequence Modeling

    12 min
  5. 6 days ago

    2025.10.08 | TaTToo beats large models with plug-in code tools; a 4B small model approaches closed-source giants in 32 steps

    The 15 papers in this episode:
    [00:24] 📊 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
    [00:57] 🔍 Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
    [01:39] 🚀 Fast-dLLM v2: Efficient Block-Diffusion LLM
    [02:30] 🧑 CoDA: Coding LM via Diffusion Adaptation
    [03:01] 🧩 Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
    [03:52] ⚖ ASPO: Asymmetric Importance Sampling Policy Optimization
    [04:34] 🔗 Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
    [05:15] 🧠 AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems
    [05:51] 🪂 Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
    [06:35] 🌍 HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
    [07:22] ⚡ TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation
    [08:09] 🎯 Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
    [09:00] 🩺 Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
    [09:46] 🧠 MixReasoning: Switching Modes to Think
    [10:20] ⚡ LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation

    11 min
  6. October 7

    2025.10.07 | Papers become talks in seconds; Video-LMM post-training breakthroughs

    The 15 papers in this episode:
    [00:21] 🎬 Paper2Video: Automatic Video Generation from Scientific Papers
    [00:55] 🎬 Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
    [01:38] 🎬 VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
    [02:14] 👻 Imperceptible Jailbreaking against Large Language Models
    [02:56] 🌳 MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
    [03:30] 🧬 Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
    [04:07] 📊 Factuality Matters: When Image Generation and Editing Meet Structured Visuals
    [04:59] 🔄 Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
    [05:55] ⚖ Judging with Confidence: Calibrating Autoraters to Preference Distributions
    [06:44] 🎯 Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
    [07:27] 📏 Optimal Scaling Needs Optimal Norm
    [07:51] 🔬 Code4MeV2: a Research-oriented Code-completion Platform
    [08:31] 🪞 Self-Reflective Generation at Test Time
    [09:15] 🔄 SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
    [10:00] 👀 Watch and Learn: Learning to Use Computers from Online Videos

    11 min
  7. October 6

    2025.10.06 | A 15B small model matches DeepSeek-R1; progressive distillation with 128 tokens saves 80% of compute

    The 15 papers in this episode:
    [00:28] 🧠 Apriel-1.5-15b-Thinker
    [01:04] 🚀 Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
    [01:42] 🧩 Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
    [02:19] 🪞 Self-Improvement in Multimodal Large Language Models: A Survey
    [02:59] 🧬 Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
    [03:38] 📊 CoDA: Agentic Systems for Collaborative Data Visualization
    [04:21] 🧐 SurveyBench: How Well Can LLM(-Agents) Write Academic Surveys?
    [05:06] 🔧 REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration
    [05:53] 🔍 OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
    [06:38] 🔍 FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
    [07:14] 🎯 Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
    [08:05] 📏 LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning
    [08:45] 🤖 WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
    [09:19] 🍱 Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
    [09:54] 🎯 LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

    11 min

Ratings and Reviews

5 out of 5 (2 ratings)
