HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

0.0 (0)
TECHNOLOGY
UPDATED DAILY

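In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.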

3 HR AGO

2025.09.29 | Real-time long video generated as you chat; quantile baselines keep reasoning entropy in check

This episode covers the following 15 papers:
[00:20] 🎬 LongLive: Real-time Interactive Long Video Generation
[00:56] 🎯 Quantile Advantage Estimation for Entropy-Safe Reasoning
[01:34] 📄 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
[02:11] 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
[03:08] 🧠 Variational Reasoning for Language Models
[03:37] 💬 Language Models Can Learn from Verbal Feedback Without Scalar Rewards
[04:32] 🔍 ReviewScore: Misinformed Peer Review Detection with Large Language Models
[05:12] 🎯 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
[05:49] 🪄 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
[06:32] 🎯 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
[07:14] 🗣 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
[07:58] 🧭 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
[08:29] 🖼 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
[09:16] 🌐 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
[09:49] 🔄 SPARK: Synergistic Policy And Reward Co-Evolving Framework

Follow us: you can also find us on the platform below for more than what the podcast covers. Xiaohongshu: AI速递

11 min
2 DAYS AGO

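[Weekend Special] Hottest AI papers of week 5 of September | Qwen3-Omni takes the open-source crown; Baseer freezes vision, trains the decoder, and sets a new mark for Arabic OCR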

This episode covers the following 5 papers:
[00:38] TOP1 (🔥116) | 📜 Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
[02:43] TOP2 (🔥113) | 🌐 Qwen3-Omni Technical Report (described as the first omni-modal large model with no performance loss)
[05:23] TOP3 (🔥112) | 🗺 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
[07:45] TOP4 (🔥104) | 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
[10:05] TOP5 (🔥89) | 🚀 LIMI: Less is More for Agency

13 min
3 DAYS AGO

2025.09.26 | SciReasoner, an eight-event all-rounder; MMR1 forges open-source multimodal reasoning from the ambiguous zone

This episode covers the following 15 papers:
[00:20] 🔬 SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
[01:00] 🧠 MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
[01:41] 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
[02:26] 🌳 Tree Search for LLM Agent Reinforcement Learning
[03:06] 🖼 Seedream 4.0: Toward Next-generation Multimodal Image Generation
[03:40] 🎯 Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
[04:29] 🤖 AutoIntent: AutoML for Text Classification
[05:10] ⚖ TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
[05:43] 🎢 CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
[06:30] 🖼 Does FLUX Already Know How to Perform Physically Plausible Image Composition?
[07:31] ✂ CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling
[08:26] 🧠 Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
[09:12] 🎮 V-GameGym: Visual Game Generation for Code Large Language Models
[09:49] 🗣 Interactive Recommendation Agent with Active User Commands
[10:22] 🔍 BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback

11 min
4 DAYS AGO

2025.09.25 | Video models as zero-shot all-rounders; implicit chain-of-thought saves tokens and improves efficiency

This episode covers the following 10 papers:
[00:22] 🎥 Video models are zero-shot learners and reasoners
[01:09] 🧠 SIM-CoT: Supervised Implicit Chain-of-Thought
[01:55] 🪶 EmbeddingGemma: Powerful and Lightweight Text Representations
[02:29] 🗣 Advancing Speech Understanding in Speech-Aware Language Models with GRPO
[03:06] 🌍 LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines
[03:52] 🎬 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
[04:29] 🌀 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
[05:19] 🎬 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
[05:58] 📄 Logics-Parsing Technical Report (end-to-end document parsing with a large model trained via reinforcement learning)
[06:44] 🤖 On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub

8 min
5 DAYS AGO

2025.09.24 | Arabic OCR sets new benchmark marks; annotation-free RL lifts scores

This episode covers the following 15 papers:
[00:24] 📜 Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
[00:58] 🚀 Reinforcement Learning on Pre-Training Data
[01:37] 👁 Do You Need Proprioceptive States in Visuomotor Policies?
[02:36] 🚀 MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
[03:24] 🎯 MAPO: Mixed Advantage Policy Optimization (addressing the advantage-allocation problem in GRPO)
[04:06] 🚀 Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
[04:44] 🎯 VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
[05:31] 🌌 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
[06:08] 🧩 What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
[06:41] 🗣 Large Language Models Discriminate Against Speakers of German Dialects
[07:32] 📊 OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
[08:19] 🪄 HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis
[09:07] 🛠 CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
[09:41] 🛰 Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
[10:28] 🌍 VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction

12 min
6 DAYS AGO

2025.09.23 | As few as 78 demonstrations push AI to 73.5%; mask-free video subject insertion beats Pika

This episode covers the following 15 papers:
[00:21] 🚀 LIMI: Less is More for Agency
[00:55] 🎬 OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
[01:28] 🧩 OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System
[02:19] 🌐 Qwen3-Omni Technical Report (described as the first omni-modal large model with no performance loss)
[02:55] 🎬 TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs (an efficient off-policy reinforcement fine-tuning framework for video temporal grounding)
[03:28] 📐 GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
[04:15] 🎯 DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[05:05] 🤖 ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces
[05:42] 💬 EpiCache: Episodic KV Cache Management for Long Conversational Question Answering
[06:24] 🧠 SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
[07:01] 🧠 FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
[08:05] 🎬 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models
[08:53] 🧪 ARE: Scaling Up Agent Environments and Evaluations
[09:28] 🧩 QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
[10:17] 🔍 Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

11 min
22 SEPT

2025.09.22 | Directed-graph-driven code generation; a dual-channel unified vision model

This episode covers the following 13 papers:
[00:25] 🗺 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
[01:00] 🌉 MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
[01:42] 🧩 Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
[02:25] 🎯 BaseReward: A Strong Baseline for Multimodal Reward Model
[02:56] 🏠 SpatialGen: Layout-guided 3D Indoor Scene Generation
[03:46] 🧠 BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent (a brain-inspired reasoning model)
[04:30] 🎭 Lynx: Towards High-Fidelity Personalized Video Generation
[05:20] 🤖 A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
[05:54] 📹 RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
[06:21] 🗣 Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
[07:07] 🎬 Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
[07:50] 🗣 WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
[08:30] 🗣 Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue

10 min
20 SEPT

[Weekend Special] Hottest AI papers of week 4 of September | OmniWorld builds a 4D data factory; WebWeaver lets AI write while it searches

This episode covers the following 5 papers:
[00:43] TOP1 (🔥95) | 🌍 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[02:51] TOP2 (🔥93) | 🔍 WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
[05:09] TOP3 (🔥91) | 🤖 Scaling Agents via Continual Pre-training
[07:33] TOP4 (🔥88) | 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[10:48] TOP5 (🔥79) | 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning

13 min


Creator: duan
Years Active: 2024 - 2025
Episodes: 396
Rating: Clean
Show Website: HuggingFace 每日AI论文速递
