HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

duan

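In 10 minutes a day, get up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.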

  1. 1 day ago

    2025.09.03 | Agentic RL boosts LLM autonomy; SimpleTIR tackles multi-turn tool-integrated reasoning

    The 15 papers in this episode:

    [00:19] 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
    [00:40] 🚀 SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
    [01:12] 🤖 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
    [01:41] 🎥 ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding
    [02:12] 🔄 LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
    [02:43] 🔧 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
    [03:11] 📄 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
    [03:33] 🩺 Baichuan-M2: Scaling Medical Capability with Large Verifier System
    [03:57] 🎥 Kwai Keye-VL 1.5 Technical Report
    [04:20] 🤖 Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
    [04:45] 🧠 Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
    [05:11] 🔄 Jointly Reinforcing Diversity and Quality in Language Model Generations
    [05:42] 🚀 DCPO: Dynamic Clipping Policy Optimization
    [06:04] 🚀 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
    [06:27] 🎬 GenCompositor: Generative Video Compositing with Diffusion Transformer

    Follow us: you can also find us on Xiaohongshu (search for 【AI速递】) for more beyond the podcast.

    7 min
  2. 3 days ago

    2025.09.01 | R-4B improves thinking efficiency; EO-1 improves robot control

    The 15 papers in this episode:

    [00:24] 🧠 R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
    [00:59] 🤖 EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
    [01:29] 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
    [01:57] 🎥 Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation
    [02:26] 🗣 TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
    [02:58] 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
    [03:28] 🤖 UItron: Foundational GUI Agent with Advanced Perception and Planning
    [03:50] 🎮 Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
    [04:20] 🔄 TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
    [04:45] 💻 Efficient Code Embeddings from Code Generation Models
    [05:10] ⏸ Morae: Proactively Pausing UI Agents for User Choices
    [05:37] 🔍 AHELM: A Holistic Evaluation of Audio-Language Models
    [06:05] 🤖 HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation
    [06:34] 🔄 Model-Task Alignment Drives Distinct RL Outcomes
    [07:08] 👁 Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

    8 min
  3. 4 days ago

    [Month-End Special] August's Hottest AI Papers | Scientific AI models narrow the performance gap; image models tackle text rendering and editing

    The 10 papers in this episode:

    [00:30] TOP1 (🔥242) | 🧪 Intern-S1: A Scientific Multimodal Foundation Model
    [01:36] TOP2 (🔥239) | 🎨 Qwen-Image Technical Report
    [02:46] TOP3 (🔥227) | 🤔 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
    [04:14] TOP4 (🔥220) | 🚀 DINOv3
    [05:25] TOP5 (🔥168) | 🚀 GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
    [06:25] TOP6 (🔥166) | ✨ On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
    [07:29] TOP7 (🔥164) | 🚀 InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
    [08:45] TOP8 (🔥156) | 🤖 VeriGUI: Verifiable Long-Chain GUI Dataset
    [09:53] TOP9 (🔥142) | 📚 We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
    [11:26] TOP10 (🔥139) | 🚀 NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

    13 min
  4. 6 days ago

    2025.08.29 | Stable text-to-image generation; efficient mathematical reasoning

    The 15 papers in this episode:

    [00:24] ⚖ Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
    [00:57] 🧠 rStar2-Agent: Agentic Reasoning Technical Report
    [01:28] 🎨 USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
    [01:56] 🚀 AWorld: Orchestrating the Training Recipe for Agentic AI
    [02:26] 🎯 TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning
    [02:54] 🧠 Mixture of Contexts for Long Video Generation
    [03:17] 🧠 CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
    [03:51] 🔍 MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
    [04:23] 🎨 OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
    [04:54] 🛡 Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
    [05:21] 🧠 Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
    [05:56] 💃 Dress&Dance: Dress up and Dance as You Like It - Technical Preview
    [06:18] 🎯 OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
    [06:42] 📷 Multi-View 3D Point Tracking
    [07:10] 🎭 FakeParts: a New Family of AI-Generated DeepFakes

    8 min
  5. August 28

    2025.08.28 | Reasoning decomposition reduces hallucination; interpretability of encoded information

    The 14 papers in this episode:

    [00:25] 🧠 Self-Rewarding Vision-Language Model via Reasoning Decomposition
    [00:49] 🔍 Beyond Transcription: Mechanistic Interpretability in ASR
    [01:22] 🤖 Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
    [01:52] 🧠 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
    [02:19] 🤖 MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation
    [02:51] 🔮 Predicting the Order of Upcoming Tokens Improves Language Modeling
    [03:20] 💓 Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation
    [03:52] ⚡ Diffusion Language Models Know the Answer Before Decoding
    [04:16] 👁 Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
    [04:38] 🎧 AudioStory: Generating Long-Form Narrative Audio with Large Language Models
    [05:01] 🧠 StepWiser: Stepwise Generative Judges for Wiser Reasoning
    [05:25] 🔄 Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
    [05:53] 💃 MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment
    [06:18] 📊 DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

    7 min

Ratings and Reviews

5
out of 5
3 ratings

