HuggingFace Daily AI Paper Express (HuggingFace 每日AI论文速递)

duan

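Ten minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.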

  1. 1 DAY AGO

    2025.09.19 | Cross-platform GUI models top the leaderboards; FlowRL boosts reasoning via distribution matching

    The 15 papers in this episode:
    [00:26] 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
    [01:01] 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning
    [01:57] 🧭 Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
    [02:55] 🧬 Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
    [03:34] 🎨 Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
    [04:12] 🔍 FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
    [04:56] 🤖 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
    [05:39] 🔮 AToken: A Unified Tokenizer for Vision
    [06:10] 🌌 WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
    [06:58] 🖼 MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
    [07:54] 🎮 RecoWorld: Building Simulated Environments for Agentic Recommender Systems
    [08:28] 🎯 Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
    [09:03] 🔍 Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
    [09:51] 🩺 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
    [10:34] 🛰 FSG-Net: Frequency-Spatial Synergistic Gated Network for High-Resolution Remote Sensing Change Detection
    Follow us: you can also find us on the following platform for more than the podcast itself. Xiaohongshu: AI速递

    12 min
  2. 2 DAYS AGO

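    2025.09.18 | FP8 compression plus translation fine-tuning builds Arabic LLMs at low cost; 2B-8B small models clean their data and go head-to-head with GPT-4o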

    The 14 papers in this episode:
    [00:19] 🐪 Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
    [00:56] 🚀 SAIL-VL2 Technical Report
    [01:42] 🌐 PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era
    [02:33] 🎓 GenExam: A Multidisciplinary Text-to-Image Exam
    [03:25] 🧹 Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
    [03:59] 🩺 MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework
    [04:37] 🔍 MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
    [05:22] 🎭 Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
    [05:59] 🧮 THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
    [06:40] 🔍 Improving Context Fidelity via Native Retrieval-Augmented Reasoning
    [07:20] 🌍 AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
    [08:13] 🎛 SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
    [08:48] ⚛ Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks
    [09:37] 🚀 Hybrid Quantum-Classical Model for Image Classification

    11 min
  3. 3 DAYS AGO

    2025.09.17 | WebWeaver framework improves trustworthy long-form reports; agentic pre-training scales agent systems

    The 11 papers in this episode:
    [00:27] 🔍 WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
    [01:08] 🤖 Scaling Agents via Continual Pre-training
    [01:52] ⛵ WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
    [02:36] 🧠 Towards General Agentic Intelligence via Environment Scaling
    [03:09] 🔍 WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
    [03:59] 🧠 ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
    [04:39] 🚀 Single-stream Policy Optimization
    [05:19] 🎮 Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation
    [06:00] 🧩 3D Aware Region Prompted Vision Language Model
    [06:36] 💡 EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
    [07:07] ⚛ Exact Coset Sampling for Quantum Lattice Algorithms

    8 min
  4. 4 DAYS AGO

    2025.09.16 | OmniWorld builds a 4D data foundation; UI-S1 trains GUI agents with semi-online reinforcement learning

    The 14 papers in this episode:
    [00:24] 🌍 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
    [01:12] 🤖 UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
    [01:51] 🏠 InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
    [02:27] 🖱 LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
    [02:58] 📊 Locality in Image Diffusion Models Emerges from Data Statistics
    [03:29] 🤔 Measuring Epistemic Humility in Multimodal Large Language Models
    [03:57] 🤖 Nav-R1: Reasoning and Navigation in Embodied Scenes
    [04:25] 🔍 Lost in Embeddings: Information Loss in Vision-Language Models
    [04:54] 🌐 CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
    [05:19] 🔍 Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
    [05:57] 🧠 EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
    [06:30] ⚖ Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
    [07:16] 🧠 PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
    [07:52] 🔍 GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings

    9 min
  5. 5 DAYS AGO

    2025.09.15 | Upgraded dataset measures engagement; model size is not the long-horizon bottleneck

    The 14 papers in this episode:
    [00:25] 📚 IntrEx: A Dataset for Modeling Engagement in Educational Conversations
    [01:02] 📏 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
    [01:54] 🧩 X-Part: high fidelity and structure coherent shape decomposition
    [02:33] 🖼 InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
    [03:04] 🔍 HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
    [03:50] 🎙 VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
    [04:44] 🌸 FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
    [05:20] 🎨 Inpainting-Guided Policy Optimization for Diffusion Large Language Models
    [05:58] 🤖 Virtual Agent Economies
    [06:28] 📈 QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
    [07:02] 🧪 MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
    [07:41] 🎨 Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
    [08:31] 🦎 LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
    [09:13] 🗞 CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China

    10 min
  6. SEP 12

    2025.09.12 | HuMo steers human-centric video with multimodal conditioning; SimpleVLA-RL boosts performance through reinforcement learning

    The 15 papers in this episode:
    [00:27] 🎭 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
    [01:18] 🤖 SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
    [02:02] 🗣 EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
    [02:37] 🎭 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
    [03:11] 🧭 Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
    [03:57] 🎨 FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
    [04:34] 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
    [05:14] 🔄 Can Understanding and Generation Truly Benefit Together -- or Just Coexist?
    [05:46] 📹 SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
    [06:16] 📊 Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
    [06:55] 🕵 Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
    [07:35] 🖼 2D Gaussian Splatting with Semantic Alignment for Image Inpainting
    [08:10] 📏 LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
    [08:45] 🤖 OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
    [09:31] 🎯 The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

    11 min
