HuggingFace Daily AI Paper Express (HuggingFace 每日AI论文速递)

duan

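In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 There is also an illustrated text edition: search for and follow 【AI速递】 on Xiaohongshu.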

  1. 17 hours ago

    [Month-End Special] July's Hottest AI Papers | GSPO stabilizes training; sequence-level clipping reduces variance; a context engineering survey on dynamically assembling information flows

    This episode covers the following 10 papers:

    - [00:30] TOP1 (🔥257) | 🚀 Group Sequence Policy Optimization
    - [02:21] TOP2 (🔥227) | 🧮 A Survey of Context Engineering for Large Language Models
    - [03:33] TOP3 (🔥207) | 🧠 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
    - [05:02] TOP4 (🔥151) | 🎬 Scaling RL to Long Videos
    - [06:57] TOP5 (🔥144) | 🧠 MemOS: A Memory OS for AI System
    - [08:47] TOP6 (🔥126) | 🎬 Kwai Keye-VL Technical Report
    - [10:41] TOP7 (🔥126) | 🎯 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
    - [12:38] TOP8 (🔥121) | 🤖 Agentic Reinforced Policy Optimization
    - [14:21] TOP9 (🔥120) | 🧮 MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
    - [15:53] TOP10 (🔥118) | ⚡ $\nabla$NABLA: Neighborhood Adaptive Block-Level Attention

    Follow us: you can also find us on the platform below for more than the podcast itself. Xiaohongshu: AI速递

    18 min
  2. 2 days ago

    2025.08.01 | Seed-Prover integrates LLMs to solve IMO math problems; Phi-Ground improves GUI perception accuracy.

    This episode covers the following 15 papers:

    - [00:22] 🏆 Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
    - [01:04] 🎯 Phi-Ground Tech Report: Advancing Perception in GUI Grounding
    - [01:30] 🤔 C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
    - [02:07] 🚀 RecGPT Technical Report
    - [02:36] 🤖 villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
    - [03:14] 🤖 Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
    - [04:07] ⚖ Persona Vectors: Monitoring and Controlling Character Traits in Language Models
    - [04:41] 🚀 iLRM: An Iterative Large 3D Reconstruction Model
    - [05:32] ✅ TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs
    - [06:02] 💡 On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
    - [06:29] 🤝 NeRF Is a Valuable Assistant for 3D Gaussian Splatting
    - [07:05] 🌾 AgroBench: Vision-Language Model Benchmark in Agriculture
    - [07:36] 🎨 Beyond Linear Bottlenecks: Spline-Based Knowledge Distillation for Culturally Diverse Art Style Classification
    - [08:15] 🔎 Enhanced Arabic Text Retrieval with Attentive Relevance Scoring
    - [08:45] 🌊 Flow Equivariant Recurrent Neural Networks

    10 min
  3. 3 days ago

    2025.07.31 | ScreenCoder automates UI-to-code conversion; Falcon-H1's hybrid architecture improves long-sequence efficiency.

    This episode covers the following 9 papers:

    - [00:22] 💻 ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
    - [01:02] 🚀 Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
    - [01:33] 💥 BANG: Dividing 3D Assets via Generative Exploded Dynamics
    - [02:17] 🧠 VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
    - [02:51] 🚁 Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
    - [03:34] 🧩 Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
    - [04:04] 🚀 Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
    - [04:56] 🛠 Repair-R1: Better Test Before Repair
    - [05:33] 🌍 MetaCLIP 2: A Worldwide Scaling Recipe

    6 min
  4. 4 days ago

    2025.07.30 | HunyuanWorld generates immersive 3D worlds from words or pixels; X-Omni uses reinforcement learning to improve image generation quality.

    This episode covers the following 8 papers:

    - [00:23] 🌍 HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
    - [00:56] ✨ X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
    - [01:59] 🚀 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
    - [02:43] ✨ MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
    - [03:32] 🐾 AnimalClue: Recognizing Animals by their Traces
    - [04:04] 🏃 MOVE: Motion-Guided Few-Shot Video Object Segmentation
    - [04:31] 🤥 MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
    - [04:59] 🐘 Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

    6 min
  5. 5 days ago

    2025.07.29 | ARPO improves LLM tool-interaction performance; ARC-Hunyuan-Video-7B focuses on short-video understanding.

    This episode covers the following 15 papers:

    - [00:23] 🤖 Agentic Reinforced Policy Optimization
    - [00:55] 🧠 ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
    - [01:35] 🚀 Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
    - [02:03] 🌐 Reconstructing 4D Spatial Intelligence: A Survey
    - [02:55] 💡 SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
    - [03:35] 🚀 A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
    - [04:17] ⚖ Geometric-Mean Policy Optimization
    - [04:59] 🎯 Region-based Cluster Discrimination for Visual Representation Learning
    - [05:38] ✨ GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
    - [06:18] 🚀 UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
    - [06:47] ⚡ Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
    - [07:18] ✨ ForCenNet: Foreground-Centric Network for Document Image Rectification
    - [07:52] 🎨 ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
    - [08:43] 🏆 Music Arena: Live Evaluation for Text-to-Music
    - [09:13] 🎶 JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

    11 min

Ratings and Reviews

5 out of 5 stars (2 ratings)

