HuggingFace 每日AI论文速递

duan

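In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.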

  1. 3H AGO

    2025.06.27 | Reinforcement learning improves search efficiency; memory augmentation generates realistic driving scenes.

    The 15 papers in this episode:
    [00:25] 🔍 MMSearch-R1: Incentivizing LMMs to Search
    [00:59] 🚗 MADrive: Memory-Augmented Driving Scene Modeling
    [01:43] 🤖 WorldVLA: Towards Autoregressive Action World Model
    [02:23] 💡 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
    [03:14] 🤖 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
    [04:00] 🚗 SAM4D: Segment Anything in Camera and LiDAR Streams
    [04:40] 🎨 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
    [05:16] 🤖 Whole-Body Conditioned Egocentric Video Prediction
    [05:53] 🧠 Arch-Router: Aligning LLM Routing with Human Preferences
    [06:35] 🎨 FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
    [07:12] 🌐 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
    [07:55] 🧬 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
    [08:35] 🤖 HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
    [09:18] 🦘 Learning to Skip the Middle Layers of Transformers
    [09:57] 🎵 MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners

    [Follow us] You can also find us on the following platform for more than what the podcast covers. Xiaohongshu: AI速递

    11 min
  2. 1D AGO

    2025.06.26 | High-quality multimodal models; 4-bit quantization improves performance.

    The 14 papers in this episode:
    [00:23] 🖼 ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
    [01:05] 🛡 Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
    [01:49] 🎨 Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models
    [02:30] 🧠 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
    [03:13] 🤖 DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
    [03:49] 🦾 RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
    [04:33] 🧪 Use Property-Based Testing to Bridge LLM Code Generation and Validation
    [05:18] 🌍 When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
    [05:56] 🖼 HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
    [06:39] 🤖 ReCode: Updating Code API Knowledge with Reinforcement Learning
    [07:15] 💬 Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
    [07:59] 🔬 Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content
    [08:47] 🤖 MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications
    [09:28] 📉 The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

    11 min
  3. 2D AGO

    2025.06.25 | AnimaX improves animation of inanimate 3D objects; Matrix-Game optimizes game world models.

    The 15 papers in this episode:
    [00:25] 🤖 AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
    [01:11] 🎮 Matrix-Game: Interactive World Foundation Model
    [01:50] 🧠 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
    [02:33] 💡 Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
    [03:18] 🖼 ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
    [03:58] 🤔 Can Large Language Models Capture Human Annotator Disagreements?
    [04:49] 🛠 SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
    [05:37] 🎨 JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
    [06:21] 🧠 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
    [07:04] 🎬 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
    [07:41] 🖼 Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
    [08:22] 🤖 Unified Vision-Language-Action Model
    [08:59] 🤔 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
    [09:33] 🗣 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
    [10:08] 🔊 USAD: Universal Speech and Audio Representation via Distillation

    11 min
  4. 3D AGO

    2025.06.24 | New Light of Normals method improves detail; multimodal generation model performs well.

    The 15 papers in this episode:
    [00:24] 💡 Light of Normals: Unified Feature Representation for Universal Photometric Stereo
    [01:00] 🎨 OmniGen2: Exploration to Advanced Multimodal Generation
    [01:39] ✍ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
    [02:17] 🎭 Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
    [02:58] 🧠 RLPR: Extrapolating RLVR to General Domains without Verifiers
    [03:36] 🧠 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
    [04:11] 🤖 OAgents: An Empirical Study of Building Effective Agents
    [04:52] 🖼 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
    [05:31] 🎬 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
    [06:06] 🧑 LettinGo: Explore User Profile Generation for Recommendation System
    [06:48] 🔀 ReDit: Reward Dithering for Improved LLM Policy Optimization
    [07:29] 💡 FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
    [08:08] 🎬 ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
    [08:47] 🖼 Auto-Regressively Generating Multi-View Consistent Images
    [09:35] 💡 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

    11 min
  5. 4D AGO

    2025.06.23 | DnD reduces computational overhead; vision guidance improves RAG performance.

    The 12 papers in this episode:
    [00:23] 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [01:04] 🖼 Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
    [01:49] 🔀 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
    [02:30] 🤖 VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
    [03:08] 🎮 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
    [03:48] 🖼 DreamCube: 3D Panorama Generation via Multi-plane Synchronization
    [04:26] 🖼 Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
    [05:06] 💽 InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
    [05:48] 🖼 Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
    [06:36] 🧠 UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
    [07:16] ⚖ Reranking-based Generation for Unbiased Perspective Summarization
    [07:52] 🚗 Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

    9 min
  6. JUN 19

    2025.06.19 | The Sekai dataset improves video generation; prototype reasoning strengthens LLM generalization.

    The 15 papers in this episode:
    [00:22] 🌍 Sekai: A Video Dataset towards World Exploration
    [01:02] 💡 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
    [01:43] 💡 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
    [02:24] 🗣 BUT System for the MLC-SLM Challenge
    [03:10] 🤖 Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
    [03:57] 💡 Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
    [04:43] 🔬 SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
    [05:26] 🚀 Truncated Proximal Policy Optimization
    [06:04] 🖼 PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
    [06:37] 🖼 CoMemo: LVLMs Need Image Context with Image Memory
    [07:21] 🤖 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
    [08:01] 🧠 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
    [08:45] 🛡 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
    [09:34] 🏞 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
    [10:09] 🤝 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

    11 min

Ratings & Reviews

5 out of 5 (2 Ratings)
