HuggingFace 每日AI论文速递 (HuggingFace Daily AI Papers Express)

duan

5.0 (2)
TECHNOLOGY
UPDATED DAILY

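In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 There is also an illustrated text edition: search for and follow 【AI速递】 on Xiaohongshu.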

3H AGO

2025.06.27 | Reinforcement learning improves search efficiency; memory augmentation generates realistic driving scenes.

The 15 papers in this episode:
[00:25] 🔍 MMSearch-R1: Incentivizing LMMs to Search
[00:59] 🚗 MADrive: Memory-Augmented Driving Scene Modeling
[01:43] 🤖 WorldVLA: Towards Autoregressive Action World Model
[02:23] 💡 Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
[03:14] 🤖 Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
[04:00] 🚗 SAM4D: Segment Anything in Camera and LiDAR Streams
[04:40] 🎨 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
[05:16] 🤖 Whole-Body Conditioned Egocentric Video Prediction
[05:53] 🧠 Arch-Router: Aligning LLM Routing with Human Preferences
[06:35] 🎨 FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
[07:12] 🌐 DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
[07:55] 🧬 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
[08:35] 🤖 HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
[09:18] 🦘 Learning to Skip the Middle Layers of Transformers
[09:57] 🎵 MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners
[Follow us] You can also find us on the following platform for more than the podcast offers. Xiaohongshu: AI速递

11 min
1D AGO

2025.06.26 | High-quality multimodal models; 4-bit quantization improves performance.

The 14 papers in this episode:
[00:23] 🖼 ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
[01:05] 🛡 Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
[01:49] 🎨 Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models
[02:30] 🧠 OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
[03:13] 🤖 DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
[03:49] 🦾 RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
[04:33] 🧪 Use Property-Based Testing to Bridge LLM Code Generation and Validation
[05:18] 🌍 When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
[05:56] 🖼 HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
[06:39] 🤖 ReCode: Updating Code API Knowledge with Reinforcement Learning
[07:15] 💬 Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
[07:59] 🔬 Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content
[08:47] 🤖 MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications
[09:28] 📉 The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

11 min
2D AGO

2025.06.25 | AnimaX improves animation of inanimate 3D objects; Matrix-Game optimizes game world models.

The 15 papers in this episode:
[00:25] 🤖 AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
[01:11] 🎮 Matrix-Game: Interactive World Foundation Model
[01:50] 🧠 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
[02:33] 💡 Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
[03:18] 🖼 ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
[03:58] 🤔 Can Large Language Models Capture Human Annotator Disagreements?
[04:49] 🛠 SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
[05:37] 🎨 JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
[06:21] 🧠 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
[07:04] 🎬 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
[07:41] 🖼 Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
[08:22] 🤖 Unified Vision-Language-Action Model
[08:59] 🤔 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
[09:33] 🗣 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
[10:08] 🔊 USAD: Universal Speech and Audio Representation via Distillation

11 min
3D AGO

2025.06.24 | Light of Normals, a new method, improves detail; multimodal generation models perform strongly.

The 15 papers in this episode:
[00:24] 💡 Light of Normals: Unified Feature Representation for Universal Photometric Stereo
[01:00] 🎨 OmniGen2: Exploration to Advanced Multimodal Generation
[01:39] ✍ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
[02:17] 🎭 Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset
[02:58] 🧠 RLPR: Extrapolating RLVR to General Domains without Verifiers
[03:36] 🧠 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
[04:11] 🤖 OAgents: An Empirical Study of Building Effective Agents
[04:52] 🖼 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
[05:31] 🎬 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
[06:06] 🧑 LettinGo: Explore User Profile Generation for Recommendation System
[06:48] 🔀 ReDit: Reward Dithering for Improved LLM Policy Optimization
[07:29] 💡 FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
[08:08] 🎬 ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
[08:47] 🖼 Auto-Regressively Generating Multi-View Consistent Images
[09:35] 💡 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

11 min
4D AGO

2025.06.23 | DnD reduces computational overhead; vision guidance improves RAG performance.

The 12 papers in this episode:
[00:23] 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
[01:04] 🖼 Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
[01:49] 🔀 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
[02:30] 🤖 VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
[03:08] 🎮 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
[03:48] 🖼 DreamCube: 3D Panorama Generation via Multi-plane Synchronization
[04:26] 🖼 Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
[05:06] 💽 InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
[05:48] 🖼 Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
[06:36] 🧠 UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
[07:16] ⚖ Reranking-based Generation for Unbiased Perspective Summarization
[07:52] 🚗 Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

9 min
6D AGO

[Weekend Special] Hottest AI papers of the 4th week of June | Efficiently scaling reasoning ability; a multimodal financial evaluation benchmark.

The 5 papers in this episode:
[00:36] TOP1 (🔥216) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
[02:44] TOP2 (🔥82) | 📊 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
[05:32] TOP3 (🔥64) | 🔬 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
[07:53] TOP4 (🔥53) | 🧐 DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
[09:39] TOP5 (🔥52) | 🤖 Scaling Test-time Compute for LLM Agents

12 min
JUN 20

2025.06.20 | Reinforcement learning improves cross-domain reasoning; a finer-grained benchmark for speech emotion detection.

The 4 papers in this episode:
[00:24] 🧠 Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
[01:00] 🗣 EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
[01:55] 🎵 SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
[02:36] 📊 Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction

4 min
JUN 19

2025.06.19 | The SEKAI dataset improves video generation; prototype reasoning strengthens LLM generalization.

The 15 papers in this episode:
[00:22] 🌍 Sekai: A Video Dataset towards World Exploration
[01:02] 💡 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
[01:43] 💡 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
[02:24] 🗣 BUT System for the MLC-SLM Challenge
[03:10] 🤖 Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
[03:57] 💡 Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
[04:43] 🔬 SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
[05:26] 🚀 Truncated Proximal Policy Optimization
[06:04] 🖼 PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
[06:37] 🖼 CoMemo: LVLMs Need Image Context with Image Memory
[07:21] 🤖 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
[08:01] 🧠 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
[08:45] 🛡 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
[09:34] 🏞 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
[10:09] 🤝 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

11 min

5.0 out of 5 (2 Ratings)

Support!!

Feb 16

Fergie.W

I hope you can keep this going.

Creator: duan
Years Active: 2024 - 2025
Episodes: 313
Rating: Clean
Show Website: HuggingFace 每日AI论文速递
