HuggingFace 每日AI论文速递

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. New episodes every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou FM and Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 13 HR AGO

    2025.06.18 | MultiFinBen reveals the limitations of financial LLMs; test-time compute boosts LLM agent performance.

    The 15 papers in this episode:
    [00:23] 📊 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
    [01:03] 🤖 Scaling Test-time Compute for LLM Agents
    [01:38] 🎼 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
    [02:16] 💬 LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
    [02:57] 🤔 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
    [03:40] 🧠 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
    [04:20] 🗣 Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
    [05:02] ⚕ Efficient Medical VIE via Reinforcement Learning
    [05:40] 🤔 Reasoning with Exploration: An Entropy Perspective
    [06:18] 🧠 QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
    [06:52] 🎨 Align Your Flow: Scaling Continuous-Time Flow Map Distillation
    [07:27] 🧪 Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
    [08:07] 🤖 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees
    [08:58] 🛠 CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
    [09:38] 📊 xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min
  2. 1 DAY AGO

    2025.06.17 | MiniMax-M1 improves reasoning performance; new cognitive-ability tests for multimodal models.

    The 15 papers in this episode:
    [00:22] 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [01:00] 🔬 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
    [01:47] 🧐 DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
    [02:28] 🧠 DoTA-RAG: Dynamic of Thought Aggregation RAG (a retrieval-augmented generation system for large-scale web knowledge indexing)
    [03:08] 🧠 Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
    [03:52] 💡 Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
    [04:28] 🤖 TaskCraft: Automated Generation of Agentic Tasks
    [05:04] 🤯 Discrete Diffusion in Large Language and Multimodal Models: A Survey
    [05:42] 🪞 Test3R: Learning to Reconstruct 3D at Test Time
    [06:25] 🖼 VGR: Visual Grounded Reasoning
    [07:06] 🤖 PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization
    [07:50] 🤖 From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
    [08:32] 🤖 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
    [09:11] 🧠 Language Surgery in Multilingual Large Language Models
    [09:44] 🤖 AI Agent Behavioral Science
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min
  3. 2 DAYS AGO

    2025.06.16 | Cross-modal synthesis of novel-view images; adversarial robustness of policy-adherent agents

    The 15 papers in this episode:
    [00:23] 🖼 Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
    [01:02] 🛡 Effective Red-Teaming of Policy-Adherent Agents
    [01:39] 🔄 The Diffusion Duality
    [02:20] 🤖 LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
    [03:09] 🧠 pLSTM: parallelizable Linear Source Transition Mark networks
    [03:50] 🖼 A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
    [04:36] 🧠 Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
    [05:16] 🤖 SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
    [06:00] 🧠 SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
    [06:42] 🛡 Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning
    [07:17] 🎬 DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
    [07:59] ⚙ Configurable Preference Tuning with Rubric-Guided Synthetic Data
    [08:41] 👁 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
    [09:29] 🔄 A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
    [10:16] 🔍 Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min
  4. 5 DAYS AGO

    2025.06.13 | A new paradigm for medical reasoning models; automated construction of software-engineering datasets

    The 15 papers in this episode:
    [00:22] 🩺 ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
    [01:12] 🏭 SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
    [01:55] 🖼 Text-Aware Image Restoration with Diffusion Models
    [02:36] 🎬 VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
    [03:22] 🎬 AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
    [04:09] 🧮 Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training
    [04:52] 🎮 Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts
    [05:27] 🧠 Magistral (Mistral's first reasoning model)
    [06:07] 🤖 AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
    [06:53] 🎨 PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
    [07:43] 🎬 VideoDeepResearch: Long Video Understanding With Agentic Tool Using
    [08:22] 🚫 ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
    [09:01] 🎨 CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation
    [09:48] 💡 Resa: Transparent Reasoning Models via SAEs
    [10:30] 🤖 Ming-Omni: A Unified Multimodal Model for Perception and Generation
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    12 min
  5. 6 DAYS AGO

    2025.06.12 | Confidence-based fine-tuning improves model performance; efficient optimization of video generation models.

    The 13 papers in this episode:
    [00:23] 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    [01:07] 🎬 Seedance 1.0: Exploring the Boundaries of Video Generation Models
    [01:50] 🥽 PlayerOne: Egocentric World Simulator
    [02:30] 🎬 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
    [03:15] 🤖 ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
    [03:48] 🧠 SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
    [04:25] 🧪 SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner
    [05:10] 🎶 Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
    [05:52] 🎭 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
    [06:34] 🤖 SAFE: Multitask Failure Detection for Vision-Language-Action Models
    [07:14] 🧠 Reparameterized LLM Training via Orthogonal Equivalence Transformation
    [07:56] 👁 MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis
    [08:39] 🌱 Branched Schrödinger Bridge Matching
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    10 min
  6. 11 JUN

    2025.06.11 | LLMs exhibit geopolitical biases; RuleReasoner improves reasoning efficiency.

    The 15 papers in this episode:
    [00:22] 🌍 Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
    [01:09] 🤖 RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
    [01:48] 🖼 Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
    [02:30] 🎬 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
    [03:08] 🧮 Solving Inequality Proofs with Large Language Models
    [03:49] 🤖 Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
    [04:25] 🖼 Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
    [05:05] 🤖 Aligning Text, Images, and 3D Structure Token-by-Token
    [05:51] 🔍 ECoRAG: Evidentiality-guided Compression for Long Context RAG
    [06:28] 🎬 DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
    [07:14] 🖼 Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs
    [08:06] 🗜 Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor
    [08:46] 🤖 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
    [09:21] 🧩 MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
    [09:58] 📚 Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min
  7. 10 JUN

    2025.06.10 | Reinforcement learning improves language models; a medical multimodal model boosts reasoning.

    The 15 papers in this episode:
    [00:21] 🤖 Reinforcement Pre-Training
    [01:01] 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
    [01:42] 📱 MiniCPM4: Ultra-Efficient LLMs on End Devices
    [02:30] 🛡 Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
    [03:07] 🖼 OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
    [03:49] 🏠 SpatialLM: Training Large Language Models for Structured Indoor Modeling
    [04:35] 🤖 Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
    [05:14] 🖼 Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
    [06:02] 🖼 Image Reconstruction as a Tool for Feature Analysis
    [06:41] 🧪 GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition
    [07:22] 📉 Through the Valley: Path to Effective Long CoT Training for Small Language Models
    [08:04] 🤖 BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
    [08:42] 🧠 Pre-trained Large Language Models Learn Hidden Markov Models In-context
    [09:25] 🤔 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
    [10:04] 🧠 CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models
    [Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min

