HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou and Apple Podcasts. 🖼 An illustrated version is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 2D AGO

    2025.04.18 | CLIMB improves domain-specific model performance; antidistillation sampling prevents unauthorized model distillation.

    The 15 papers in this episode:
    [00:23] 🗂 CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
    [01:03] 🧪 Antidistillation Sampling
    [01:41] 🤝 A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
    [02:26] 🎬 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
    [03:02] 🤖 Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
    [03:43] 🧠 WORLDMEM: Long-term Consistent World Simulation with Memory
    [04:27] 🎬 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
    [05:01] 🤖 NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
    [05:43] 🎨 DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
    [06:20] 📊 ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
    [07:07] 🤖 Exploring Expert Failures Improves LLM Agent Tuning
    [07:48] 🎨 InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
    [08:26] 📸 CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
    [09:06] 🎬 FocusedAD: Character-centric Movie Audio Description
    [09:39] 🤔 Retrieval-Augmented Generation with Conflicting Evidence
    【Follow us】 You can also find us on the platform below for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min
  2. 3D AGO

    2025.04.17 | ColorBench tests VLMs' color understanding; BitNet improves computational efficiency.

    The 11 papers in this episode:
    [00:27] 🎨 ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
    [01:09] 💡 BitNet b1.58 2B4T Technical Report
    [01:50] 🎨 Cobra: Efficient Line Art COlorization with BRoAder References
    [02:28] 🚀 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
    [03:05] 🗣 SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
    [03:51] 🧰 ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
    [04:31] 🚀 REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
    [05:09] 📹 Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
    [05:51] 🤖 Robust and Fine-Grained Detection of AI Generated Texts
    [06:34] 🧠 Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
    [07:18] 🖼 BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting
    【Follow us】 You can also find us on the platform below for more beyond the podcast. Xiaohongshu (RED): AI速递

    8 min
  3. 4D AGO

    2025.04.16 | Genius improves LLM reasoning; xVerify verifies reasoning-model answers efficiently.

    The 15 papers in this episode:
    [00:22] 🧠 Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
    [01:06] ✅ xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
    [01:52] 🖼 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
    [02:37] ✅ Heimdall: test-time scaling on the generative verification
    [03:23] 🎨 Seedream 3.0 Technical Report
    [04:07] 📊 How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
    [04:54] 🎮 TextArena (a collection of competitive text games for training and evaluating agentic behavior in large language models)
    [05:43] 🧠 The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
    [06:22] 🤖 Efficient Process Reward Model Training via Active Learning
    [07:01] 🚀 Efficient Generative Model Training via Embedded Representation Warmup
    [07:43] 🎥 NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
    [08:23] 🧠 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
    [09:00] 🧮 DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
    [09:43] 🚗 Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion
    [10:25] 📹 PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
    【Follow us】 You can also find us on the platform below for more beyond the podcast. Xiaohongshu (RED): AI速递

    12 min
  4. 5D AGO

    2025.04.15 | Multimodal model performance improvements; low-resource inference acceleration.

    The 15 papers in this episode:
    [00:23] 🖼 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
    [01:03] 🏠 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
    [01:46] 🖼 FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
    [02:26] 🤔 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
    [03:07] 🤖 Iterative Self-Training for Code Generation via Reinforced Re-Ranking
    [03:51] 🎬 Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
    [04:28] 🤖 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
    [05:13] 🧠 S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
    [05:56] 🤔 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
    [06:42] 🤖 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
    [07:22] 🌍 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
    [08:11] 🤖 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
    [08:56] 💡 TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
    [09:40] 🧪 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
    [10:21] 🛡 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
    【Follow us】 You can also find us on the platform below for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min
  5. 6D AGO

    2025.04.14 | Cost-effective video generation; scaling autoregressive image generation.

    The 13 papers in this episode:
    [00:24] 🎬 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
    [01:00] 🖼 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
    [01:42] 🎮 MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
    [02:25] 🖼 PixelFlow: Pixel-Space Generative Models with Flow
    [03:05] 🤖 SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
    [03:51] 🎨 FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
    [04:30] 🎬 In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
    [05:05] 🤔 ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
    [05:42] 🚀 Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
    [06:21] 🤔 Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models
    [07:11] 🛡 SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
    [07:52] 🤝 CoRAG: Collaborative Retrieval-Augmented Generation
    [08:29] 🤝 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
    【Follow us】 You can also find us on the platform below for more beyond the podcast. Xiaohongshu (RED): AI速递

    10 min
  6. APR 11

    2025.04.11 | Kimi-VL performs strongly; VCR-Bench evaluates reasoning bottlenecks.

    The 14 papers in this episode:
    [00:22] 🧠 Kimi-VL Technical Report
    [01:05] 🎬 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
    [01:54] 🖼 MM-IFEngine: Towards Multimodal Instruction Following
    [02:35] 🖼 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
    [03:15] 🤔 DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
    [03:54] 🧩 HoloPart: Generative 3D Part Amodal Segmentation
    [04:36] 🤖 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
    [05:11] 🤖 MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations
    [05:58] 🖼 Scaling Laws for Native Multimodal Models
    [06:30] 🧠 SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
    [07:16] 🖼 Towards Visual Text Grounding of Multimodal Large Language Model
    [07:57] 🤖 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
    [08:39] 🧭 Compass Control: Multi Object Orientation Control for Text-to-Image Generation
    [09:22] 📍 TAPNext: Tracking Any Point (TAP) as Next Token Prediction
    【Follow us】 You can also find us on the platform below for more beyond the podcast. Xiaohongshu (RED): AI速递

    11 min

    Ratings & Reviews

    5 out of 5 (2 Ratings)
