HuggingFace 每日AI论文速递

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 58 min ago

    2025.07.09 | Latent reasoning expands LLM expressive power; SingLoRA streamlines low-rank adaptation.

    The 15 papers in this episode:
    [00:25] 🤔 A Survey on Latent Reasoning
    [00:59] 💡 SingLoRA: Low Rank Adaptation Using a Single Matrix
    [01:47] 🧩 OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
    [02:36] 🤖 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
    [03:17] 🤖 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
    [03:50] 🫂 RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
    [04:30] 🩺 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
    [05:14] 🤖 Is Diversity All You Need for Scalable Robotic Manipulation?
    [05:54] 🤖 Coding Triangle: How Does Large Language Model Understand Code?
    [06:38] 🇪🇬 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
    [07:21] 🖱 GTA1: GUI Test-time Scaling Agent
    [08:00] 🧮 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
    [08:45] 🧬 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
    [09:33] 🩻 SAMed-2: Selective Memory Enhanced Medical Segment Anything Model
    [10:01] 🎬 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    11 min
  2. 1 day ago

    2025.07.08 | MemOS boosts memory-management efficiency; combining MLM and CLM improves encoder training.

    The 15 papers in this episode:
    [00:21] 🧠 MemOS: A Memory OS for AI System
    [01:07] 🤔 Should We Still Pretrain Encoders with Masked Language Modeling?
    [01:43] 🎥 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
    [02:22] 🤖 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
    [03:02] 🤖 Pre-Trained Policy Discriminators are General Reward Models
    [03:38] 🧠 BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
    [04:23] 🤖 RoboBrain 2.0 Technical Report
    [05:04] 🧩 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
    [05:42] ✨ RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
    [06:21] 🎬 StreamDiT: Real-Time Streaming Text-to-Video Generation
    [07:04] 📜 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
    [07:49] 💡 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
    [08:35] 🎨 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
    [09:16] 📊 On the rankability of visual embeddings
    [09:59] 🖼 VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    11 min
  3. 4 days ago

    【Month-End Special】The hottest AI papers of June | LLMs improve through self-reflection; MiniMax-M1 scales test-time compute efficiently.

    The 10 papers in this episode:
    [00:37] TOP1 (🔥258) | 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
    [02:51] TOP2 (🔥249) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [05:24] TOP3 (🔥240) | 🤖 Reinforcement Pre-Training
    [07:54] TOP4 (🔥165) | 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    [09:53] TOP5 (🔥134) | 🕰 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
    [12:24] TOP6 (🔥132) | 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    [14:50] TOP7 (🔥126) | 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    [16:36] TOP8 (🔥116) | 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [18:34] TOP9 (🔥108) | 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
    [21:05] TOP10 (🔥107) | 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    24 min
  4. 5 days ago

    2025.07.04 | WebSailor boosts LLM reasoning for web agents; LangScene-X improves 3D scene reconstruction.

    The 15 papers in this episode:
    [00:22] 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
    [00:59] 🖼 LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
    [01:44] 🧬 IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
    [02:35] 👂 Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
    [03:17] 🤝 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
    [04:00] 🖼 Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
    [04:38] 🧠 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
    [05:12] 🧠 Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
    [05:47] 💡 Fast and Simplex: 2-Simplicial Attention in Triton
    [06:33] 🧐 Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
    [07:16] 🧩 Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
    [08:12] 🤖 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
    [08:51] 💡 Energy-Based Transformers are Scalable Learners and Thinkers
    [09:33] ⚙ AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
    [10:16] 🚀 ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    12 min
  5. Jul 2

    2025.07.02 | Advances in multimodal reasoning; improved bidirectional embeddings.

    The 12 papers in this episode:
    [00:23] 💡 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
    [01:00] 🖼 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
    [01:35] 🔬 SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
    [02:19] 🤔 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
    [02:59] 🎬 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
    [03:37] 🤖 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
    [04:19] 🧠 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
    [04:53] 🧠 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
    [05:30] 💡 Data Efficacy for Language Model Training
    [06:05] 🎬 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
    [06:40] 🖼 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
    [07:28] 🛡 Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    9 min

Ratings & Reviews

5 out of 5 · 2 ratings

About

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).
