HuggingFace 每日AI论文速递

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 58 min ago

    2025.07.09 | Latent reasoning expands LLM expressive power; SingLoRA streamlines low-rank adaptation.

    The 15 papers in this episode:
    [00:25] 🤔 A Survey on Latent Reasoning
    [00:59] 💡 SingLoRA: Low Rank Adaptation Using a Single Matrix
    [01:47] 🧩 OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
    [02:36] 🤖 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
    [03:17] 🤖 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
    [03:50] 🫂 RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
    [04:30] 🩺 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
    [05:14] 🤖 Is Diversity All You Need for Scalable Robotic Manipulation?
    [05:54] 🤖 Coding Triangle: How Does Large Language Model Understand Code?
    [06:38] 🇪🇬 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
    [07:21] 🖱 GTA1: GUI Test-time Scaling Agent
    [08:00] 🧮 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
    [08:45] 🧬 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
    [09:33] 🩻 SAMed-2: Selective Memory Enhanced Medical Segment Anything Model
    [10:01] 🎬 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    11 min
  2. 1 day ago

    2025.07.08 | MemOS boosts memory-management efficiency; combining MLM and CLM improves encoder training.

    The 15 papers in this episode:
    [00:21] 🧠 MemOS: A Memory OS for AI System
    [01:07] 🤔 Should We Still Pretrain Encoders with Masked Language Modeling?
    [01:43] 🎥 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
    [02:22] 🤖 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
    [03:02] 🤖 Pre-Trained Policy Discriminators are General Reward Models
    [03:38] 🧠 BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
    [04:23] 🤖 RoboBrain 2.0 Technical Report
    [05:04] 🧩 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
    [05:42] ✨ RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
    [06:21] 🎬 StreamDiT: Real-Time Streaming Text-to-Video Generation
    [07:04] 📜 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
    [07:49] 💡 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
    [08:35] 🎨 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
    [09:16] 📊 On the rankability of visual embeddings
    [09:59] 🖼 VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    11 min
  3. 4 days ago

    【Month-End Special】The hottest AI papers of June | LLMs improve through self-reflection; MiniMax-M1 scales test-time compute efficiently.

    The 10 papers in this episode:
    [00:37] TOP1 (🔥258) | 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
    [02:51] TOP2 (🔥249) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [05:24] TOP3 (🔥240) | 🤖 Reinforcement Pre-Training
    [07:54] TOP4 (🔥165) | 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    [09:53] TOP5 (🔥134) | 🕰 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
    [12:24] TOP6 (🔥132) | 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    [14:50] TOP7 (🔥126) | 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    [16:36] TOP8 (🔥116) | 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [18:34] TOP9 (🔥108) | 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
    [21:05] TOP10 (🔥107) | 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    24 min
  4. 5 days ago

    2025.07.04 | WebSailor boosts LLM reasoning for web agents; LangScene-X improves 3D scene reconstruction.

    The 15 papers in this episode:
    [00:22] 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
    [00:59] 🖼 LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
    [01:44] 🧬 IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
    [02:35] 👂 Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
    [03:17] 🤝 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
    [04:00] 🖼 Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
    [04:38] 🧠 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
    [05:12] 🧠 Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
    [05:47] 💡 Fast and Simplex: 2-Simplicial Attention in Triton
    [06:33] 🧐 Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
    [07:16] 🧩 Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
    [08:12] 🤖 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
    [08:51] 💡 Energy-Based Transformers are Scalable Learners and Thinkers
    [09:33] ⚙ AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
    [10:16] 🚀 ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    12 min
  5. Jul 2

    2025.07.02 | Advances in multimodal reasoning; improved bidirectional embeddings.

    The 12 papers in this episode:
    [00:23] 💡 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
    [01:00] 🖼 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
    [01:35] 🔬 SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
    [02:19] 🤔 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
    [02:59] 🎬 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
    [03:37] 🤖 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
    [04:19] 🧠 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
    [04:53] 🧠 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
    [05:30] 💡 Data Efficacy for Language Model Training
    [06:05] 🎬 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion
    [06:40] 🖼 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
    [07:28] 🛡 Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
    【Follow us】 For more content beyond the podcast, find us on Xiaohongshu (RED): AI速递

    9 min

Ratings & Reviews

5 out of 5 · 2 ratings

About

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).
