HuggingFace 每日AI论文速递

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou and Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu.

  1. 1 day ago

    2025.07.11 | More efficient reasoning over long videos; single-image model customization without overfitting.

    The 15 papers in this episode:
    [00:25] 🎬 Scaling RL to Long Videos
    [01:10] 🖼 T-LoRA: Single Image Diffusion Model Customization Without Overfitting
    [01:49] 🖼 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
    [02:28] 🤖 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
    [03:06] 🎬 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
    [03:49] 🤖 PyVision: Agentic Vision with Dynamic Tooling
    [04:29] 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
    [05:12] 🚀 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
    [05:48] 🧠 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
    [06:33] 🎬 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
    [07:15] 🤖 Token Bottleneck: One Token to Remember Dynamics
    [07:54] 🤥 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
    [08:41] 🧠 Beyond the Linear Separability Ceiling
    [09:16] 🌱 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
    [09:53] 🧪 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    11 min
  2. 2 days ago

    2025.07.10 | A breakthrough in zero-shot motion generation; improved 4K image super-resolution.

    The 14 papers in this episode:
    [00:22] 🤸 Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
    [01:03] 🖼 4KAgent: Agentic Any Image to 4K Super-Resolution
    [01:39] 🖼 Perception-Aware Policy Optimization for Multimodal Reasoning
    [02:24] 🧪 Rethinking Verification for LLM Code Generation: From Generation to Testing
    [03:05] 🤔 A Systematic Analysis of Hybrid Linear Attention
    [03:42] 🧠 First Return, Entropy-Eliciting Explore
    [04:23] 🤖 AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
    [05:05] 🧩 Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
    [05:47] 🚗 A Survey on Vision-Language-Action Models for Autonomous Driving
    [06:29] 🧪 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
    [07:09] 🗣 ModelCitizens: Representing Community Voices in Online Safety
    [07:50] 🤖 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
    [08:32] 🔬 Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework
    [09:21] 🧐 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    11 min
  3. 3 days ago

    2025.07.09 | Latent reasoning boosts LLM expressive power; SingLoRA improves low-rank adaptation.

    The 15 papers in this episode:
    [00:25] 🤔 A Survey on Latent Reasoning
    [00:59] 💡 SingLoRA: Low Rank Adaptation Using a Single Matrix
    [01:47] 🧩 OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
    [02:36] 🤖 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
    [03:17] 🤖 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
    [03:50] 🫂 RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
    [04:30] 🩺 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
    [05:14] 🤖 Is Diversity All You Need for Scalable Robotic Manipulation?
    [05:54] 🤖 Coding Triangle: How Does Large Language Model Understand Code?
    [06:38] 🇪🇬 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
    [07:21] 🖱 GTA1: GUI Test-time Scaling Agent
    [08:00] 🧮 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
    [08:45] 🧬 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
    [09:33] 🩻 SAMed-2: Selective Memory Enhanced Medical Segment Anything Model
    [10:01] 🎬 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    11 min
  4. 4 days ago

    2025.07.08 | MemOS improves memory-management efficiency; combining MLM and CLM improves encoder training.

    The 15 papers in this episode:
    [00:21] 🧠 MemOS: A Memory OS for AI System
    [01:07] 🤔 Should We Still Pretrain Encoders with Masked Language Modeling?
    [01:43] 🎥 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
    [02:22] 🤖 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
    [03:02] 🤖 Pre-Trained Policy Discriminators are General Reward Models
    [03:38] 🧠 BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
    [04:23] 🤖 RoboBrain 2.0 Technical Report
    [05:04] 🧩 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
    [05:42] ✨ RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
    [06:21] 🎬 StreamDiT: Real-Time Streaming Text-to-Video Generation
    [07:04] 📜 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
    [07:49] 💡 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
    [08:35] 🎨 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
    [09:16] 📊 On the rankability of visual embeddings
    [09:59] 🖼 VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    11 min
  5. July 5

    【Month-end special】 June's hottest AI papers | LLMs improve via self-reflection; MiniMax-M1 scales test-time compute efficiently.

    The 10 papers in this episode:
    [00:37] TOP1(🔥258) | 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
    [02:51] TOP2(🔥249) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [05:24] TOP3(🔥240) | 🤖 Reinforcement Pre-Training
    [07:54] TOP4(🔥165) | 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    [09:53] TOP5(🔥134) | 🕰 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
    [12:24] TOP6(🔥132) | 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    [14:50] TOP7(🔥126) | 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    [16:36] TOP8(🔥116) | 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [18:34] TOP9(🔥108) | 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
    [21:05] TOP10(🔥107) | 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    24 min
