HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Express)

duan

5.0 (2)
Technology
Daily

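Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcast. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.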

19 hours ago

2025.07.11 | More efficient long-video reasoning; single-image model customization without overfitting.

The 15 papers in this episode:
[00:25] 🎬 Scaling RL to Long Videos
[01:10] 🖼 T-LoRA: Single Image Diffusion Model Customization Without Overfitting
[01:49] 🖼 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
[02:28] 🤖 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
[03:06] 🎬 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
[03:49] 🤖 PyVision: Agentic Vision with Dynamic Tooling
[04:29] 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
[05:12] 🚀 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
[05:48] 🧠 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
[06:33] 🎬 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
[07:15] 🤖 Token Bottleneck: One Token to Remember Dynamics
[07:54] 🤥 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
[08:41] 🧠 Beyond the Linear Separability Ceiling
[09:16] 🌱 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
[09:53] 🧪 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?

[Follow us] You can also find us on the following platform for more than what the podcast covers: Xiaohongshu: AI速递

11 min
1 day ago

2025.07.10 | Breakthrough in zero-shot motion generation; improved 4K image super-resolution.

The 14 papers in this episode:
[00:22] 🤸 Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
[01:03] 🖼 4KAgent: Agentic Any Image to 4K Super-Resolution
[01:39] 🖼 Perception-Aware Policy Optimization for Multimodal Reasoning
[02:24] 🧪 Rethinking Verification for LLM Code Generation: From Generation to Testing
[03:05] 🤔 A Systematic Analysis of Hybrid Linear Attention
[03:42] 🧠 First Return, Entropy-Eliciting Explore
[04:23] 🤖 AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
[05:05] 🧩 Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
[05:47] 🚗 A Survey on Vision-Language-Action Models for Autonomous Driving
[06:29] 🧪 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
[07:09] 🗣 ModelCitizens: Representing Community Voices in Online Safety
[07:50] 🤖 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
[08:32] 🔬 Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework
[09:21] 🧐 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

11 min
2 days ago

2025.07.09 | Latent reasoning boosts LLM expressiveness; SingLoRA improves low-rank adaptation performance.

The 15 papers in this episode:
[00:25] 🤔 A Survey on Latent Reasoning
[00:59] 💡 SingLoRA: Low Rank Adaptation Using a Single Matrix
[01:47] 🧩 OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
[02:36] 🤖 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
[03:17] 🤖 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
[03:50] 🫂 RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
[04:30] 🩺 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
[05:14] 🤖 Is Diversity All You Need for Scalable Robotic Manipulation?
[05:54] 🤖 Coding Triangle: How Does Large Language Model Understand Code?
[06:38] 🇪🇬 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
[07:21] 🖱 GTA1: GUI Test-time Scaling Agent
[08:00] 🧮 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
[08:45] 🧬 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
[09:33] 🩻 SAMed-2: Selective Memory Enhanced Medical Segment Anything Model
[10:01] 🎬 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation

11 min
3 days ago

2025.07.08 | MemOS improves memory management efficiency; combining MLM and CLM improves encoder training.

The 15 papers in this episode:
[00:21] 🧠 MemOS: A Memory OS for AI System
[01:07] 🤔 Should We Still Pretrain Encoders with Masked Language Modeling?
[01:43] 🎥 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
[02:22] 🤖 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
[03:02] 🤖 Pre-Trained Policy Discriminators are General Reward Models
[03:38] 🧠 BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
[04:23] 🤖 RoboBrain 2.0 Technical Report
[05:04] 🧩 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
[05:42] ✨ RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
[06:21] 🎬 StreamDiT: Real-Time Streaming Text-to-Video Generation
[07:04] 📜 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
[07:49] 💡 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
[08:35] 🎨 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
[09:16] 📊 On the rankability of visual embeddings
[09:59] 🖼 VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

11 min
4 days ago

2025.07.07 | GPT-4o performs well on semantic tasks; high-accuracy emulation in latent space.

The 4 papers in this episode:
[00:27] 🖼 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
[01:09] 🌌 Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
[01:45] 🇮🇳 Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages
[02:25] ✍ LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

4 min
6 days ago

[Weekend Special] Hottest AI papers of the first week of July | Multimodal reasoning models improve; leading short-video understanding.

The 5 papers in this episode:
[00:35] TOP1(🔥165) | 🧠 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[02:53] TOP2(🔥108) | 🎬 Kwai Keye-VL Technical Report
[05:17] TOP3(🔥67) | 🎨 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
[07:40] TOP4(🔥67) | 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
[10:00] TOP5(🔥58) | 🎨 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

13 min
5 Jul

[Month-End Special] Hottest AI papers of June | LLMs improve through self-reflection; MiniMax-M1 scales test-time compute efficiently.

The 10 papers in this episode:
[00:37] TOP1(🔥258) | 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
[02:51] TOP2(🔥249) | 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
[05:24] TOP3(🔥240) | 🤖 Reinforcement Pre-Training
[07:54] TOP4(🔥165) | 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
[09:53] TOP5(🔥134) | 🕰 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
[12:24] TOP6(🔥132) | 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
[14:50] TOP7(🔥126) | 🧠 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
[16:36] TOP8(🔥116) | 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
[18:34] TOP9(🔥108) | 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
[21:05] TOP10(🔥107) | 🩺 Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

24 min
4 Jul

2025.07.04 | WebSailor improves LLM reasoning; LangScene-X improves 3D scene reconstruction.

The 15 papers in this episode:
[00:22] 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
[00:59] 🖼 LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
[01:44] 🧬 IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
[02:35] 👂 Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback
[03:17] 🤝 Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
[04:00] 🖼 Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
[04:38] 🧠 Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving
[05:12] 🧠 Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search
[05:47] 💡 Fast and Simplex: 2-Simplicial Attention in Triton
[06:33] 🧐 Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
[07:16] 🧩 Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models
[08:12] 🤖 Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
[08:51] 💡 Energy-Based Transformers are Scalable Learners and Thinkers
[09:33] ⚙ AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
[10:16] 🚀 ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention

12 min


2 Ratings

Support!!

16 Feb

Fergie.W

I hope you can keep this going.

Creator: duan
Years active: 2024 - 2025
Episodes: 326
Rating: Clean
Show website: HuggingFace 每日AI论文速递

