HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

5.0 (2)
TECHNOLOGY
UPDATED DAILY

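Ten minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome.
📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts.
🖼 There is also an illustrated text edition: search for and follow 【AI速递】 on Xiaohongshu.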

18M AGO

2025.07.14 | Efficient reasoning path selection; compressed light-field token rendering

The 14 papers in this episode:
[00:22] 🧠 Test-Time Scaling with Reflective Generative Model
[00:59] 💡 CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
[01:34] 💻 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
[02:19] 🧠 KV Cache Steering for Inducing Reasoning in Small Language Models
[03:03] 🧠 Neural-Driven Image Editing
[03:42] 🎬 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
[04:27] 🧠 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
[05:14] 🧩 From One to More: Contextual Part Latents for 3D Generation
[05:53] 🤖 One Token to Fool LLM-as-a-Judge
[06:32] 🖼 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
[07:16] 🔭 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
[08:00] 🚀 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
[08:48] 🚀 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
[09:25] 😵 Robust Multimodal Large Language Models Against Modality Conflict
[Follow us] You can also find us on the following platform for more beyond the podcast: Xiaohongshu: AI速递

10 min
1D AGO

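[Weekend Special] Hottest AI papers of the 2nd week of July | A new framework for long-video reasoning; a memory operating system boosts AI performance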

The 5 papers in this episode:
[00:42] TOP1(🔥109) | 🎬 Scaling RL to Long Videos
[02:54] TOP2(🔥106) | 🧠 MemOS: A Memory OS for AI System
[05:19] TOP3(🔥91) | 🖼 T-LoRA: Single Image Diffusion Model Customization Without Overfitting
[07:51] TOP4(🔥88) | 💡 SingLoRA: Low Rank Adaptation Using a Single Matrix
[09:41] TOP5(🔥72) | 🤔 Should We Still Pretrain Encoders with Masked Language Modeling?

12 min
3D AGO

2025.07.11 | More efficient long-video reasoning; single-image model customization without overfitting.

The 15 papers in this episode:
[00:25] 🎬 Scaling RL to Long Videos
[01:10] 🖼 T-LoRA: Single Image Diffusion Model Customization Without Overfitting
[01:49] 🖼 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
[02:28] 🤖 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
[03:06] 🎬 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
[03:49] 🤖 PyVision: Agentic Vision with Dynamic Tooling
[04:29] 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
[05:12] 🚀 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
[05:48] 🧠 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
[06:33] 🎬 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
[07:15] 🤖 Token Bottleneck: One Token to Remember Dynamics
[07:54] 🤥 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
[08:41] 🧠 Beyond the Linear Separability Ceiling
[09:16] 🌱 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
[09:53] 🧪 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?

11 min
4D AGO

2025.07.10 | A breakthrough in zero-shot motion generation; improved 4K image super-resolution.

The 14 papers in this episode:
[00:22] 🤸 Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
[01:03] 🖼 4KAgent: Agentic Any Image to 4K Super-Resolution
[01:39] 🖼 Perception-Aware Policy Optimization for Multimodal Reasoning
[02:24] 🧪 Rethinking Verification for LLM Code Generation: From Generation to Testing
[03:05] 🤔 A Systematic Analysis of Hybrid Linear Attention
[03:42] 🧠 First Return, Entropy-Eliciting Explore
[04:23] 🤖 AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
[05:05] 🧩 Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
[05:47] 🚗 A Survey on Vision-Language-Action Models for Autonomous Driving
[06:29] 🧪 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
[07:09] 🗣 ModelCitizens: Representing Community Voices in Online Safety
[07:50] 🤖 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
[08:32] 🔬 Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework
[09:21] 🧐 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

11 min
5D AGO

2025.07.09 | Latent reasoning improves LLM expressiveness; SingLoRA improves low-rank adaptation performance.

The 15 papers in this episode:
[00:25] 🤔 A Survey on Latent Reasoning
[00:59] 💡 SingLoRA: Low Rank Adaptation Using a Single Matrix
[01:47] 🧩 OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
[02:36] 🤖 CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
[03:17] 🤖 StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
[03:50] 🫂 RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
[04:30] 🩺 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
[05:14] 🤖 Is Diversity All You Need for Scalable Robotic Manipulation?
[05:54] 🤖 Coding Triangle: How Does Large Language Model Understand Code?
[06:38] 🇪🇬 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
[07:21] 🖱 GTA1: GUI Test-time Scaling Agent
[08:00] 🧮 Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
[08:45] 🧬 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
[09:33] 🩻 SAMed-2: Selective Memory Enhanced Medical Segment Anything Model
[10:01] 🎬 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation

11 min
6D AGO

2025.07.08 | MemOS improves memory management efficiency; combining MLM and CLM improves encoder training.

The 15 papers in this episode:
[00:21] 🧠 MemOS: A Memory OS for AI System
[01:07] 🤔 Should We Still Pretrain Encoders with Masked Language Modeling?
[01:43] 🎥 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
[02:22] 🤖 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
[03:02] 🤖 Pre-Trained Policy Discriminators are General Reward Models
[03:38] 🧠 BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
[04:23] 🤖 RoboBrain 2.0 Technical Report
[05:04] 🧩 Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
[05:42] ✨ RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
[06:21] 🎬 StreamDiT: Real-Time Streaming Text-to-Video Generation
[07:04] 📜 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
[07:49] 💡 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding
[08:35] 🎨 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
[09:16] 📊 On the rankability of visual embeddings
[09:59] 🖼 VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

11 min
JUL 7

2025.07.07 | GPT-4o performs well on semantic tasks; high-accuracy emulation in latent space.

The 4 papers in this episode:
[00:27] 🖼 How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
[01:09] 🌌 Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
[01:45] 🇮🇳 Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages
[02:25] ✍ LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

4 min
JUL 6

[Weekend Special] Hottest AI papers of the 1st week of July | Multimodal reasoning models improve; leading short-video understanding.

The 5 papers in this episode:
[00:35] TOP1(🔥165) | 🧠 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[02:53] TOP2(🔥108) | 🎬 Kwai Keye-VL Technical Report
[05:17] TOP3(🔥67) | 🎨 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
[07:40] TOP4(🔥67) | 🧭 WebSailor: Navigating Super-human Reasoning for Web Agent
[10:00] TOP5(🔥58) | 🎨 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

13 min


out of 5

2 Ratings

Support!!

Feb 16

Fergie.W

I hope this show keeps going.

Creator: duan
Years Active: 2024 - 2025
Episodes: 328
Rating: Clean
Show Website: HuggingFace 每日AI论文速递

