HuggingFace 每日AI论文速递

duan

10 minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou and Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 16 HRS AGO

    2025.07.18 | Optimizing LLM Context; More Efficient Vision-Language Models

    This episode covers the following 15 papers:
    [00:27] 🧮 A Survey of Context Engineering for Large Language Models
    [01:16] 🧠 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
    [02:08] 📸 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
    [02:52] 🤖 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
    [03:47] 🖼 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
    [04:47] 🧑 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
    [05:34] 🎭 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
    [06:23] 🧠 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
    [07:17] 🔬 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
    [08:08] 🗣 Voxtral (a multimodal audio chat model)
    [08:55] 💡 Teach Old SAEs New Domain Tricks with Boosting
    [09:46] 💡 FLEXITOKENS: Flexible Tokenization for Evolving Language Models
    [10:49] 🎬 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
    [11:45] 🛡 Automating Steering for Safe Multimodal Large Language Models
    [12:25] ⚙ RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    14min
  2. 1 DAY AGO

    2025.07.17 | RAG Improves LLM Reasoning; PhysX Generates Physics-Grounded 3D Assets

    This episode covers the following 13 papers:
    [00:26] 🧠 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    [01:17] 🧱 PhysX: Physical-Grounded 3D Asset Generation
    [02:04] 🚗 MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
    [03:05] 🚀 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
    [04:00] 💃 MOSPA: Human Motion Generation Driven by Spatial Audio
    [04:57] 🏗 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
    [05:58] 🤖 Seq vs Seq: An Open Suite of Paired Encoders and Decoders
    [06:38] 🎬 AnyI2V: Animating Any Conditional Image with Motion Control
    [07:34] 🎯 SpatialTrackerV2: 3D Point Tracking Made Easy
    [08:27] 🦎 Lizard: An Efficient Linearization Framework for Large Language Models
    [09:14] 🧰 Replacing thinking with tool usage enables reasoning in small language models
    [10:05] 🧙 AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
    [10:51] 🧠 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    12min
  3. 2 DAYS AGO

    2025.07.16 | VLV Auto-Encoder Cuts Training Costs; EXAONE 4.0 Strengthens Reasoning

    This episode covers the following 8 papers:
    [00:28] 💡 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
    [01:27] 🤖 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
    [02:24] ⚖ Scaling Laws for Optimal Data Mixtures
    [03:12] 🔬 Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
    [03:58] 🤝 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
    [04:50] 🦠 LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
    [05:38] 🤖 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
    [06:25] 🧠 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    8min
  4. 3 DAYS AGO

    2025.07.15 | A Dataset for Virtual-Human Generation; Guarding RL Against Data Contamination

    This episode covers the following 12 papers:
    [00:24] 🗣 SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
    [01:12] 🤔 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
    [02:03] 🤖 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
    [03:02] 🤔 REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
    [03:56] 🧮 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    [04:46] 🧠 LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
    [05:39] ⚖ CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
    [06:27] 🎬 MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
    [07:18] 🧮 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
    [08:05] 🇰🇷 From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
    [09:08] 🖼 DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
    [09:54] 🖼 Favicon Trojans: Executable Steganography Via Ico Alpha Channel Exploitation
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    11min
  5. 4 DAYS AGO

    2025.07.14 | Efficient Reasoning-Path Selection; Rendering with Compressed Light-Field Tokens

    This episode covers the following 14 papers:
    [00:22] 🧠 Test-Time Scaling with Reflective Generative Model
    [00:59] 💡 CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
    [01:34] 💻 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    [02:19] 🧠 KV Cache Steering for Inducing Reasoning in Small Language Models
    [03:03] 🧠 Neural-Driven Image Editing
    [03:42] 🎬 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
    [04:27] 🧠 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    [05:14] 🧩 From One to More: Contextual Part Latents for 3D Generation
    [05:53] 🤖 One Token to Fool LLM-as-a-Judge
    [06:32] 🖼 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
    [07:16] 🔭 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
    [08:00] 🚀 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    [08:48] 🚀 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
    [09:25] 😵 Robust Multimodal Large Language Models Against Modality Conflict
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    10min
  6. JUL 11

    2025.07.11 | More Efficient Long-Video Reasoning; Single-Image Customization Without Overfitting

    This episode covers the following 15 papers:
    [00:25] 🎬 Scaling RL to Long Videos
    [01:10] 🖼 T-LoRA: Single Image Diffusion Model Customization Without Overfitting
    [01:49] 🖼 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
    [02:28] 🤖 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
    [03:06] 🎬 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
    [03:49] 🤖 PyVision: Agentic Vision with Dynamic Tooling
    [04:29] 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
    [05:12] 🚀 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
    [05:48] 🧠 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
    [06:33] 🎬 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
    [07:15] 🤖 Token Bottleneck: One Token to Remember Dynamics
    [07:54] 🤥 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
    [08:41] 🧠 Beyond the Linear Separability Ceiling
    [09:16] 🌱 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
    [09:53] 🧪 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    11min
  7. JUL 10

    2025.07.10 | A Breakthrough in Zero-Shot Motion Generation; Better 4K Image Super-Resolution

    This episode covers the following 14 papers:
    [00:22] 🤸 Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
    [01:03] 🖼 4KAgent: Agentic Any Image to 4K Super-Resolution
    [01:39] 🖼 Perception-Aware Policy Optimization for Multimodal Reasoning
    [02:24] 🧪 Rethinking Verification for LLM Code Generation: From Generation to Testing
    [03:05] 🤔 A Systematic Analysis of Hybrid Linear Attention
    [03:42] 🧠 First Return, Entropy-Eliciting Explore
    [04:23] 🤖 AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
    [05:05] 🧩 Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
    [05:47] 🚗 A Survey on Vision-Language-Action Models for Autonomous Driving
    [06:29] 🧪 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
    [07:09] 🗣 ModelCitizens: Representing Community Voices in Online Safety
    [07:50] 🤖 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
    [08:32] 🔬 Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework
    [09:21] 🧐 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    11min

Ratings and Reviews

5 out of 5 (2 ratings)
