HuggingFace 每日AI论文速递

duan

10 minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou and Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 16 HRS AGO

    2025.07.18 | Optimizing LLM Context; More Efficient Vision-Language Models

    This episode covers the following 15 papers:
    [00:27] 🧮 A Survey of Context Engineering for Large Language Models
    [01:16] 🧠 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
    [02:08] 📸 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
    [02:52] 🤖 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
    [03:47] 🖼 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
    [04:47] 🧑 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
    [05:34] 🎭 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
    [06:23] 🧠 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
    [07:17] 🔬 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
    [08:08] 🗣 Voxtral (a multimodal audio chat model)
    [08:55] 💡 Teach Old SAEs New Domain Tricks with Boosting
    [09:46] 💡 FLEXITOKENS: Flexible Tokenization for Evolving Language Models
    [10:49] 🎬 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
    [11:45] 🛡 Automating Steering for Safe Multimodal Large Language Models
    [12:25] ⚙ RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    14min
  2. 1 DAY AGO

    2025.07.17 | RAG Improves LLM Reasoning; PhysX Generates Physics-Grounded 3D Assets

    This episode covers the following 13 papers:
    [00:26] 🧠 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    [01:17] 🧱 PhysX: Physical-Grounded 3D Asset Generation
    [02:04] 🚗 MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
    [03:05] 🚀 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
    [04:00] 💃 MOSPA: Human Motion Generation Driven by Spatial Audio
    [04:57] 🏗 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
    [05:58] 🤖 Seq vs Seq: An Open Suite of Paired Encoders and Decoders
    [06:38] 🎬 AnyI2V: Animating Any Conditional Image with Motion Control
    [07:34] 🎯 SpatialTrackerV2: 3D Point Tracking Made Easy
    [08:27] 🦎 Lizard: An Efficient Linearization Framework for Large Language Models
    [09:14] 🧰 Replacing thinking with tool usage enables reasoning in small language models
    [10:05] 🧙 AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
    [10:51] 🧠 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    12min
  3. 2 DAYS AGO

    2025.07.16 | VLV Auto-Encoder Cuts Training Costs; EXAONE 4.0 Strengthens Reasoning

    This episode covers the following 8 papers:
    [00:28] 💡 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
    [01:27] 🤖 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
    [02:24] ⚖ Scaling Laws for Optimal Data Mixtures
    [03:12] 🔬 Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
    [03:58] 🤝 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
    [04:50] 🦠 LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
    [05:38] 🤖 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
    [06:25] 🧠 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    8min
  4. 3 DAYS AGO

    2025.07.15 | A Dataset for Virtual-Human Generation; Guarding RL Against Data Contamination

    This episode covers the following 12 papers:
    [00:24] 🗣 SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
    [01:12] 🤔 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
    [02:03] 🤖 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
    [03:02] 🤔 REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
    [03:56] 🧮 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    [04:46] 🧠 LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
    [05:39] ⚖ CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
    [06:27] 🎬 MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
    [07:18] 🧮 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
    [08:05] 🇰🇷 From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
    [09:08] 🖼 DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
    [09:54] 🖼 Favicon Trojans: Executable Steganography Via Ico Alpha Channel Exploitation
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    11min
  5. 4 DAYS AGO

    2025.07.14 | Efficient Reasoning-Path Selection; Rendering with Compressed Light-Field Tokens

    This episode covers the following 14 papers:
    [00:22] 🧠 Test-Time Scaling with Reflective Generative Model
    [00:59] 💡 CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
    [01:34] 💻 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    [02:19] 🧠 KV Cache Steering for Inducing Reasoning in Small Language Models
    [03:03] 🧠 Neural-Driven Image Editing
    [03:42] 🎬 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
    [04:27] 🧠 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    [05:14] 🧩 From One to More: Contextual Part Latents for 3D Generation
    [05:53] 🤖 One Token to Fool LLM-as-a-Judge
    [06:32] 🖼 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
    [07:16] 🔭 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
    [08:00] 🚀 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    [08:48] 🚀 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
    [09:25] 😵 Robust Multimodal Large Language Models Against Modality Conflict
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    10min
  6. JUL 11

    2025.07.11 | More Efficient Long-Video Reasoning; Single-Image Customization Without Overfitting

    This episode covers the following 15 papers:
    [00:25] 🎬 Scaling RL to Long Videos
    [01:10] 🖼 T-LoRA: Single Image Diffusion Model Customization Without Overfitting
    [01:49] 🖼 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
    [02:28] 🤖 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
    [03:06] 🎬 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
    [03:49] 🤖 PyVision: Agentic Vision with Dynamic Tooling
    [04:29] 🎬 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
    [05:12] 🚀 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
    [05:48] 🧠 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
    [06:33] 🎬 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
    [07:15] 🤖 Token Bottleneck: One Token to Remember Dynamics
    [07:54] 🤥 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
    [08:41] 🧠 Beyond the Linear Separability Ceiling
    [09:16] 🌱 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
    [09:53] 🧪 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    11min
  7. JUL 10

    2025.07.10 | A Breakthrough in Zero-Shot Motion Generation; Better 4K Image Super-Resolution

    This episode covers the following 14 papers:
    [00:22] 🤸 Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
    [01:03] 🖼 4KAgent: Agentic Any Image to 4K Super-Resolution
    [01:39] 🖼 Perception-Aware Policy Optimization for Multimodal Reasoning
    [02:24] 🧪 Rethinking Verification for LLM Code Generation: From Generation to Testing
    [03:05] 🤔 A Systematic Analysis of Hybrid Linear Attention
    [03:42] 🧠 First Return, Entropy-Eliciting Explore
    [04:23] 🤖 AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
    [05:05] 🧩 Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
    [05:47] 🚗 A Survey on Vision-Language-Action Models for Autonomous Driving
    [06:29] 🧪 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models
    [07:09] 🗣 ModelCitizens: Representing Community Voices in Online Safety
    [07:50] 🤖 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
    [08:32] 🔬 Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework
    [09:21] 🧐 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
    【Follow us】 You can also find us on the platforms below for more content beyond the podcast. Xiaohongshu (RED): AI速递

    11min

Ratings and Reviews

5 out of 5 (2 ratings)
