HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

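Ten minutes a day to bring you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu.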

  1. 23 hr ago

    2025.07.22 | MiroMind-M1 improves mathematical reasoning; GUI-G$^2$ Gaussian rewards aid GUI grounding.

    The 15 papers in this episode:
    [00:25] 🧮 MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
    [01:00] 🎯 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
    [01:42] ⛓ The Invisible Leash: Why RLVR May Not Escape Its Origin
    [02:53] 🏗 WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
    [03:20] 🤖 NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
    [04:23] 🛠 Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
    [05:15] 🧠 SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
    [06:19] 🤖 GR-3 Technical Report
    [07:08] 🤖 Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
    [08:12] 💡 Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
    [09:12] 🧠 Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
    [09:52] 📉 Inverse Scaling in Test-Time Compute
    [10:32] 💡 Gaussian Splatting with Discretized SDF for Relightable Assets
    [11:24] 🧠 STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
    [12:13] ⏩ Streaming 4D Visual Geometry Transformer

    Follow us: you can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

    13 min
  2. 1 day ago

    2025.07.21 | dLLMs show a new kind of safety vulnerability that existing defenses do not cover; for Russian speech synthesis, data and annotation are the core.

    The 10 papers in this episode:
    [00:20] 😈 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
    [01:12] 🎤 A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
    [02:07] 🧩 Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
    [02:49] 🚀 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
    [03:24] 🎨 CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
    [04:27] 🚀 RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
    [05:08] 🤝 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
    [05:41] 🚫 Mitigating Object Hallucinations via Sentence-Level Early Intervention
    [06:20] ⚡ The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
    [07:41] 📈 Quantitative Risk Management in Volatile Markets with an Expectile-Based Framework for the FTSE Index

    9 min
  3. 4 days ago

    2025.07.18 | Optimizing context for LLMs; improving vision-language model efficiency

    The 15 papers in this episode:
    [00:27] 🧮 A Survey of Context Engineering for Large Language Models
    [01:16] 🧠 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
    [02:08] 📸 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
    [02:52] 🤖 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
    [03:47] 🖼 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
    [04:47] 🧑 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
    [05:34] 🎭 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
    [06:23] 🧠 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
    [07:17] 🔬 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
    [08:08] 🗣 Voxtral (a multimodal audio chat model)
    [08:55] 💡 Teach Old SAEs New Domain Tricks with Boosting
    [09:46] 💡 FLEXITOKENS: Flexible Tokenization for Evolving Language Models
    [10:49] 🎬 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
    [11:45] 🛡 Automating Steering for Safe Multimodal Large Language Models
    [12:25] ⚙ RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

    14 min
  4. 5 days ago

    2025.07.17 | RAG improves LLM reasoning; PhysX generates physically grounded 3D assets

    The 13 papers in this episode:
    [00:26] 🧠 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    [01:17] 🧱 PhysX: Physical-Grounded 3D Asset Generation
    [02:04] 🚗 MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
    [03:05] 🚀 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
    [04:00] 💃 MOSPA: Human Motion Generation Driven by Spatial Audio
    [04:57] 🏗 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
    [05:58] 🤖 Seq vs Seq: An Open Suite of Paired Encoders and Decoders
    [06:38] 🎬 AnyI2V: Animating Any Conditional Image with Motion Control
    [07:34] 🎯 SpatialTrackerV2: 3D Point Tracking Made Easy
    [08:27] 🦎 Lizard: An Efficient Linearization Framework for Large Language Models
    [09:14] 🧰 Replacing thinking with tool usage enables reasoning in small language models
    [10:05] 🧙 AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
    [10:51] 🧠 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

    12 min
  5. 6 days ago

    2025.07.16 | VLV auto-encoder lowers training cost; EXAONE 4.0 strengthens reasoning.

    The 8 papers in this episode:
    [00:28] 💡 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
    [01:27] 🤖 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
    [02:24] ⚖ Scaling Laws for Optimal Data Mixtures
    [03:12] 🔬 Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
    [03:58] 🤝 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
    [04:50] 🦠 LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
    [05:38] 🤖 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
    [06:25] 🧠 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

    8 min
  6. Jul 16

    2025.07.15 | A dataset supports virtual-human generation; reinforcement learning must guard against data contamination.

    The 12 papers in this episode:
    [00:24] 🗣 SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
    [01:12] 🤔 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
    [02:03] 🤖 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
    [03:02] 🤔 REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
    [03:56] 🧮 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    [04:46] 🧠 LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
    [05:39] ⚖ CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
    [06:27] 🎬 MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
    [07:18] 🧮 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
    [08:05] 🇰🇷 From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
    [09:08] 🖼 DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
    [09:54] 🖼 Favicon Trojans: Executable Steganography Via Ico Alpha Channel Exploitation

    11 min
  7. Jul 14

    2025.07.14 | Efficient reasoning-path selection; rendering with compressive light-field tokens

    The 14 papers in this episode:
    [00:22] 🧠 Test-Time Scaling with Reflective Generative Model
    [00:59] 💡 CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
    [01:34] 💻 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    [02:19] 🧠 KV Cache Steering for Inducing Reasoning in Small Language Models
    [03:03] 🧠 Neural-Driven Image Editing
    [03:42] 🎬 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
    [04:27] 🧠 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    [05:14] 🧩 From One to More: Contextual Part Latents for 3D Generation
    [05:53] 🤖 One Token to Fool LLM-as-a-Judge
    [06:32] 🖼 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
    [07:16] 🔭 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
    [08:00] 🚀 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    [08:48] 🚀 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
    [09:25] 😵 Robust Multimodal Large Language Models Against Modality Conflict

    10 min

Ratings and Reviews

5 out of 5 (2 ratings)
