HuggingFace 每日AI论文速递

duan

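In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.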

  1. 10 hours ago

    2025.07.21 | New dLLM safety vulnerability, existing defenses fall short; Russian speech synthesis, data and annotation are the core.

    The 10 papers in this episode:

    - [00:20] 😈 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
    - [01:12] 🎤 A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
    - [02:07] 🧩 Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
    - [02:49] 🚀 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
    - [03:24] 🎨 CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
    - [04:27] 🚀 RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
    - [05:08] 🤝 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
    - [05:41] 🚫 Mitigating Object Hallucinations via Sentence-Level Early Intervention
    - [06:20] ⚡ The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
    - [07:41] 📈 Quantitative Risk Management in Volatile Markets with an Expectile-Based Framework for the FTSE Index

    Follow us: you can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

    9 minutes
  2. 3 days ago

    2025.07.18 | Optimizing LLM context; improving vision-language model efficiency

    The 15 papers in this episode:

    - [00:27] 🧮 A Survey of Context Engineering for Large Language Models
    - [01:16] 🧠 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
    - [02:08] 📸 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
    - [02:52] 🤖 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
    - [03:47] 🖼 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
    - [04:47] 🧑 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
    - [05:34] 🎭 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
    - [06:23] 🧠 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
    - [07:17] 🔬 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
    - [08:08] 🗣 Voxtral (a multimodal audio chat model)
    - [08:55] 💡 Teach Old SAEs New Domain Tricks with Boosting
    - [09:46] 💡 FLEXITOKENS: Flexible Tokenization for Evolving Language Models
    - [10:49] 🎬 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
    - [11:45] 🛡 Automating Steering for Safe Multimodal Large Language Models
    - [12:25] ⚙ RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

    14 minutes
  3. 4 days ago

    2025.07.17 | RAG improves LLM reasoning; PhysX generates physical 3D assets

    The 13 papers in this episode:

    - [00:26] 🧠 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    - [01:17] 🧱 PhysX: Physical-Grounded 3D Asset Generation
    - [02:04] 🚗 MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
    - [03:05] 🚀 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
    - [04:00] 💃 MOSPA: Human Motion Generation Driven by Spatial Audio
    - [04:57] 🏗 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
    - [05:58] 🤖 Seq vs Seq: An Open Suite of Paired Encoders and Decoders
    - [06:38] 🎬 AnyI2V: Animating Any Conditional Image with Motion Control
    - [07:34] 🎯 SpatialTrackerV2: 3D Point Tracking Made Easy
    - [08:27] 🦎 Lizard: An Efficient Linearization Framework for Large Language Models
    - [09:14] 🧰 Replacing thinking with tool usage enables reasoning in small language models
    - [10:05] 🧙 AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
    - [10:51] 🧠 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

    12 minutes
  4. 5 days ago

    2025.07.16 | VLV auto-encoder lowers training cost; EXAONE 4.0 strengthens reasoning.

    The 8 papers in this episode:

    - [00:28] 💡 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
    - [01:27] 🤖 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
    - [02:24] ⚖ Scaling Laws for Optimal Data Mixtures
    - [03:12] 🔬 Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
    - [03:58] 🤝 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
    - [04:50] 🦠 LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
    - [05:38] 🤖 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
    - [06:25] 🧠 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

    8 minutes
  5. 6 days ago

    2025.07.15 | Dataset supports virtual human generation; reinforcement learning must guard against data contamination.

    The 12 papers in this episode:

    - [00:24] 🗣 SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
    - [01:12] 🤔 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
    - [02:03] 🤖 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
    - [03:02] 🤔 REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
    - [03:56] 🧮 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    - [04:46] 🧠 LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
    - [05:39] ⚖ CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
    - [06:27] 🎬 MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
    - [07:18] 🧮 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
    - [08:05] 🇰🇷 From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
    - [09:08] 🖼 DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
    - [09:54] 🖼 Favicon Trojans: Executable Steganography Via Ico Alpha Channel Exploitation

    11 minutes
  6. July 14

    2025.07.14 | Efficient reasoning path selection; rendering with compressive light-field tokens

    The 14 papers in this episode:

    - [00:22] 🧠 Test-Time Scaling with Reflective Generative Model
    - [00:59] 💡 CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
    - [01:34] 💻 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    - [02:19] 🧠 KV Cache Steering for Inducing Reasoning in Small Language Models
    - [03:03] 🧠 Neural-Driven Image Editing
    - [03:42] 🎬 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
    - [04:27] 🧠 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    - [05:14] 🧩 From One to More: Contextual Part Latents for 3D Generation
    - [05:53] 🤖 One Token to Fool LLM-as-a-Judge
    - [06:32] 🖼 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
    - [07:16] 🔭 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
    - [08:00] 🚀 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    - [08:48] 🚀 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
    - [09:25] 😵 Robust Multimodal Large Language Models Against Modality Conflict

    10 minutes

Ratings and Reviews

5 out of 5
2 ratings

