HuggingFace 每日AI论文速递

duan

Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. New episodes every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.

  1. 2 hr ago

    2025.07.21 | A new safety vulnerability in diffusion LLMs that existing defenses fail to cover; for Russian speech synthesis, data and annotation are the core.

    This episode covers the following 10 papers:
    [00:20] 😈 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
    [01:12] 🎤 A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
    [02:07] 🧩 Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
    [02:49] 🚀 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
    [03:24] 🎨 CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
    [04:27] 🚀 RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
    [05:08] 🤝 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
    [05:41] 🚫 Mitigating Object Hallucinations via Sentence-Level Early Intervention
    [06:20] ⚡ The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations
    [07:41] 📈 Quantitative Risk Management in Volatile Markets with an Expectile-Based Framework for the FTSE Index
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    9 min
  2. 3 days ago

    2025.07.18 | Optimizing LLM context; making vision-language models more efficient

    This episode covers the following 15 papers:
    [00:27] 🧮 A Survey of Context Engineering for Large Language Models
    [01:16] 🧠 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
    [02:08] 📸 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
    [02:52] 🤖 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
    [03:47] 🖼 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
    [04:47] 🧑 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
    [05:34] 🎭 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
    [06:23] 🧠 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
    [07:17] 🔬 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
    [08:08] 🗣 Voxtral (a multimodal audio chat model)
    [08:55] 💡 Teach Old SAEs New Domain Tricks with Boosting
    [09:46] 💡 FLEXITOKENS: Flexible Tokenization for Evolving Language Models
    [10:49] 🎬 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
    [11:45] 🛡 Automating Steering for Safe Multimodal Large Language Models
    [12:25] ⚙ RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    14 min
  3. 4 days ago

    2025.07.17 | RAG improves LLM reasoning; PhysX generates physically grounded 3D assets

    This episode covers the following 13 papers:
    [00:26] 🧠 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    [01:17] 🧱 PhysX: Physical-Grounded 3D Asset Generation
    [02:04] 🚗 MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
    [03:05] 🚀 SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
    [04:00] 💃 MOSPA: Human Motion Generation Driven by Spatial Audio
    [04:57] 🏗 DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
    [05:58] 🤖 Seq vs Seq: An Open Suite of Paired Encoders and Decoders
    [06:38] 🎬 AnyI2V: Animating Any Conditional Image with Motion Control
    [07:34] 🎯 SpatialTrackerV2: 3D Point Tracking Made Easy
    [08:27] 🦎 Lizard: An Efficient Linearization Framework for Large Language Models
    [09:14] 🧰 Replacing thinking with tool usage enables reasoning in small language models
    [10:05] 🧙 AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
    [10:51] 🧠 RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    12 min
  4. 5 days ago

    2025.07.16 | The VLV auto-encoder cuts training costs; EXAONE 4.0 strengthens reasoning.

    This episode covers the following 8 papers:
    [00:28] 💡 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
    [01:27] 🤖 EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
    [02:24] ⚖ Scaling Laws for Optimal Data Mixtures
    [03:12] 🔬 Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
    [03:58] 🤝 AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
    [04:50] 🦠 LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
    [05:38] 🤖 OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
    [06:25] 🧠 Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    8 min
  5. 6 days ago

    2025.07.15 | A dataset enabling virtual-human generation; reinforcement learning must guard against data contamination.

    This episode covers the following 12 papers:
    [00:24] 🗣 SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
    [01:12] 🤔 Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
    [02:03] 🤖 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
    [03:02] 🤔 REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
    [03:56] 🧮 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    [04:46] 🧠 LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
    [05:39] ⚖ CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
    [06:27] 🎬 MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
    [07:18] 🧮 A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
    [08:05] 🇰 From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
    [09:08] 🖼 DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design
    [09:54] 🖼 Favicon Trojans: Executable Steganography Via Ico Alpha Channel Exploitation
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    11 min
  6. Jul 14

    2025.07.14 | Efficient selection of reasoning paths; rendering with compressed light-field tokens

    This episode covers the following 14 papers:
    [00:22] 🧠 Test-Time Scaling with Reflective Generative Model
    [00:59] 💡 CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
    [01:34] 💻 NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    [02:19] 🧠 KV Cache Steering for Inducing Reasoning in Small Language Models
    [03:03] 🧠 Neural-Driven Image Editing
    [03:42] 🎬 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective
    [04:27] 🧠 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    [05:14] 🧩 From One to More: Contextual Part Latents for 3D Generation
    [05:53] 🤖 One Token to Fool LLM-as-a-Judge
    [06:32] 🖼 Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
    [07:16] 🔭 What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
    [08:00] 🚀 Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    [08:48] 🚀 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
    [09:25] 😵 Robust Multimodal Large Language Models Against Modality Conflict
    【Follow us】 For more beyond the podcast, you can also find us on Xiaohongshu: AI速递

    10 min

Ratings & Reviews

5 out of 5
2 ratings
