HuggingFace 每日AI论文速递

duan

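Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.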

  1. 10 HR AGO

    2025.06.25 | AnimaX improves animation of inanimate 3D objects; Matrix-Game optimizes game world models.

    The 15 papers in this episode:

    - [00:25] 🤖 AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
    - [01:11] 🎮 Matrix-Game: Interactive World Foundation Model
    - [01:50] 🧠 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
    - [02:33] 💡 Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
    - [03:18] 🖼 ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
    - [03:58] 🤔 Can Large Language Models Capture Human Annotator Disagreements?
    - [04:49] 🛠 SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
    - [05:37] 🎨 JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
    - [06:21] 🧠 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
    - [07:04] 🎬 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
    - [07:41] 🖼 Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
    - [08:22] 🤖 Unified Vision-Language-Action Model
    - [08:59] 🤔 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
    - [09:33] 🗣 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
    - [10:08] 🔊 USAD: Universal Speech and Audio Representation via Distillation

    Follow us: you can also find us on the following platform for more than what is in the podcast. Xiaohongshu: AI速递

    11 min
  2. 1 DAY AGO

    2025.06.24 | A new normals-based lighting method improves detail; a multimodal generation model performs strongly.

    The 15 papers in this episode:

    - [00:24] 💡 Light of Normals: Unified Feature Representation for Universal Photometric Stereo
    - [01:00] 🎨 OmniGen2: Exploration to Advanced Multimodal Generation
    - [01:39] ✍ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
    - [02:17] 🎭 Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
    - [02:58] 🧠 RLPR: Extrapolating RLVR to General Domains without Verifiers
    - [03:36] 🧠 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
    - [04:11] 🤖 OAgents: An Empirical Study of Building Effective Agents
    - [04:52] 🖼 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
    - [05:31] 🎬 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
    - [06:06] 🧑 LettinGo: Explore User Profile Generation for Recommendation System
    - [06:48] 🔀 ReDit: Reward Dithering for Improved LLM Policy Optimization
    - [07:29] 💡 FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
    - [08:08] 🎬 ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
    - [08:47] 🖼 Auto-Regressively Generating Multi-View Consistent Images
    - [09:35] 💡 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation

    11 min
  3. 2 DAYS AGO

    2025.06.23 | DnD reduces computational overhead; vision guidance improves RAG performance.

    The 12 papers in this episode:

    - [00:23] 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    - [01:04] 🖼 Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
    - [01:49] 🔀 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
    - [02:30] 🤖 VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
    - [03:08] 🎮 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
    - [03:48] 🖼 DreamCube: 3D Panorama Generation via Multi-plane Synchronization
    - [04:26] 🖼 Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
    - [05:06] 💽 InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
    - [05:48] 🖼 Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
    - [06:36] 🧠 UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
    - [07:16] ⚖ Reranking-based Generation for Unbiased Perspective Summarization
    - [07:52] 🚗 Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

    9 min
  4. 6 DAYS AGO

    2025.06.19 | The Sekai dataset improves video generation; ProtoReasoning strengthens LLM generalization.

    The 15 papers in this episode:

    - [00:22] 🌍 Sekai: A Video Dataset towards World Exploration
    - [01:02] 💡 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
    - [01:43] 💡 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
    - [02:24] 🗣 BUT System for the MLC-SLM Challenge
    - [03:10] 🤖 Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
    - [03:57] 💡 Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
    - [04:43] 🔬 SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
    - [05:26] 🚀 Truncated Proximal Policy Optimization
    - [06:04] 🖼 PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
    - [06:37] 🖼 CoMemo: LVLMs Need Image Context with Image Memory
    - [07:21] 🤖 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
    - [08:01] 🧠 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
    - [08:45] 🛡 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
    - [09:34] 🏞 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
    - [10:09] 🤝 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

    11 min
  5. 18 JUN

    2025.06.18 | MultiFinBen exposes the limits of financial models; test-time compute improves LLM agent performance.

    The 15 papers in this episode:

    - [00:23] 📊 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
    - [01:03] 🤖 Scaling Test-time Compute for LLM Agents
    - [01:38] 🎼 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
    - [02:16] 💬 LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
    - [02:57] 🤔 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
    - [03:40] 🧠 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
    - [04:20] 🗣 Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
    - [05:02] ⚕ Efficient Medical VIE via Reinforcement Learning (VIE: visual information extraction)
    - [05:40] 🤔 Reasoning with Exploration: An Entropy Perspective
    - [06:18] 🧠 QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
    - [06:52] 🎨 Align Your Flow: Scaling Continuous-Time Flow Map Distillation
    - [07:27] 🧪 Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
    - [08:07] 🤖 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees
    - [08:58] 🛠 CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
    - [09:38] 📊 xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

    11 min
  6. 17 JUN

    2025.06.17 | MiniMax-M1 improves reasoning performance; a new cognitive test for multimodal models.

    The 15 papers in this episode:

    - [00:22] 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    - [01:00] 🔬 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
    - [01:47] 🧐 DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
    - [02:28] 🧠 DoTA-RAG: Dynamic of Thought Aggregation RAG (a retrieval-augmented generation system for large-scale web knowledge indexing)
    - [03:08] 🧠 Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
    - [03:52] 💡 Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
    - [04:28] 🤖 TaskCraft: Automated Generation of Agentic Tasks
    - [05:04] 🤯 Discrete Diffusion in Large Language and Multimodal Models: A Survey
    - [05:42] 🪞 Test3R: Learning to Reconstruct 3D at Test Time
    - [06:25] 🖼 VGR: Visual Grounded Reasoning
    - [07:06] 🤖 PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization
    - [07:50] 🤖 From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
    - [08:32] 🤖 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
    - [09:11] 🧠 Language Surgery in Multilingual Large Language Models
    - [09:44] 🤖 AI Agent Behavioral Science

    11 min

