HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

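Ten minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. New episodes every weekday; subscribe to follow along. 📢 Search for the podcast 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).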

  1. 23 hours ago

    2025.06.24 | Light of Normals improves fine detail in photometric stereo; a multimodal generation model shows strong results.

    The 15 papers in this episode:
    [00:24] 💡 Light of Normals: Unified Feature Representation for Universal Photometric Stereo
    [01:00] 🎨 OmniGen2: Exploration to Advanced Multimodal Generation
    [01:39] ✍ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
    [02:17] 🎭 Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset
    [02:58] 🧠 RLPR: Extrapolating RLVR to General Domains without Verifiers
    [03:36] 🧠 ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
    [04:11] 🤖 OAgents: An Empirical Study of Building Effective Agents
    [04:52] 🖼 Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
    [05:31] 🎬 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
    [06:06] 🧑 LettinGo: Explore User Profile Generation for Recommendation System
    [06:48] 🔀 ReDit: Reward Dithering for Improved LLM Policy Optimization
    [07:29] 💡 FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
    [08:08] 🎬 ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
    [08:47] 🖼 Auto-Regressively Generating Multi-View Consistent Images
    [09:35] 💡 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
    【Follow us】 Find us on Xiaohongshu (小红书) as AI速递 for more content beyond the podcast.

    11 min
  2. 2 days ago

    2025.06.23 | DnD reduces computational overhead; vision-guided chunking boosts RAG performance.

    The 12 papers in this episode:
    [00:23] 🧲 Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
    [01:04] 🖼 Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
    [01:49] 🔀 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
    [02:30] 🤖 VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
    [03:08] 🎮 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
    [03:48] 🖼 DreamCube: 3D Panorama Generation via Multi-plane Synchronization
    [04:26] 🖼 Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
    [05:06] 💽 InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
    [05:48] 🖼 Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
    [06:36] 🧠 UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
    [07:16] ⚖ Reranking-based Generation for Unbiased Perspective Summarization
    [07:52] 🚗 Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

    9 min
  3. 6 days ago

    2025.06.19 | The Sekai dataset advances video generation; prototype reasoning strengthens LLM generalization.

    The 15 papers in this episode:
    [00:22] 🌍 Sekai: A Video Dataset towards World Exploration
    [01:02] 💡 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
    [01:43] 💡 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
    [02:24] 🗣 BUT System for the MLC-SLM Challenge
    [03:10] 🤖 Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
    [03:57] 💡 Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
    [04:43] 🔬 SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
    [05:26] 🚀 Truncated Proximal Policy Optimization
    [06:04] 🖼 PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
    [06:37] 🖼 CoMemo: LVLMs Need Image Context with Image Memory
    [07:21] 🤖 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
    [08:01] 🧠 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
    [08:45] 🛡 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
    [09:34] 🏞 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
    [10:09] 🤝 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

    11 min
  4. Jun 18

    2025.06.18 | MultiFinBen exposes the limits of financial models; scaling test-time compute improves LLM agent performance.

    The 15 papers in this episode:
    [00:23] 📊 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
    [01:03] 🤖 Scaling Test-time Compute for LLM Agents
    [01:38] 🎼 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
    [02:16] 💬 LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
    [02:57] 🤔 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
    [03:40] 🧠 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
    [04:20] 🗣 Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
    [05:02] ⚕ Efficient Medical VIE via Reinforcement Learning (VIE: visual information extraction)
    [05:40] 🤔 Reasoning with Exploration: An Entropy Perspective
    [06:18] 🧠 QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
    [06:52] 🎨 Align Your Flow: Scaling Continuous-Time Flow Map Distillation
    [07:27] 🧪 Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
    [08:07] 🤖 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees
    [08:58] 🛠 CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
    [09:38] 📊 xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

    11 min
  5. Jun 17

    2025.06.17 | MiniMax-M1 improves reasoning performance; a new exam probes the cognitive abilities of multimodal models.

    The 15 papers in this episode:
    [00:22] 💡 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
    [01:00] 🔬 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
    [01:47] 🧐 DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
    [02:28] 🧠 DoTA-RAG: Dynamic of Thought Aggregation RAG (a retrieval-augmented generation system for large-scale web knowledge indexing)
    [03:08] 🧠 Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
    [03:52] 💡 Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
    [04:28] 🤖 TaskCraft: Automated Generation of Agentic Tasks
    [05:04] 🤯 Discrete Diffusion in Large Language and Multimodal Models: A Survey
    [05:42] 🪞 Test3R: Learning to Reconstruct 3D at Test Time
    [06:25] 🖼 VGR: Visual Grounded Reasoning
    [07:06] 🤖 PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization
    [07:50] 🤖 From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding
    [08:32] 🤖 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
    [09:11] 🧠 Language Surgery in Multilingual Large Language Models
    [09:44] 🤖 AI Agent Behavioral Science

    11 min
  6. Jun 17

    2025.06.16 | Cross-modal synthesis of novel-view images; red-teaming policy-adherent agents.

    The 15 papers in this episode:
    [00:23] 🖼 Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
    [01:02] 🛡 Effective Red-Teaming of Policy-Adherent Agents
    [01:39] 🔄 The Diffusion Duality
    [02:20] 🤖 LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
    [03:09] 🧠 pLSTM: parallelizable Linear Source Transition Mark networks
    [03:50] 🖼 A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
    [04:36] 🧠 Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
    [05:16] 🤖 SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
    [06:00] 🧠 SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
    [06:42] 🛡 Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning
    [07:17] 🎬 DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
    [07:59] ⚙ Configurable Preference Tuning with Rubric-Guided Synthetic Data
    [08:41] 👁 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
    [09:29] 🔄 A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
    [10:16] 🔍 Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings

    11 min

Ratings and Reviews

5 out of 5, 2 ratings
