HuggingFace Daily AI Paper Digest

2025.08.14 | Molecular reasoning framework boosts performance; lightweight, efficient identity control for video

The 15 papers in this episode:

[00:17] 🧪 Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

[00:38] ✨ Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

[01:06] 🎥 Story2Board: A Training-Free Approach for Expressive Storyboard Generation

[01:32] 🛡 AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving

[01:59] ⚡ Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

[02:21] 🪄 Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

[02:51] 🧠 Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

[03:21] 🤝 Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

[03:48] 🚧 MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models

[04:12] 💡 Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

[04:32] 👻 IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

[04:59] 💡 Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

[05:21] 💻 VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

[05:47] ✨ GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors

[06:13] ✨ CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递