HuggingFace Daily AI Paper Digest

2025.08.13 | Multimodal AI Breakthroughs; 3D World Generation

This episode covers the following 15 papers:

[00:22] 🤖 WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

[00:45] 🌎 Matrix-3D: Omnidirectional Explorable 3D World Generation

[01:17] 🚀 Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

[01:43] 🕺 CharacterShot: Controllable and Consistent 4D Character Animation

[02:05] ⏳ Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

[02:29] 🔍 HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

[02:55] 🧊 VertexRegen: Mesh Generation with Continuous Level of Detail

[03:16] 🎯 Test-Time Reinforcement Learning for GUI Grounding via Region Consistency

[03:43] ⏱ Train Long, Think Short: Curriculum Learning for Efficient Reasoning

[04:05] 🎓 Aryabhata: An exam-focused language model for JEE Math

[04:30] 🖼 UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

[04:52] 🧠 Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy

[05:20] 👋 Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors

[05:45] 📈 Adversarial Video Promotion Against Text-to-Video Retrieval

[06:10] 🎬 Cut2Next: Generating Next Shot via In-Context Tuning

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递