HuggingFace 每日AI论文速递

2025.10.01 | 自对弈零标注训练;MCP代理深度评测

本期的 15 篇论文如下:

[00:20] 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play(Vision-Zero:基于策略化博弈自对弈的可扩展视觉语言模型自我提升)

[00:59] 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use(MCPMark:面向真实且全面的MCP应用场景的压力测试基准)

[01:36] 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain(幼龙破壳: Transformer 与大脑模型之间缺失的环节)

[02:10] 🤥 TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning(TruthRL:通过强化学习激励大模型说真话)

[02:55] 🌊 OceanGym: A Benchmark Environment for Underwater Embodied Agents(OceanGym:面向水下具身智能体的综合基准环境)

[03:41] ⚡ DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder(DC-VideoGen:基于深度压缩视频自编码器的高效视频生成)

[04:14] 🔍 Who's Your Judge? On the Detectability of LLM-Generated Judgments(谁是你的评审?大模型生成评审意见的检测性研究)

[04:59] ✂ Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning(赢得剪枝豪赌:统一样本-令牌剪枝的高效监督微调新方法)

[05:45] 👁 Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training(未见先识:从语言预训练解密大模型视觉先验)

[06:24] 🧠 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training(思维火花!后训练阶段推理模型中涌现的专用注意力头)

[07:09] 🧪 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications(VitaBench:面向真实场景多功能交互任务的LLM智能体评测基准)

[07:42] ⚡ dParallel: Learnable Parallel Decoding for dLLMs(dParallel:面向扩散大语言模型的可学习并行解码)

[08:28] 🎯 IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance(IMG:通过隐式多模态引导校准扩散模型)

[09:15] 🎬 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation(MotionRAG:基于运动检索增强的图像到视频生成)

[10:12] 🐬 Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention(基于离散唇部语义与多尺度全局-局部注意力的高效视听语音分离)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递