HuggingFace Daily AI Papers Digest

2025.09.23 | Just 78 demonstrations lift an AI agent to 73.5%; mask-free video subject insertion outperforms Pika

The 15 papers in this episode:

[00:21] 🚀 LIMI: Less is More for Agency (less is more when building AI agents)

[00:55] 🎬 OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models (inserting any reference subject into video without masks, built on diffusion Transformer models)

[01:28] 🧩 OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System (a framework combining context engineering and reasoning for industrial-scale cascade ranking systems)

[02:19] 🌐 Qwen3-Omni Technical Report (presented as the first omni-modal large model with no loss in performance)

[02:55] 🎬 TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs (an efficient off-policy reinforcement fine-tuning framework for video temporal grounding)

[03:28] 📐 GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning (closing the visual perception gap that holds back geometric reasoning in multimodal large models)

[04:15] 🎯 DiffusionNFT: Online Diffusion Reinforcement with Forward Process (online reinforcement learning for diffusion models based on the forward process)

[05:05] 🤖 ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces (a parallel robotic wrist designed for confined spaces)

[05:42] 💬 EpiCache: Episodic KV Cache Management for Long Conversational Question Answering (episode-based KV cache management for question answering over long conversations)

[06:24] 🧠 SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? (can AI agents handle long-horizon software engineering problems?)

[07:01] 🧠 FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (an early assessment of large reasoning models on text and visual questions with automatically checkable answers)

[08:05] 🎬 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models (generating videos of 3D scenes by pairing complementary image and video diffusion models)

[08:53] 🧪 ARE: Scaling Up Agent Environments and Evaluations (scaling up environments and evaluations for AI agents)

[09:28] 🧩 QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models (a Walsh-Hadamard parameter-efficient fine-tuning method aimed at deploying quantized large models)

[10:17] 🔍 Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels (how supervised fine-tuning changes model knowledge, examined at both the token and the parameter level)

【Follow Us】

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (RedNote): AI速递