The 15 papers in this episode are:
[00:25] 🤖 AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
[01:11] 🎮 Matrix-Game: Interactive World Foundation Model
[01:50] 🧠 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
[02:33] 💡 Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
[03:18] 🖼 ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
[03:58] 🤔 Can Large Language Models Capture Human Annotator Disagreements?
[04:49] 🛠 SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
[05:37] 🎨 JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
[06:21] 🧠 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
[07:04] 🎬 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution
[07:41] 🖼 Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
[08:22] 🤖 Unified Vision-Language-Action Model
[08:59] 🤔 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
[09:33] 🗣 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
[10:08] 🔊 USAD: Universal Speech and Audio Representation via Distillation
[Follow us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
Information
- Frequency: Updated daily
- Published: 26 June 2025 at 00:00 UTC
- Length: 11 min
- Rating: Clean