The 15 papers covered in this episode:
[00:24] ⚖ Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
[00:57] 🧠 rStar2-Agent: Agentic Reasoning Technical Report
[01:28] 🎨 USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
[01:56] 🚀 AWorld: Orchestrating the Training Recipe for Agentic AI
[02:26] 🎯 TCIA: A Task-Centric Instruction Augmentation Method for Instruction Finetuning
[02:54] 🧠 Mixture of Contexts for Long Video Generation
[03:17] 🧠 CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
[03:51] 🔍 MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
[04:23] 🎨 OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
[04:54] 🛡 Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
[05:21] 🧠 Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
[05:56] 💃 Dress&Dance: Dress up and Dance as You Like It - Technical Preview
[06:18] 🎯 OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
[06:42] 📷 Multi-View 3D Point Tracking
[07:10] 🎭 FakeParts: a New Family of AI-Generated DeepFakes
【Follow Us】
You can also find us on the following platform for more content beyond the podcast episodes
Xiaohongshu: AI速递