The 14 papers in this episode are:
[00:25] 🧠 Self-Rewarding Vision-Language Model via Reasoning Decomposition
[00:49] 🔍 Beyond Transcription: Mechanistic Interpretability in ASR
[01:22] 🤖 Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
[01:52] 🧠 CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
[02:19] 🤖 MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation
[02:51] 🔮 Predicting the Order of Upcoming Tokens Improves Language Modeling
[03:20] 💓 Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation
[03:52] ⚡ Diffusion Language Models Know the Answer Before Decoding
[04:16] 👁 Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
[04:38] 🎧 AudioStory: Generating Long-Form Narrative Audio with Large Language Models
[05:01] 🧠 StepWiser: Stepwise Generative Judges for Wiser Reasoning
[05:25] 🔄 Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
[05:53] 💃 MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment
[06:18] 📊 DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast itself:
Xiaohongshu: AI速递
Information
- Frequency: Updated daily
- Published: August 28, 2025 at 11:00 PM UTC
- Length: 7 minutes
- Rating: Clean