The 15 papers in this episode are:
[00:21] 🛡 OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
[01:13] 🧠 ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
[01:49] ⚔ INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
[02:38] 🤖 $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
[03:26] 🚀 Continuous Autoregressive Language Models
[03:54] 🧭 Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
[04:37] 🎯 HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration
[05:15] 🎯 Defeating the Training-Inference Mismatch via FP16
[05:52] 🪜 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
[06:28] 🧭 Revisiting Multimodal Positional Encoding in Vision-Language Models
[07:09] ⚡ Higher-order Linear Attention
[07:55] 🌐 Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
[08:36] 🔬 The Denario project: Deep knowledge AI agents for scientific discovery
[09:14] 🎯 Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
[09:51] 🏙 Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery
[Follow Us]
You can also find us on the following platform for more information beyond the podcast itself.
Xiaohongshu (小红书): AI速递
Information
- Frequency: updated daily
- Published: November 3, 2025, 11:00 PM [UTC]
- Length: 11 minutes
- Age rating: suitable for children and teens
