HuggingFace 每日AI论文速递

2025.11.03 | OS-Sentinel实时守护手机操作安全;ThinkMorph让小模型边想边画

本期的 15 篇论文如下:

[00:21] 🛡 OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows(OS-Sentinel:在真实工作流中通过混合验证提升移动GUI代理安全性)

[01:13] 🧠 ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning(ThinkMorph:多模态交错思维链中的涌现特性)

[01:49] ⚔ INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats(INT对决FP:细粒度低比特量化格式的综合研究)

[02:38] 🤖 $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models(π_RL:面向流式视觉-语言-动作模型的在线强化学习微调)

[03:26] 🚀 Continuous Autoregressive Language Models(连续自回归语言模型)

[03:54] 🧭 Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning(Spatial-SSRL:通过自监督强化学习增强空间理解)

[04:37] 🎯 HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration(HyperClick:通过不确定性校准推动可靠GUI定位)

[05:15] 🎯 Defeating the Training-Inference Mismatch via FP16(用FP16打败训练-推理失配)

[05:52] 🪜 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals(分阶段DMD:在子区间内做分数匹配实现少步分布匹配蒸馏)

[06:28] 🧭 Revisiting Multimodal Positional Encoding in Vision-Language Models(再探视觉-语言模型中的多模态位置编码)

[07:09] ⚡ Higher-order Linear Attention(高阶线性注意力机制)

[07:55] 🌐 Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model(双流扩散助力世界模型增强视觉-语言-动作模型)

[08:36] 🔬 The Denario project: Deep knowledge AI agents for scientific discovery(Denario项目:面向科学发现的深度知识AI智能体)

[09:14] 🎯 Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning(面向具身决策的多模态大模型视觉后门攻击:对比触发学习方法)

[09:51] 🏙 Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery(Mask-to-Height:基于YOLOv11的联合建筑实例分割与高度分类架构)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递