HuggingFace Daily AI Paper Digest

2025.10.28 | Point Transformer learns spatial representations through label-free alignment; recursive code unifies coarse and fine granularity

The 15 papers in this episode:

[00:23] 🎼 Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations (Concerto: spatial representations emerge from joint 2D-3D self-supervised learning)

[01:06] 🧩 ReCode: Unify Plan and Action for Universal Granularity Control (ReCode: unifying planning and action with recursive code for universal granularity control)

[01:44] 🤖 A Survey of Data Agents: Emerging Paradigm or Overstated Hype? (A panoramic look at data agents: new paradigm or bubble?)

[02:23] 🌾 FARMER: Flow AutoRegressive Transformer over Pixels (An invertible generative model built on a flow autoregressive Transformer over pixels)

[03:07] 🤖 VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting (VITA-E: a natural embodied interaction framework that can see, hear, speak, and act at the same time)

[03:45] 🎭 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation (Lookahead Anchoring: keeping character identity intact in audio-driven human animation)

[04:17] 🤖 ACG: Action Coherence Guidance for Flow-based VLA models (Action coherence guidance for flow-based VLA models)

[04:56] 🔍 $\text{E}^2\text{Rank}$: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker (E²Rank: your text embedding can double as an effective, efficient listwise reranker)

[05:40] 🌐 Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences (An omni-modal reward model: moving toward generalist reward modeling with free-form preferences)

[06:30] 🔍 PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity (PixelRefer: a unified framework for spatio-temporal object referring at any granularity)

[07:06] 🧠 Knocking-Heads Attention (Knocking-Heads Attention: letting the attention heads "knock" on one another)

[07:42] 🧩 IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction (IGGT: an instance-grounded geometry Transformer for semantic 3D reconstruction)

[08:30] 🎯 The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation (Best of many: aligning reinforcement learning with Best-of-N sampling through max@k optimization)

[09:14] 🥯 LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation (LightBagel: a lightweight double-fusion framework for unified multimodal understanding and generation)

[09:51] 🧠 LimRank: Less is More for Reasoning-Intensive Information Reranking (LimRank: less is more for reasoning-intensive information reranking)

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递