HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

duan

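Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.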

  1. 21 hours ago

    2025.07.31 | ScreenCoder automates UI-to-code conversion; Falcon-H1's hybrid architecture improves long-sequence efficiency.

    The 9 papers in this episode:
    [00:22] 💻 ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
    [01:02] 🚀 Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
    [01:33] 💥 BANG: Dividing 3D Assets via Generative Exploded Dynamics
    [02:17] 🧠 VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
    [02:51] 🚁 Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
    [03:34] 🧩 Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
    [04:04] 🚀 Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
    [04:56] 🛠 Repair-R1: Better Test Before Repair
    [05:33] 🌍 MetaCLIP 2: A Worldwide Scaling Recipe
    Follow us: you can also find us on the following platform for more beyond the podcast content. Xiaohongshu: AI速递

    6 min
  2. 1 day ago

    2025.07.30 | HunyuanWorld generates immersive 3D worlds from words or pixels; X-Omni uses reinforcement learning to improve image generation quality.

    The 8 papers in this episode:
    [00:23] 🌍 HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
    [00:56] ✨ X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
    [01:59] 🚀 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
    [02:43] ✨ MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
    [03:32] 🐾 AnimalClue: Recognizing Animals by their Traces
    [04:04] 🏃 MOVE: Motion-Guided Few-Shot Video Object Segmentation
    [04:31] 🤥 MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
    [04:59] 🐘 Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

    6 min
  3. 2 days ago

    2025.07.29 | ARPO improves LLM tool-interaction performance; ARC-Hunyuan-Video-7B focuses on short-video understanding.

    The 15 papers in this episode:
    [00:23] 🤖 Agentic Reinforced Policy Optimization
    [00:55] 🧠 ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
    [01:35] 🚀 Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
    [02:03] 🌐 Reconstructing 4D Spatial Intelligence: A Survey
    [02:55] 💡 SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
    [03:35] 🚀 A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
    [04:17] ⚖ Geometric-Mean Policy Optimization
    [04:59] 🎯 Region-based Cluster Discrimination for Visual Representation Learning
    [05:38] ✨ GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
    [06:18] 🚀 UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
    [06:47] ⚡ Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
    [07:18] ✨ ForCenNet: Foreground-Centric Network for Document Image Rectification
    [07:52] 🎨 ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
    [08:43] 🏆 Music Arena: Live Evaluation for Text-to-Music
    [09:13] 🎶 JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

    11 min
  4. 6 days ago

    2025.07.25 | GSPO addresses training collapse in large models; MUR improves LLM reasoning efficiency.

    The 15 papers in this episode:
    [00:24] 🚀 Group Sequence Policy Optimization
    [00:53] 🧠 MUR: Momentum Uncertainty guided Reasoning for Large Language Models
    [01:30] 🧠 LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
    [02:09] 🎬 Captain Cinema: Towards Short Movie Generation
    [02:58] 📈 TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation
    [03:36] 🌍 EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion
    [04:23] 💡 Hierarchical Budget Policy Optimization for Adaptive Reasoning
    [04:48] 🔄 DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts
    [05:17] 🚀 Technical Report of TeleChat2, TeleChat2.5 and T1
    [06:00] 📈 DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
    [06:31] ✨ A New Pair of GloVes
    [07:10] 🚀 GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface
    [07:38] ⚡ TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
    [08:22] ⚕ SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging
    [08:52] 🧩 Discovering and using Spelke segments

    10 min
  5. 25 Jul

    2025.07.24 | MLLMs' visual perception still falls short; the Yume model can generate interactive virtual worlds.

    The 9 papers in this episode:
    [00:23] 👁 Pixels, Patterns, but No Poetry: To See The World like Humans
    [00:56] 🌌 Yume: An Interactive World Generation Model
    [01:29] ✨ DesignLab: Designing Slides Through Iterative Detection and Correction
    [02:14] 🧠 Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
    [02:59] ✅ Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
    [03:35] 🔍 RAVine: Reality-Aligned Evaluation for Agentic Search
    [04:13] ⚡ Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
    [04:59] ✨ Elevating 3D Models: High-Quality Texture and Geometry Refinement from a Low-Quality Model
    [05:31] 🔍 Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed

    7 min
  6. 24 Jul

    2025.07.23 | The TIM model breaks through LLM context limits; Step-Audio 2 improves multimodal spoken dialogue.

    The 15 papers in this episode:
    [00:24] ♾ Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
    [01:05] 🔊 Step-Audio 2 Technical Report
    [01:41] 🚀 MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
    [02:23] ⚡ Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
    [03:17] 🧠 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
    [03:56] 🧩 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
    [04:36] 🤔 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
    [05:03] 🤖 Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory
    [05:56] ✨ HOComp: Interaction-Aware Human-Object Composition
    [06:54] 🧐 RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback
    [07:36] 🚀 Task-Specific Zero-shot Quantization-Aware Training for Object Detection
    [08:06] 🔍 SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
    [08:35] ⚠ Does More Inference-Time Compute Really Help Robustness?
    [09:16] 🧭 Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
    [10:02] 🧠 ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

    11 min
