HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

duan

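In about 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).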

  1. 1 DAY AGO

    2025.01.17 | OmniThink improves the depth and novelty of machine writing; inference-time scaling for diffusion models improves generation quality.

    The 12 papers in this episode:
    [00:26] 🧠 OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
    [01:06] 🔍 Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
    [01:37] 🩺 Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators
    [02:09] 🎨 SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
    [02:48] 🤖 FAST: Efficient Action Tokenization for Vision-Language-Action Models
    [03:23] 🔍 Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
    [04:01] 🧠 Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
    [04:35] 🧹 The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models
    [05:15] 🤖 RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
    [05:54] 🎨 AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
    [06:36] 🎨 CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
    [07:18] 🎥 Do generative video models learn physical principles from watching videos?

    [Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu (小红书): AI速递

    8 min
  2. 2 DAYS AGO

    2025.01.16 | MMDocIR pushes toward standardized multimodal retrieval; CityDreamer4D introduces a new 4D city generation model.

    The 9 papers in this episode:
    [00:25] 📊 MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents
    [01:06] 🏙 CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
    [01:49] 🎥 RepVideo: Rethinking Cross-Layer Representation for Video Generation
    [02:30] 📚 Towards Best Practices for Open Datasets for LLM Training
    [03:11] 🎵 XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
    [03:46] 🔒 Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography
    [04:23] 🔍 Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
    [05:03] 🎨 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
    [05:39] 🎥 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

    7 min
  3. 3 DAYS AGO

    2025.01.15 | MiniMax-01 scales foundation models to handle long contexts; padding tokens affect image generation in T2I models.

    The 15 papers in this episode:
    [00:23] ⚡ MiniMax-01: Scaling Foundation Models with Lightning Attention
    [01:04] 🖼 Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
    [01:44] 🎨 MangaNinja: Line Art Colorization with Precise Reference Following
    [02:21] 🧬 A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following
    [02:57] 🎥 Diffusion Adversarial Post-Training for One-Step Video Generation
    [03:35] 🎲 PokerBench: Training Large Language Models to become Professional Poker Players
    [04:11] 🎨 FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
    [04:52] 🎨 Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
    [05:30] 🔍 Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
    [06:07] 🔍 Enhancing Automated Interpretability with Output-Centric Feature Descriptions
    [06:49] 📚 OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training
    [07:27] 📹 Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
    [08:04] 🤔 HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
    [08:43] 🤖 Potential and Perils of Large Language Models as Judges of Unstructured Textual Data
    [09:23] 🚫 AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

    10 min
  4. 4 DAYS AGO

    2025.01.14 | Improved mathematical reasoning, reduced memory overhead

    The 11 papers in this episode:
    [00:24] 📊 The Lessons of Developing Process Reward Models in Mathematical Reasoning
    [01:10] 🧠 Tensor Product Attention Is All You Need
    [01:53] 🤖 $\text{Transformer}^2$: Self-adaptive LLMs
    [02:34] 🎥 VideoAuteur: Towards Long Narrative Video Generation
    [03:22] 🌐 WebWalker: Benchmarking LLMs in Web Traversal
    [04:08] 🩺 O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
    [04:50] 🗣 MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
    [05:41] 🔧 SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
    [06:25] 🩺 BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
    [07:15] 🧪 ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
    [07:51] 🌐 UnCommon Objects in 3D

    9 min
  5. 5 DAYS AGO

    2025.01.13 | OmniManip achieves general robotic manipulation; VideoRAG improves video retrieval and generation performance.

    The 10 papers in this episode:
    [00:24] 🤖 OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
    [01:02] 🎥 VideoRAG: Retrieval-Augmented Generation over Video Corpus
    [01:38] 🎥 OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
    [02:26] 🧠 LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
    [03:01] 🧠 Enabling Scalable Oversight via Self-Evolving Critic
    [03:34] 🎥 ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
    [04:09] 🎥 Multi-subject Open-set Personalization in Video Generation
    [04:47] 🔍 ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
    [05:23] 🤖 Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
    [06:00] 🦠 Infecting Generative AI With Viruses

    7 min
