HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) and Apple Podcasts. 🖼 A companion text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).

  1. 1 DAY AGO

2025.02.21 | A new framework for evaluating AI agents; LLMs show marked performance differences across disciplines.

    This episode covers the following 20 papers:

    [00:26] 🧠 MLGym: A New Framework and Benchmark for Advancing AI Research Agents
    [01:18] 📚 SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
    [02:04] 🌐 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
    [02:52] 🧠 How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
    [03:49] 🚀 S*: Test Time Scaling for Code Generation
    [04:35] ⏳ Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
    [05:28] 📄 LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
    [06:17] 🧠 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
    [07:13] 🖥 PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
    [08:07] 🧠 S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
    [09:01] 🧠 Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
    [09:55] 🎥 Dynamic Concepts Personalization from Single Videos
    [10:38] 🖼 Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
    [11:23] 🌍 NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
    [12:13] 🧠 AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
    [13:06] 🌍 How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
    [13:52] 🌍 Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
    [14:55] 🌐 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
    [15:54] 🧠 Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
    [16:41] 🤖 LLM-based User Profile Management for Recommender System

    【Follow us】You can also find us on the platform below for more than just the podcast. Xiaohongshu (小红书): AI速递

    18 min
  2. 2 DAYS AGO

2025.02.20 | Improving visual perception; strengthening autonomous-driving safety.

    This episode covers the following 20 papers:

    [00:24] 🌐 Qwen2.5-VL Technical Report
    [01:10] 🚗 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
    [01:50] 🎶 SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
    [02:38] 🧠 MoM: Linear Sequence Modeling with Mixture-of-Memories
    [03:15] 🌐 Craw4LLM: Efficient Web Crawling for LLM Pretraining
    [04:05] 🧠 LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
    [04:45] 🤔 Small Models Struggle to Learn from Strong Reasoners
    [05:27] ⚙ Autellix: An Efficient Serving Engine for LLM Agents as General Programs
    [06:08] 🌍 Presumed Cultural Identity: How Names Shape LLM Responses
    [06:53] 🚨 Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region
    [07:38] 🩺 SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
    [08:21] 🧠 Thinking Preference Optimization
    [08:59] 🧠 Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
    [09:40] 🧠 AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
    [10:21] 🧬 NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
    [11:02] 🧩 ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
    [11:44] 🧠 Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
    [12:33] 🌍 GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
    [13:19] 🤖 InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
    [14:06] 🔊 Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

    【Follow us】You can also find us on the platform below for more than just the podcast. Xiaohongshu (小红书): AI速递

    15 min
  3. 3 DAYS AGO

2025.02.19 | Data-efficient speech processing; innovations in embedding-space compression.

    This episode covers the following 20 papers:

    [00:25] 🎙 Soundwave: Less is More for Speech-Text Alignment in LLMs
    [01:05] 🔍 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
    [01:48] 🌊 Continuous Diffusion Model for Language Modeling
    [02:30] 🎥 Phantom: Subject-consistent video generation via cross-modal alignment
    [03:12] 🧠 Rethinking Diverse Human Preference Learning through Principal Component Analysis
    [04:00] 🤖 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
    [04:36] 🛡 SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
    [05:25] 🐍 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
    [06:08] 📚 You Do Not Fully Utilize Transformer's Representation Capacity
    [06:50] 🤖 Magma: A Foundation Model for Multimodal AI Agents
    [07:23] 💹 FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
    [08:08] 📄 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
    [08:49] 🧠 PAFT: Prompt-Agnostic Fine-Tuning
    [09:27] 🛠 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
    [10:13] 📊 Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
    [11:00] 🔄 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
    [11:37] 🩺 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
    [12:12] 🧠 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
    [12:51] 🌍 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
    [13:32] 🧠 Atom of Thoughts for Markov LLM Test-Time Scaling

    【Follow us】You can also find us on the platform below for more than just the podcast. Xiaohongshu (小红书): AI速递

    15 min
  4. 4 DAYS AGO

2025.02.18 | Sparse attention improves efficiency; optimized get-up policies for humanoid robots.

    This episode covers the following 29 papers:

    [00:23] ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
    [01:10] 🤖 Learning Getting-Up Policies for Real-World Humanoid Robots
    [01:55] 🧠 ReLearn: Unlearning via Learning for Large Language Models
    [02:35] 💻 SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
    [03:21] 🌐 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
    [03:58] 🧠 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
    [04:33] 🤖 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
    [05:12] 🔧 Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
    [05:55] 🧠 I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
    [06:38] 🔧 SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
    [07:25] 🧠 CRANE: Reasoning with constrained LLM generation
    [08:07] 🧠 Intuitive physics understanding emerges from self-supervised pretraining on natural videos
    [08:46] 🐦 Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
    [09:22] 🧠 Dyve: Thinking Fast and Slow for Dynamic Process Verification
    [10:06] 🧠 PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
    [10:53] 🤖 System Message Generation for User Preferences using Open-Source Models
    [11:38] 🎥 video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
    [12:33] 🧠 Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity
    [13:11] 🤖 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
    [13:52] 🤖 MagicArticulate: Make Your 3D Models Articulation-Ready
    [14:37] 🤖 Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
    [15:21] 🧠 One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
    [16:03] 🤖 Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
    [16:40] 🚀 Better Embeddings with Coupled Adam
    [17:18] 🧐 Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
    [17:56] 🧪 Towards Data-Efficient Pretraining for Atomic Property Prediction
    [18:46] 🌀 The Mirage of Model Editing: Revisiting Evaluation in the Wild
    [19:31] 🧮 Large Language Models and Mathematical Reasoning Failures
    [20:11] 📊 Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance

    【Follow us】You can also find us on the platform below for more than just the podcast. Xiaohongshu (小红书): AI速递

    21 min
  5. 5 DAYS AGO

2025.02.17 | RAS accelerates diffusion transformers; video generation quality improves.

    This episode covers the following 21 papers:

    [00:22] 🌐 Region-Adaptive Sampling for Diffusion Transformers
    [01:05] 🎥 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
    [01:48] 🌊 Large Language Diffusion Models
    [02:31] 🧠 ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
    [03:15] 🌟 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
    [03:58] 🖼 Precise Parameter Localization for Textual Generation in Diffusion Models
    [04:40] 🧠 Diverse Inference and Verification for Advanced Reasoning
    [05:22] 🧬 DarwinLM: Evolutionary Structured Pruning of Large Language Models
    [06:02] 📈 AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
    [06:40] 🖼 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
    [07:23] 🤖 We Can't Understand AI Using our Existing Vocabulary
    [08:03] 📊 FoNE: Precise Single-Token Number Embeddings via Fourier Features
    [08:53] 🌍 Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages
    [09:41] 🔓 Jailbreaking to Jailbreak
    [10:23] 🤖 STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
    [11:05] 📊 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
    [11:41] ⚡ MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
    [12:26] 🚗 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
    [13:06] 🎵 CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
    [13:49] 🧩 Cluster and Predict Latent Patches for Improved Masked Image Modeling
    [14:31] 🧬 Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model

    【Follow us】You can also find us on the platform below for more than just the podcast. Xiaohongshu (小红书): AI速递

    16 min
  6. FEB 14

2025.02.14 | Language-model context scaled to 3 million tokens on a single GPU; memory-efficient text-encoder strategies.

    This episode covers the following 18 papers:

    [00:21] 🚀 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
    [01:07] 🖼 Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
    [01:49] 🧠 An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
    [02:31] 📚 SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
    [03:14] 🐕 Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights
    [03:56] 🌐 Exploring the Potential of Encoder-free Architectures in 3D LMMs
    [04:39] 🎭 CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
    [05:26] 🌐 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
    [06:09] 🤖 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
    [07:00] 🌪 Typhoon T1: An Open Thai Reasoning Model
    [07:54] 🤖 Logical Reasoning in Large Language Models: A Survey
    [08:36] 🧠 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
    [09:23] 🧠 CoT-Valve: Length-Compressible Chain-of-Thought Tuning
    [10:11] 🤖 SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
    [10:52] 🌐 mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
    [11:36] 🦜 The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
    [12:18] 🤖 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
    [13:00] 🔍 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection

    【Follow us】You can also find us on the platform below for more than just the podcast. Xiaohongshu (小红书): AI速递

    14 min


