HuggingFace 每日AI论文速递

duan

10 minutes a day to quickly catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 Find the podcast by searching for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou and Apple Podcasts. 🖼 A text-and-image edition is also available: search for and follow 【AI速递】 on Xiaohongshu (RED).

  1. 2025.02.20 | Boosting visual perception, strengthening autonomous-driving safety.

    The 20 papers in this episode:
    [00:24] 🌐 Qwen2.5-VL Technical Report
    [01:10] 🚗 RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
    [01:50] 🎶 SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
    [02:38] 🧠 MoM: Linear Sequence Modeling with Mixture-of-Memories
    [03:15] 🌐 Craw4LLM: Efficient Web Crawling for LLM Pretraining
    [04:05] 🧠 LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
    [04:45] 🤔 Small Models Struggle to Learn from Strong Reasoners
    [05:27] ⚙ Autellix: An Efficient Serving Engine for LLM Agents as General Programs
    [06:08] 🌍 Presumed Cultural Identity: How Names Shape LLM Responses
    [06:53] 🚨 Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region
    [07:38] 🩺 SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
    [08:21] 🧠 Thinking Preference Optimization
    [08:59] 🧠 Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
    [09:40] 🧠 AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
    [10:21] 🧬 NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
    [11:02] 🧩 ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
    [11:44] 🧠 Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
    [12:33] 🌍 GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
    [13:19] 🤖 InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
    [14:06] 🔊 Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
    【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu (RED): AI速递

    15 min
  2. 2025.02.19 | Data-efficient speech processing; innovations in embedding-space compression.

    The 20 papers in this episode:
    [00:25] 🎙 Soundwave: Less is More for Speech-Text Alignment in LLMs
    [01:05] 🔍 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
    [01:48] 🌊 Continuous Diffusion Model for Language Modeling
    [02:30] 🎥 Phantom: Subject-consistent video generation via cross-modal alignment
    [03:12] 🧠 Rethinking Diverse Human Preference Learning through Principal Component Analysis
    [04:00] 🤖 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
    [04:36] 🛡 SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
    [05:25] 🐍 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
    [06:08] 📚 You Do Not Fully Utilize Transformer's Representation Capacity
    [06:50] 🤖 Magma: A Foundation Model for Multimodal AI Agents
    [07:23] 💹 FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
    [08:08] 📄 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
    [08:49] 🧠 PAFT: Prompt-Agnostic Fine-Tuning
    [09:27] 🛠 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
    [10:13] 📊 Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
    [11:00] 🔄 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
    [11:37] 🩺 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
    [12:12] 🧠 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
    [12:51] 🌍 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
    [13:32] 🧠 Atom of Thoughts for Markov LLM Test-Time Scaling

    15 min
  3. 2025.02.18 | Sparse attention boosts efficiency; humanoid getting-up policies optimized.

    The 29 papers in this episode:
    [00:23] ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
    [01:10] 🤖 Learning Getting-Up Policies for Real-World Humanoid Robots
    [01:55] 🧠 ReLearn: Unlearning via Learning for Large Language Models
    [02:35] 💻 SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
    [03:21] 🌐 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
    [03:58] 🧠 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
    [04:33] 🤖 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
    [05:12] 🔧 Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
    [05:55] 🧠 I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
    [06:38] 🔧 SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
    [07:25] 🧠 CRANE: Reasoning with constrained LLM generation
    [08:07] 🧠 Intuitive physics understanding emerges from self-supervised pretraining on natural videos
    [08:46] 🐦 Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
    [09:22] 🧠 Dyve: Thinking Fast and Slow for Dynamic Process Verification
    [10:06] 🧠 PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
    [10:53] 🤖 System Message Generation for User Preferences using Open-Source Models
    [11:38] 🎥 video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
    [12:33] 🧠 Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity
    [13:11] 🤖 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
    [13:52] 🤖 MagicArticulate: Make Your 3D Models Articulation-Ready
    [14:37] 🤖 Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
    [15:21] 🧠 One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
    [16:03] 🤖 Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
    [16:40] 🚀 Better Embeddings with Coupled Adam
    [17:18] 🧐 Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
    [17:56] 🧪 Towards Data-Efficient Pretraining for Atomic Property Prediction
    [18:46] 🌀 The Mirage of Model Editing: Revisiting Evaluation in the Wild
    [19:31] 🧮 Large Language Models and Mathematical Reasoning Failures
    [20:11] 📊 Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance

    21 min
  4. 2025.02.17 | RAS accelerates diffusion transformers; video generation gains quality.

    The 21 papers in this episode:
    [00:22] 🌐 Region-Adaptive Sampling for Diffusion Transformers
    [01:05] 🎥 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
    [01:48] 🌊 Large Language Diffusion Models
    [02:31] 🧠 ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
    [03:15] 🌟 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
    [03:58] 🖼 Precise Parameter Localization for Textual Generation in Diffusion Models
    [04:40] 🧠 Diverse Inference and Verification for Advanced Reasoning
    [05:22] 🧬 DarwinLM: Evolutionary Structured Pruning of Large Language Models
    [06:02] 📈 AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
    [06:40] 🖼 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
    [07:23] 🤖 We Can't Understand AI Using our Existing Vocabulary
    [08:03] 📊 FoNE: Precise Single-Token Number Embeddings via Fourier Features
    [08:53] 🌍 Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages
    [09:41] 🔓 Jailbreaking to Jailbreak
    [10:23] 🤖 STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
    [11:05] 📊 Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
    [11:41] ⚡ MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
    [12:26] 🚗 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
    [13:06] 🎵 CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
    [13:49] 🧩 Cluster and Predict Latents Patches for Improved Masked Image Modeling
    [14:31] 🧬 Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model

    16 min
  5. 2025.02.14 | Context scaled to 3 million tokens on a single GPU; memory-efficient text-encoder strategies.

    The 18 papers in this episode:
    [00:21] 🚀 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
    [01:07] 🖼 Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
    [01:49] 🧠 An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
    [02:31] 📚 SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
    [03:14] 🐕 Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights
    [03:56] 🌐 Exploring the Potential of Encoder-free Architectures in 3D LMMs
    [04:39] 🎭 CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
    [05:26] 🌐 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
    [06:09] 🤖 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
    [07:00] 🌪 Typhoon T1: An Open Thai Reasoning Model
    [07:54] 🤖 Logical Reasoning in Large Language Models: A Survey
    [08:36] 🧠 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
    [09:23] 🧠 CoT-Valve: Length-Compressible Chain-of-Thought Tuning
    [10:11] 🤖 SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
    [10:52] 🌐 mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
    [11:36] 🦜 The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
    [12:18] 🤖 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
    [13:00] 🔍 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection

    14 min
  6. 2025.02.13 | A multilingual evaluation suite fills a gap; a dense-text image dataset challenges generative models.

    The 20 papers in this episode:
    [00:23] 🌍 BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
    [01:08] 📄 TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
    [01:48] 🎥 Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
    [02:36] 🎥 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
    [03:16] 🖥 WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
    [04:06] ⚡ LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
    [04:45] 🧠 TransMLA: Multi-head Latent Attention Is All You Need
    [05:31] 💼 Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
    [06:23] 📏 Distillation Scaling Laws
    [07:02] 🚀 Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
    [07:52] 🌍 SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation
    [08:25] 🧠 LLM Pretraining with Continuous Concepts
    [09:09] 🎭 Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
    [09:52] 🔍 NoLiMa: Long-Context Evaluation Beyond Literal Matching
    [10:39] 🧠 Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
    [11:15] 📚 Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
    [11:58] 🎥 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
    [12:43] 🔄 DPO-Shift: Shifting the Distribution of Direct Preference Optimization
    [13:28] 🧠 LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention
    [14:15] 🛡 MetaSC: Test-Time Safety Specification Optimization for Language Models

    15 min
  7. 2025.02.12 | Reinforcement learning lifts competitive programming; code I/O prediction sharpens reasoning models.

    The 21 papers in this episode:
    [00:25] 🧠 Competitive Programming with Large Reasoning Models
    [01:03] 🧠 CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
    [01:47] 🎥 Magic 1-For-1: Generating One Minute Video Clips within One Minute
    [02:27] 🧠 Teaching Language Models to Critique via Reinforcement Learning
    [03:09] 💼 Expect the Unexpected: FailSafe Long Context QA for Finance
    [03:49] 🌍 Scaling Pre-training to One Hundred Billion Data for Vision Language Models
    [04:24] 🧠 LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!
    [05:07] 📈 Enhancing Financial Time-Series Forecasting with Retrieval-Augmented Large Language Models
    [05:50] 📄 Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents
    [06:34] 🛠 Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
    [07:15] 🛠 CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing
    [08:10] 🎥 Enhance-A-Video: Better Generated Video for Free
    [08:49] 🌍 NatureLM: Deciphering the Language of Nature for Scientific Discovery
    [09:34] 🦎 Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
    [10:22] 🎥 VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
    [11:01] 📹 CoS: Chain-of-Shot Prompting for Long Video Understanding
    [11:42] 🧩 Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
    [12:28] 🎤 FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
    [13:09] 🕵 Auditing Prompt Caching in Language Model APIs
    [13:49] 💎 Gemstones: A Model Suite for Multi-Faceted Scaling Laws
    [14:32] 🧠 Skill Expansion and Composition in Parameter Space

    16 min

Ratings & Reviews

5 out of 5 (2 Ratings)

