HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

duan

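In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on 小宇宙 (Xiaoyuzhou) or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on 小红书 (Xiaohongshu).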

  1. 2 DAYS AGO

    2024.12.13 Daily AI Papers | Multimodal system improves long-term interaction; phi-4 optimizes STEM question answering.

    This episode covers the following 23 papers:

    - [00:23] 🎥 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
    - [01:03] 🧠 Phi-4 Technical Report
    - [01:43] 🧠 Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
    - [02:27] 🌐 Multimodal Latent Language Modeling with Next-Token Diffusion
    - [03:10] 🌐 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
    - [03:57] 🌐 AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
    - [04:43] 🌟 Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
    - [05:24] 📱 SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
    - [06:02] 🔬 PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
    - [06:49] 📊 Learned Compression for Compressed Learning
    - [07:32] 🎙 Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
    - [08:20] 📊 RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
    - [09:08] 👀 Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
    - [10:02] 🧠 JuStRank: Benchmarking LLM Judges for System Ranking
    - [10:43] 🧠 OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
    - [11:34] 📚 The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
    - [12:16] 🔗 Word Sense Linking: Disambiguating Outside the Sandbox
    - [12:58] 🌐 FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
    - [13:42] 🎥 DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
    - [14:26] 🖼 LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
    - [15:21] 🧭 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
    - [16:05] 🌟 Arbitrary-steps Image Super-resolution via Diffusion Inversion
    - [16:46] 📚 Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages

    【Follow us】 You can also find us on the following platform for more than the podcast covers. 小红书 (Xiaohongshu): AI速递

    18 min
  2. 3 DAYS AGO

    2024.12.12 Daily AI Papers | Breakthrough in multi-view video generation; improved models for complex scenes

    This episode covers the following 14 papers:

    - [00:23] 🎥 SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
    - [01:07] 🌐 LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations
    - [01:51] 🌐 POINTS1.5: Building a Vision-Language Model towards Real World Applications
    - [02:28] 🎨 Learning Flow Fields in Attention for Controllable Person Image Generation
    - [03:11] 🎥 StyleMaster: Stylize Your Video with Artistic Generation and Translation
    - [04:00] 🔍 Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
    - [04:46] 🎥 StreamChat: Chatting with Streaming Video
    - [05:28] 🧠 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
    - [06:12] 🏃 Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation
    - [07:01] 🧠 KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
    - [07:40] 🖼 FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
    - [08:17] 🎨 StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
    - [09:03] 🌍 MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
    - [09:50] 🚀 Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

    11 min
  3. 4 DAYS AGO

    2024.12.11 Daily AI Papers | Improved evaluation of code models; breakthroughs in video generation

    This episode covers the following 23 papers:

    - [00:25] 🧑 Evaluating and Aligning CodeLLMs on Human Preference
    - [01:19] 🎥 STIV: Scalable Text and Image Conditioned Video Generation
    - [01:59] 🎨 DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
    - [02:39] 🔒 Hidden in the Noise: Two-Stage Robust Watermarking for Images
    - [03:19] 🎥 UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
    - [04:04] 📄 OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
    - [04:50] 🎨 FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
    - [05:32] 🎥 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
    - [06:09] 🧠 Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation
    - [06:55] 🧠 Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
    - [07:41] 🎥 Video Motion Transfer with Diffusion Transformers
    - [08:23] 🚀 EMOv2: Pushing 5M Vision Model Frontier
    - [09:02] 🛡 Granite Guardian
    - [09:44] 🌟 ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
    - [10:30] 🎥 ObjCtrl-2.5D: Training-free Object Control with Camera Poses
    - [11:21] 🚀 LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
    - [12:12] 📱 MoViE: Mobile Diffusion for Video Editing
    - [12:46] 🧬 Chimera: Improving Generalist Model with Domain-Specific Experts
    - [13:28] 🌐 Fully Open Source Moxin-7B Technical Report
    - [14:09] 📱 Mobile Video Diffusion
    - [14:45] 🤖 Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation
    - [15:24] 🤖 Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
    - [16:15] 🔒 A New Federated Learning Framework Against Gradient Inversion Attacks

    17 min
  4. 5 DAYS AGO

    2024.12.10 Daily AI Papers | Identifying errors in mathematical reasoning; evaluating memory in reinforcement learning.

    This episode covers the following 9 papers:

    - [00:23] 🧮 ProcessBench: Identifying Process Errors in Mathematical Reasoning
    - [01:13] 🧠 Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
    - [01:58] 🧠 Training Large Language Models to Reason in a Continuous Latent Space
    - [02:38] 🌐 Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
    - [03:22] 🎥 Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
    - [04:09] 🎥 You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
    - [04:53] 🌍 Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space
    - [05:31] 🌐 Robust Multi-bit Text Watermark with LLM-based Paraphrasers
    - [06:15] 🤖 CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

    7 min
  5. 6 DAYS AGO

    2024.12.09 Daily AI Papers | Improving multimodal model performance; optimizing text-to-video generation quality.

    This episode covers the following 11 papers:

    - [00:27] 🌐 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
    - [00:58] 🎥 LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
    - [01:41] 🧠 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
    - [02:24] 🤖 EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
    - [03:26] 🤖 Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
    - [04:10] 🚀 APOLLO: SGD-like Memory, AdamW-level Performance
    - [04:49] ⚡ SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
    - [05:26] 🎥 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
    - [06:07] ⏱ Mind the Time: Temporally-Controlled Multi-Event Video Generation
    - [06:42] 🏠 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction
    - [07:20] 🗣 DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling

    8 min
  6. DEC 6

    2024.12.06 Daily AI Papers | Visual compression improves efficiency; code-based monitoring strengthens robot reliability.

    This episode covers the following 23 papers:

    - [00:23] 🔍 VisionZip: Longer is Better but Not Necessary in Vision Language Models
    - [01:03] 🤖 Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
    - [01:43] 🖥 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
    - [02:27] 🔊 A Noise is Worth Diffusion Guidance
    - [03:04] 📊 Evaluating Language Models as Synthetic Data Generators
    - [03:48] 🌐 Structured 3D Latents for Scalable and Versatile 3D Generation
    - [04:26] 🌐 MV-Adapter: Multi-view Consistent Image Generation Made Easy
    - [05:05] 🖼 Negative Token Merging: Image-based Adversarial Feature Guidance
    - [05:41] 🌐 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
    - [06:18] 📈 Densing Law of LLMs
    - [06:59] 🌌 Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
    - [07:37] ⚽ Towards Universal Soccer Video Understanding
    - [08:15] 🎨 HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
    - [08:53] 👗 AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
    - [09:35] 🌍 Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
    - [10:11] 🌐 Personalized Multimodal Large Language Models: A Survey
    - [10:55] ⚡ ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
    - [11:36] 🧠 MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
    - [12:14] 🧠 Discriminative Fine-tuning of LVLMs
    - [12:48] 🧠 Monet: Mixture of Monosemantic Experts for Transformers
    - [13:24] 🌊 OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
    - [13:59] 🧠 KV Shifting Attention Enhances Language Modeling
    - [14:40] 🌍 Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

    16 min

