2024.12.09 每日AI论文 | 提升多模态模型性能,优化文本到视频生成质量。

HuggingFace 每日AI论文速递

本期的 11 篇论文如下:

[00:27] 🌐 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling(扩展开源多模态模型性能边界:模型、数据与测试时扩展)

[00:58] 🎥 LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment(利用人类反馈进行文本到视频模型对齐)

[01:41] 🧠 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale(MAmmoTH-VL:大规模指令调优激发多模态推理)

[02:24] 🤖 EXAONE 3.5: Series of Large Language Models for Real-world Use Cases(EXAONE 3.5:面向实际应用的大型语言模型系列)

[03:26] 🤖 Moto: Latent Motion Token as the Bridging Language for Robot Manipulation(Moto:作为机器人操作桥梁语言的潜在运动标记)

[04:10] 🚀 APOLLO: SGD-like Memory, AdamW-level Performance(APOLLO:类似SGD的内存,AdamW级别的性能)

[04:49] ⚡ SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion(SwiftEdit:通过一步扩散实现闪电般快速的文本引导图像编辑)

[05:26] 🎥 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration(GenMAC:基于多智能体协作的组合式文本到视频生成)

[06:07] ⏱ Mind the Time: Temporally-Controlled Multi-Event Video Generation(注意时间:时间控制的多事件视频生成)

[06:42] 🏠 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction(2DGS-Room:基于种子引导的2D高斯喷射与几何约束的高保真室内场景重建)

[07:20] 🗣 DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling(DEMO:通过细粒度元素建模重构对话交互)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递

To listen to explicit episodes, sign in.

Stay up to date with this show

Sign in or sign up to follow shows, save episodes, and get the latest updates.

Select a country or region

Africa, Middle East, and India

Asia Pacific

Europe

Latin America and the Caribbean

The United States and Canada