3 hrs 38 min

ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models) ft. Durk Kingma, Christian Szegedy, Ilya Sutskever
Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al


Speakers for the AI Engineer World’s Fair have been announced! See our Microsoft episode for more info, and buy tickets now with code LATENTSPACE. We’ve been studying the best ML research conferences so we can make the best AI industry conf!
Note that this year there are 4 main tracks per day and dozens of workshops/expo sessions; the free livestream will air much less than half of the content this time.
Apply for free/discounted Diversity Program and Scholarship tickets here. We hope to make this the definitive technical conference for ALL AI engineers.
UPDATE: This is a 2 part episode - see Part 2 here.
ICLR 2024 took place from May 6-11 in Vienna, Austria.
Just like we did for our extremely popular NeurIPS 2023 coverage, we decided to pay for the $900 ticket (thanks to all of you paying supporters!) and brave the 18-hour flight and 5-day grind to go on behalf of all of you. We now present the results of that work!
This ICLR was the biggest one by far, with a marked change in the excitement trajectory for the conference.
Of the 2260 accepted papers (31% acceptance rate), the subset relevant to our shortlist of AI Engineering Topics included many, many LLM reasoning and agent-related papers, which we will cover in the next episode. In this episode we cover 14 papers on other relevant ICLR topics, listed below.
As we did last year, we’ll start with the Best Paper Awards. Unlike last year, we now group our paper selections by (subjective) topic area, and mix in both Outstanding Paper talks and editorially selected poster sessions. Where we were able to do a poster session interview, scroll to the relevant show notes for images of the poster under discussion. To cap things off, Chris Ré’s spot from last year now goes to Sasha Rush for the obligatory last word on the development and applications of State Space Models.
We had a blast at ICLR 2024 and you can bet that we’ll be back in 2025 🇸🇬.
Timestamps and Overview of Papers
[00:02:49] Section A: ImageGen, Compression, Adversarial Attacks
* [00:02:49] VAEs
* [00:32:36] Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
* [00:37:25] The Hidden Language Of Diffusion Models
* [00:48:40] Ilya on Compression
* [01:01:45] Christian Szegedy on Compression
* [01:07:34] Intriguing properties of neural networks

[01:26:07] Section B: Vision Learning and Weak Supervision
* [01:26:45] Vision Transformers Need Registers
* [01:38:27] Think before you speak: Training Language Models With Pause Tokens
* [01:47:06] Towards a statistical theory of data selection under weak supervision
* [02:00:32] Is ImageNet worth 1 video?

[02:06:32] Section C: Extending Transformers and Attention
* [02:06:49] LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
* [02:15:12] YaRN: Efficient Context Window Extension of Large Language Models
* [02:32:02] Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
* [02:44:57] ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

[02:54:26] Section D: State Space Models vs Transformers
* [03:31:15] Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
* [03:37:08] End of Part 1

A: ImageGen, Compression, Adversarial Attacks
* Durk Kingma (OpenAI/Google DeepMind) & Max Welling: Auto-Encoding Variational Bayes (Full ICLR talk)
* Preliminary resources: Understanding VAEs, CodeEmporium, Arxiv Insights
* Inaugural ICLR Test of Time Award! “Probabilistic modeling is one of the most fundamental ways in which we reason about the world. This paper spearheaded the integration of deep learning with scalable probabilistic inference (amortized mean-field variational inference via a so-called reparameterization trick), giving rise to the Variational Autoencoder (VAE).”
* Pablo Pernías (Stability) et al: Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models (ICLR oral, pos