The Information Bottleneck

Ravid Shwartz-Ziv & Allen Roush

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in generative AI, LLMs, GPUs, and cloud systems.

  1. Reasoning Models and Planning - with Rao Kambhampati (Arizona State)

    4 days ago


    We sat down with Rao Kambhampati, a Professor of CS at Arizona State University and former President of AAAI, to talk about reasoning models: what they are, when they work, and when they break. Rao has been working on planning and decision-making since long before deep learning, which makes him one of the most grounded voices on what today's reasoning systems actually do. We start with definitions of what reasoning is, why planning is the hard subset of it, and what changed when systems like o1 and DeepSeek R1 moved the verifier from inference into post-training. From there we get into where these models generalize, where they don't, and why benchmarks can be misleading about both. A big chunk of the conversation is on chain-of-thought: what intermediate tokens are actually doing, why they help the model more than they help the reader, and what outcome-based RL does to whatever semantic content was there to begin with. We also cover world models and why Rao thinks the video-only framing is the wrong bet, the difference between agentic safety and existential risk, and what the planning community figured out decades ago that the LLM community keeps rediscovering. 
    Timeline:
    (00:12) Intros
    (01:32) Defining "reasoning" and the System 1 / System 2 framing
    (04:12) Blocksworld vs Sokoban, and non-ergodicity
    (06:42) Pre-o1: PlanBench and "LLMs are zero-shot X" papers
    (07:42) LLM-Modulo and moving the verifier into post-training
    (10:12) Is RL post-training reasoning, or case-based retrieval?
    (13:12) τ-Bench and benchmarks that avoid action interactions
    (14:12) OOD generalization and what we don't know about post-training data
    (19:02) Does it matter how they work if they answer the questions we care about?
    (21:27) Architecture lotteries and why no one tries different designs
    (23:42) Intermediate tokens and the "reduce thinking effort" cottage industry
    (26:12) The 30×30 maze experiment
    (27:42) Sokoban, NetHack, and Mystery Blocksworld
    (34:58) Stop Anthropomorphizing Intermediate Tokens — the swapped-trace experiment
    (46:12) Latent reasoning, Coconut, and why R0 beat R1
    (50:12) How outcome-based RL erodes CoT semantics
    (52:12) Dot-dot-dot and Anthropic's CoT monitoring paper
    (53:42) Safety: Hinton, Bengio, LeCun
    (57:12) Existential risk vs real safety work
    (59:42) World models, transition models, and video-only approaches
    (1:03:12) Why linguistic abstractions matter — pick and roll
    (1:05:42) What the planning community knew in 2005
    (1:08:12) Multi-agent LLMs
    (1:09:57) Closing thoughts: the bridge analogy

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    1 hr 12 min
  2. What Actually Matters in AI? - with Zhuang Liu (Princeton)

    Apr 24


    In this episode, we hosted Zhuang Liu, Assistant Professor at Princeton and former researcher at Meta, for a conversation about what actually matters in modern AI and what turns out to be a historical accident. Zhuang is behind some of the most important papers in recent years (with more than 100k citations): ConvNeXt (showing ConvNets can match Transformers if you get the details right), Transformers Without Normalization (replacing LayerNorm with dynamic tanh), ImageBind, Eyes Wide Shut on CLIP's blind spots, the dataset bias work showing that even our biggest "diverse" datasets are still distinguishable from each other, and more. We got into whether architecture research is even worth doing anymore, what "good data" actually means, why vision is the natural bridge across modalities but language drove the adoption wave, whether we need per-lab RL environments or better continual learning, whether LLMs have world models (and for which tasks you'd need one), why LLM outputs carry fingerprints that survive paraphrasing, and where coding agents like Claude Code fit into research workflows today and where they still fall short.

    Timeline:
    00:13 — Intro
    01:15 — ConvNeXt and whether architecture still matters
    06:35 — What actually drove the jump from GPT-1 to GPT-3
    08:24 — Setting the bar for architecture papers today
    11:14 — Dataset bias: why "diverse" datasets still aren't
    22:52 — What good data actually looks like
    26:49 — ImageBind and vision as the bridge across modalities
    29:09 — Why language drove the adoption wave, not vision
    32:24 — Eyes Wide Shut: CLIP's blind spots
    34:57 — RL environments, continual learning, and memory as the real bottleneck
    43:06 — Are inductive biases just historical accidents?
    44:30 — Do LLMs have world models?
    48:15 — Which tasks actually need a vision world model
    50:14 — Idiosyncrasy in LLMs: pre-training vs post-training fingerprints
    53:39 — The future of pre-training, mid-training, and post-training
    57:57 — Claude Code, Codex, and coding agents in research
    59:11 — Do we still need students in the age of autonomous research?
    1:04:19 — Transformers Without Normalization and the four pillars that survived
    1:06:53 — MetaMorph: Does generation help understanding, or the other way around?
    1:09:17 — Wrap

    1 hr 10 min
  3. The Future of Coding Agents with Sasha Rush (Cursor/Cornell)

    Apr 15


    We talked with Sasha Rush, researcher at Cursor and professor at Cornell, about what it actually feels like to be in the heart of the AI revolution and build coding agents right now. Sasha shared how these systems are changing day-to-day work and what it is like to develop them. A big part of the conversation was about why coding has become such a powerful setting for these tools. We discussed what makes code different from other domains, why agents seem to work especially well there, and how much of today's progress comes not just from better models, but from better ways of using them. Sasha also gave an inside look at how Cursor thinks about training coding models, long-running agents, context limits, bug finding, and the balance between autonomy and human oversight. We also talked about the broader shift happening in software engineering. Are developers moving to a higher level of abstraction? Is this just a phase where we "babysit" models, or the beginning of a deeper change in how software gets built? Sasha had a very thoughtful perspective here, including what he's seeing from students, researchers, and engineers who are growing up native to these tools. More broadly, this episode is about what it means to do serious technical work in a moment when the tools are changing incredibly fast. Sasha brought both optimism and skepticism to the discussion, and that made this a really grounded conversation about where coding agents are today, what they are already surprisingly good at, and where all of this might be going next.

    Timeline:
    00:00 Intro and Sasha joins us
    01:11 What "coding agents" actually mean
    02:34 Why coding became the breakout use case
    08:56 Long-running agents and autonomous workflows
    15:08 How these tools are changing the work of engineers
    17:15 Are people just babysitting models right now?
    22:11 How Cursor builds its coding models
    26:29 Rewards, training, and what makes agents work
    34:53 Memory, continual learning, and agent communication
    38:00 How context compaction works in practice
    41:29 Why coding agents recently got much better
    50:31 Refactoring, maintenance, and self-improving codebases
    52:16 Bug finding, oversight, and verification
    54:43 Will this pace of progress continue?
    56:42 Can this spread beyond coding?
    58:27 The future of Cursor and coding agents
    1:03:08 Model architectures beyond standard transformers
    1:05:37 World models, diffusion, and what may come next

    1 hr 25 min
  4. The Hidden Engine of Vision with Peyman Milanfar (Google)

    Apr 10


    How Denoising Secretly Powers Everything in AI

    Peyman Milanfar is a Distinguished Scientist at Google, leading its Computational Imaging team. He's a member of the National Academy of Engineering, an IEEE Fellow, and one of the key people behind the Pixel camera pipeline. Before Google, he was a professor at UC Santa Cruz for 15 years and helped build the imaging pipeline for Google Glass at Google X. His work has over 35,000 citations. Peyman makes a provocative case that denoising, long dismissed as a boring cleanup task, is actually one of the most fundamental operations in modern ML, on par with SGD and backprop. Knowing how to remove noise from a signal basically means you have a map of the manifold that signals live on, and that insight connects everything from classical inverse problems to diffusion models. We go from early patch-based denoisers to his 2010 "Is Denoising Dead?" paper, and then to the question that redirected his research: if denoising is nearly solved, what else can denoisers do? That led to Regularization by Denoising (RED), which, if you unroll it, looks a lot like a diffusion process, years before diffusion models existed. We also cover how his team shipped a one-step diffusion model on the Pixel phone for 100x ProRes Zoom, the perception-distortion-authenticity tradeoff in generative imaging, and a new paper on why diffusion models don't actually need noise conditioning. The conversation wraps with a debate on why language has dominated the AI spotlight while vision lags, and Peyman's argument that visual intelligence, grounded in physics and robotics, is coming next.

    Timeline:
    0:00 Intro and Peyman's background
    1:22 Why denoising matters more than you think
    Sensor diversity and Tesla's vision-only bet
    15:04 BM3D and why it was secretly an MMSE estimator
    17:02 "Is Denoising Dead?" then what else can denoisers do?
    18:07 Plug-and-play methods and Regularization by Denoising (RED)
    26:18 Denoising, manifolds, and the compression connection
    28:12 Energy-based models vs. diffusion: "The Geometry of Noise"
    31:40 Natural gradient descent and why flow models work
    34:48 Gradient-free optimization and high-dimensional noise
    45:13 Image quality and the perception-distortion tradeoff
    48:39 Information theory, rate-distortion, and generative models
    52:57 Denoising vs. editing
    54:25 The changing role of theory
    57:07 Hobbyist tools vs. shipping consumer products
    59:40 Coding agents, vibe coding, and domain expertise
    1:05:00 Vision and more complex-dimensional signals
    1:09:31 Do models need to interact with the physical world?
    1:11:28 Continual learning and novelty-driven updates
    1:13:00 On-device learning and privacy
    1:15:01 Why has language dominated AI? Is vision next?
    1:17:14 How kids learn: vision first, language later
    1:19:36 Academia vs. industry
    1:22:28 10,000 citations vs. shipping to millions, why choose?

    1 hr 24 min
  5. Reinventing AI From Scratch with Yaroslav Bulatov

    Mar 30


    Yaroslav Bulatov helped build the AI era from the inside, as one of the earliest researchers at both OpenAI and Google Brain. Now he wants to tear it all down and start over. Modern deep learning, he argues, is up to 100x more wasteful than it needs to be: a Frankenstein of hacks designed for the wrong hardware. With a power wall approaching in two years, Yaroslav is leading an open effort to reinvent AI from scratch: no backprop, no legacy assumptions, just the benefit of hindsight and AI agents that compress decades of research into months. Along the way, we dig into why AGI is a "religious question," how a sales guy with no ML background became one of his most productive contributors, and why the Muon optimizer, one of the biggest recent breakthroughs, could only have been discovered by a non-expert.

    Timeline:
    00:12 — Introduction and Yaroslav's background at OpenAI and Google Brain
    01:16 — Why deep learning isn't such a good idea
    02:03 — The three definitions of AGI: religious, financial, and vibes-based
    07:52 — The SAI framework: do we need the term AGI at all?
    10:58 — What matters more than AGI: efficiency and refactoring the AI stack
    13:28 — Jevons paradox and the coming energy wall
    14:49 — The recipe: replaying 70 years of AI with hindsight
    17:23 — Memory, energy, and gradient checkpointing
    18:34 — Why you can't just optimize the current stack (the recurrent laryngeal nerve analogy)
    21:05 — What a redesigned AI might look like: hierarchical message passing
    22:31 — Can a small team replicate decades of research?
    24:23 — Why non-experts outperform domain specialists
    27:42 — The GPT-2 benchmark: what success looks like
    29:01 — Ian Goodfellow, Theano, and the origins of TensorFlow
    30:12 — The Muon optimizer origin story and beating Google on ImageNet
    36:16 — AI coding agents for software engineering and research
    40:12 — 10-year outlook and the voice-first workflow
    42:23 — Why start with text over multimodality
    45:13 — Are AI labs like SSI on the right track?
    48:52 — Getting rid of backprop — and maybe math itself
    53:57 — The state of ML academia and NeurIPS culture
    56:41 — The Sutra group challenge: inventing better learning algorithms

    58 min
  6. Why Healthcare Is AI's Hardest and Most Important Problem with Kyunghyun Cho (NYU)

    Mar 24


    We talk with Kyunghyun Cho, Professor of Health Statistics and Professor of Computer Science and Data Science at New York University, and a former Executive Director at Genentech, about why healthcare might be the most important and most difficult domain for AI to transform. Kyunghyun shares his vision for a future where patients own their own medical records, proposes a provocative idea for running continuous society-level clinical trials by having doctors "toss a coin" between plausible diagnoses, and explains why drug discovery's stage-wise pipeline has hit a wall that only end-to-end AI thinking can break through. We also get into GLP-1 drugs and why they're more mysterious than people realize, the brutal economics of antibiotic research, how language models trained across scientific literature and clinical data could compress 50 years of drug development into five, and what Kyunghyun would do with $10 billion (spoiler: buy a hospital network in the Midwest). We wrap up with a great discussion on the rise of professor-founded "neo-labs," why academia got spoiled during the deep learning boom, and an encouraging message for PhD students who feel lost right now.

    Timeline:
    (00:00) Intro and welcome
    (01:25) Why healthcare is uniquely hard
    (04:46) Who owns your medical records? — The case for patient-controlled data and tapping your phone at the doctor's office
    (06:43) Centralized vs. decentralized healthcare — comparing Israel, Korea, and the US
    (13:19) Why most existing health data isn't as useful as we think — selection bias and the lack of randomization
    (16:53) The "toss a coin" proposal — continuous clinical trials through automated randomization, and the surprising connection to LLM sampling
    (23:07) Drug discovery's broken pipeline — why stage-wise optimization is failing, and we need end-to-end thinking
    (28:30) Why the current system is already failing society — wearables, preventive care, and the case for urgency
    (31:13) Allen's personal healthcare journey and the GLP-1 conversation
    (33:13) GLP-1 deep dive — 40 years from discovery to weight loss drugs, brain receptors, and embracing uncertainty
    (36:28) Why antibiotic R&D is "economic suicide" and how AI can help
    (42:52) Language models in the clinic and the lab — from clinical notes to back-propagating clinical outcomes, all the way to molecular design
    (48:04) Do you need domain expertise, or can you throw compute at it?
    (54:30) The $10 billion question — distributed GPU clouds and a patient-in-the-loop drug discovery system
    (58:28) Vertical scaling vs. horizontal scaling for healthcare AI
    (1:01:06) AI regulation — who's missing from the conversation and why regulation should follow deployment
    (1:06:52) Professors as founders and the "neo-lab" phenomenon — how Ilya cracked the code
    (1:11:18) Can neo-labs actually ship products? Why researchers should do research
    (1:13:09) Academia got spoiled — the deep learning anomaly is ending, and that's okay
    (1:16:07) Closing message — why it's a great time to be a PhD student and researcher

    1 hr 18 min
  7. Diffusion LLM & Why the Future of AI Won't Be Autoregressive - Stefano Ermon (Stanford/Inception)

    Mar 19


    In this episode, we talk with Stefano Ermon, Stanford professor, co-founder & CEO of Inception AI, and co-inventor of DDIM, FlashAttention, DPO, and score-based/diffusion models, about why diffusion-based language models may overtake the autoregressive paradigm that dominates today's LLMs. We start with the fundamental topics, such as what diffusion models actually are, and why iterative refinement (starting from noise, progressively denoising) offers structural advantages over autoregressive generation. From there, we dive into the technical core of diffusion LLMs. Stefano explains how discrete diffusion works on text, why masking is just one of many possible noise processes, and how the mathematics of score matching carries over from the continuous image setting with surprising elegance. A major theme is the inference advantage. Because diffusion models produce multiple tokens in parallel, they can be dramatically faster than autoregressive models at inference time. Stefano argues this fundamentally changes the cost-quality Pareto frontier, and becomes especially powerful in RL-based post-training. We also discuss Inception AI's Mercury II model, which Stefano describes as best-in-class for latency-constrained tasks like voice agents and code completion. In the final part, we get into broader questions: why transformers work so well, research advice for PhD students, whether recursive self-improvement is imminent, the real state of AI coding tools, and Stefano's journey from academia to startup founder.

    Timestamps:
    0:12 – Introduction
    1:08 – Origins of diffusion models: from GANs to score-based models in 2019
    3:13 – Diffusion vs. autoregressive: the typewriter vs. editor analogy
    4:43 – Speed, creativity, and quality trade-offs between the two approaches
    7:44 – Temperature and sampling in diffusion LLMs — why it's more subtle than you think
    9:56 – Can diffusion LLMs scale? Inception AI and Gemini Diffusion as proof points
    11:50 – State space models and hybrid transformer architectures
    13:03 – Scaling laws for diffusion: pre-training, post-training, and test-time compute
    14:33 – Ecosystem and tooling: what transfers and what doesn't
    16:58 – From images to text: how discrete diffusion actually works
    19:59 – Theory vs. practice in deep learning
    21:50 – Loss functions and scoring rules for generative models
    23:12 – Mercury II and where diffusion LLMs already win
    26:20 – Creativity, slop, and output diversity in parallel generation
    28:43 – Hardware for diffusion models: why current GPUs favor autoregressive workloads
    30:56 – Optimization algorithms and managing technical risk at a startup
    32:46 – Why do transformers work so well?
    33:30 – Research advice for PhD students: focus on inference
    34:57 – Recursive self-improvement and AGI timelines
    35:56 – Will AI replace software engineers? Real-world experience at Inception
    37:54 – Professor vs. startup founder: different execution, similar mission
    39:56 – The founding story of Inception AI — from ICML Best Paper to company
    42:30 – The researcher-to-founder pipeline and big funding rounds
    45:02 – PhD vs. industry in 2026: the widening financial gap
    47:30 – The industry in 5-10 years: Stefano's outlook

    49 min
  8. Training Is Nothing Like Learning with Naomi Saphra (Harvard)

    Mar 13


    Naomi Saphra, Kempner Research Fellow at Harvard and incoming Assistant Professor at Boston University, joins us to explain why you can't do interpretability without understanding training dynamics, in the same way you can't do biology without evolution. Naomi argues that many structures researchers find inside trained models are vestigial: they mattered early in training but are meaningless by the end. Grokking is one case of a broader phenomenon: models go through multiple consecutive phase transitions during training, driven by symmetry breaking and head specialization, but the smooth loss curve hides all of it. We talk about why training is nothing like human learning, and why our intuitions about what's hard for models are consistently wrong: code in pretraining helps language reasoning, tokenization drives behaviors people attribute to deeper cognition, and language already encodes everything humans care about. We also get into why SAEs are basically topic models, the Platonic representation hypothesis, using AI to decode animal communication, and why non-determinism across training runs is a real problem that RL and MoE might be making worse.
    Timeline:
    (00:12) Introduction and guest welcome
    (01:01) Why training dynamics matter - the evolutionary biology analogy
    (03:05) Jennifer Aniston neurons and the danger of biological parallels
    (04:48) What is grokking and why it's one instance of a broader phenomenon
    (08:25) Phase transitions, symmetry breaking, and head specialization
    (11:53) Double descent, overfitting, and the death of classical train-test splits
    (15:10) Training is nothing like learning
    (16:08) Scaling axes - data, model size, compute, and why they're not interchangeable
    (19:29) Data quality, code as reasoning fuel, and GPT-2's real contribution
    (20:43) Multilingual models and the interlingua hypothesis
    (25:58) The Platonic representation hypothesis and why image classification was always multimodal
    (29:12) Sparse autoencoders, interpretability, and Marr's levels
    (37:32) Can we ever truly understand what models know?
    (43:59) The language modality chauvinist argument
    (51:55) Vision, redundancy, and self-supervised learning
    (57:18) World models - measurable capabilities over philosophical definitions
    (1:00:14) Is coding really a solved task?
    (1:04:18) Non-determinism, scaling laws, and why one training run isn't enough
    (1:10:12) Naomi's new lab at BU and recruiting

    1 hr 12 min

Ratings & Reviews

5 out of 5
5 ratings

About this podcast

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in generative AI, LLMs, GPUs, and cloud systems.
