The Information Bottleneck

Ravid Shwartz-Ziv & Allen Roush

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in generative AI, LLMs, GPUs, and cloud systems.

  1. Why AI Benchmarks Are Lying to You - with Wenhu Chen (Meta/University of Waterloo)

    1d ago

    In this episode, we sit down with Wenhu Chen, research scientist at Meta MSL, assistant professor at the University of Waterloo, and the person behind MMLU-Pro and MMMU. If you've read a frontier model release in the last two years, you've seen his benchmarks. That makes him one of the best people to answer the question everyone dances around: when a model jumps from 40% to 90% on your benchmark, how much of that is real? We dig into why benchmarks have become the loss function of the entire field. Design a bad one, and thousands of brilliant researchers will spend months hill-climbing in the wrong direction. Wenhu is surprisingly candid about the limits of his own creations: contamination is everywhere, saturation turns frontier benchmarks into unit tests, and popular alternatives such as LM Arena mostly measure tone and length rather than capability. His answer is to evaluate models where they've never been: private codebases, hospital data, and the messy, live internet. We also talk about ClawBench, his new benchmark that deploys agents to more than 140 real production websites to do things people actually want done, such as ordering food, booking tickets, and applying for jobs. The best model in the world completes about a third of these tasks. We unpack why: bot detection, models that refuse to click "pay," agents that give up the moment an environment doesn't match their training, and harnesses that can swing results by 20% without changing the model at all. Along the way, we cover the overlooked science of evaluating pre-training, data flywheels and synthetic environments for agent training, and whether RL teaches models to reason or just surfaces what's already there. We close with Wenhu's predictions: exploration and adaptability will improve rapidly, but security will become the field's hardest problem as agents gain real permissions in the real world.

    Timestamps
    00:00 – Intro
    00:55 – What good evaluation means, and how it's changed since the early GPT days
    03:35 – Benchmarks as the field's loss function
    05:50 – Contamination: the problem nobody fully solves
    08:08 – MMLU-Pro scores: real progress or training on the test set?
    11:05 – Can you measure creativity?
    12:34 – Why human judges and arenas are unreliable, and what to use instead
    19:22 – What a good benchmark actually looks like
    22:34 – Chain of thought: signal or scratchpad?
    26:01 – Auto-research and hill-climbing agents
    28:52 – Harnesses: 20% swings without touching the model
    32:28 – Safety, model release, and an "FDA for models"
    36:53 – The overlooked science of pre-training evaluation
    43:49 – Designing pre-training benchmarks when one run costs a billion dollars
    49:45 – ClawBench: agents on 140+ live websites, and why the best model gets 33%
    54:42 – How MMLU-Pro and MMMU-Pro were born from public complaints
    59:16 – Pixel agents vs. APIs: will MCP kill computer use?
    1:02:11 – Training agents: data flywheels and synthetic environments
    1:05:43 – SFT vs. RL, and does RL teach reasoning or reveal it?
    1:09:21 – What gets solved next year, and what doesn't
    1:14:32 – Undervalued ideas, and what's next for ClawBench

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

    1h 19m
  2. Jürgen Schmidhuber - Part 2: JEPA, the Road to AGI, and Who Really Invented Modern AI

    Jun 7

    In the second half of our conversation with Jürgen Schmidhuber, we focus on the key ideas he's pursued since the early 1990s and why he believes they are only now being rediscovered. We start with JEPA: Jürgen argues that the method LeCun named in 2022 belongs to the same family as the approach he published in 1992 as Predictability Maximization. From there he traces the adversarial lineage back further still, to his 1990 world-model paper and 1991 Predictability Minimization, the curiosity-driven minimax games he sees as the real origins of GANs. We also talk about why these ideas took thirty years to land, why today's trillion-dollar data-center buildout is driven by AGI fear, and why he thinks Apple may come out ahead. The back half turns to what he sees as the real frontier: physical AI. Today's systems are superhuman behind the screen but helpless at a leaky pipe, and in his view, until a robot can use human tools, there's no AGI. He discusses self-replicating, self-improving machines as "a new kind of life," reframes continual learning and test-time training as ideas from his 1991 fast-weight work, and detours through Solomonoff's universal prior, Hutter's AIXI, and the Gödel machine. We close on the subject Jürgen is famous for: scientific credit. He makes his case for rigorous attribution, casts himself as a "speaker for the dead" championing forgotten pioneers like Ivakhnenko, and reflects candidly on whether the fights are personal.

    Timeline
    00:30 – What JEPA is, and the 1992 Predictability Maximization story
    04:54 – Implementing PMAX: autoencoders, Siamese networks, Infomax
    09:10 – Predictability Minimization, factorial codes, and the roots of GANs
    16:00 – Why it took 30 years: the economics of compute
    20:52 – Data, the web, and 1990 as the origin point
    23:09 – Hardware inflation, the trillion-dollar buildout, and the coming crash
    34:05 – Physical AI: the plumber problem and self-replicating machines
    41:14 – Which 90s ideas are being scaled right now
    45:26 – Continual learning and test-time training as "old hats"
    55:19 – Measuring intelligence: Solomonoff, AIXI, and the Gödel machine
    1:05:26 – Self-replication and von Neumann
    1:09:51 – Will he see AGI in his lifetime?
    1:10:42 – Credit, integrity, and being a "speaker for the dead"

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    1h 29m
  3. Jürgen Schmidhuber - World Models, RL, and the Year that changed AI (Part 1)

    Jun 4

    In this episode, we host Jürgen Schmidhuber: the man, the legend, one of the godfathers of modern AI. His lab worked out many ideas behind today’s systems (LSTM, world models, artificial curiosity, Transformer variants, and even GAN-style setups) decades before they became fashionable, and he’s just as well known for making sure people remember who did what first. This is the first of two conversations with him. We go back to his lab in the early 90s and ask how one small group came up with so many of the ideas that are now being scaled up at a cost of a thousand billion dollars, back when compute was ten million times more expensive. A lot of the episode comes down to one distinction he keeps making: prediction vs. decision-making. His take is that LLMs are very good prediction machines that imitate the web, but that’s only half the problem. To actually act in the world, you need a controller that uses a world model to plan. He talks about his 1990 work on world models and artificial curiosity, where the controller gets rewarded for running experiments that improve its own model (an adversarial setup years before GANs), why planning millisecond by millisecond doesn’t scale, and why you need sub-goals instead. We also talk about compression as the core of understanding, from falling apples to Kepler to Einstein, and why we still don’t have a robot that can do what a plumber does, even though the AI behind the screen keeps getting better. Then the conversation moves to credit assignment: how “to Schmidhuber” became a verb, what he thinks is broken about the award system, and a long exchange on PMAX vs. JEPA. He ends on the real origins of deep learning and a prediction about self-replicating machines in space.

    Timeline
    00:00 – Intro
    00:55 – 1991 in Munich, and why that lab mattered
    02:38 – "I'm not very smart," and why compute getting 10× cheaper every 5 years changed everything
    04:25 – Chess as an AI proxy
    08:27 – Artificial curiosity in the 90s vs. today's RL exploration
    09:10 – Why RL is harder than supervised learning
    20:48 – Coding agents vs. robots, and how a baby learns its own hands
    26:20 – Compression as understanding
    33:40 – What's actually missing on the road to AGI
    37:30 – Why millisecond-by-millisecond planning is stupid
    47:44 – Convergence to LLMs, GPUs, and how far we still are from the Bremermann limit
    51:49 – Unsupervised learning, factorial codes, and predictability minimization
    58:12 – Credit assignment: the fights with LeCun and the Nobel critique
    1:02:13 – On his last name becoming a verb
    1:05:17 – The award system's missing peer review
    1:07:03 – Closed labs and the decline of open research
    1:13:23 – Audience questions
    1:34:02 – Closing: who really invented deep learning?

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    1h 38m
  4. AI for Science and the Thermodynamics of Generative AI - with Max Welling (UvA, CuspAI)

    May 29

    In this episode, we sit down with Max Welling, Professor of Machine Learning at the University of Amsterdam, co-founder and CTO of CuspAI, and a foundational figure behind variational autoencoders (VAEs), equivariant networks, and Bayesian deep learning. We talk about AI for science, the physics underneath generative models, and what's still missing on the road to real intelligence. Max starts with what impresses him and what worries him about the LLM era, then makes the case that the next leaps will come from physical AI and from science itself. We dig into how machine learning actually works in the lab, world models and whether priors like geometry and symmetry should be built in or simply learned, and whether transformers will still rule a decade from now. At the end, we talk about CuspAI's climate mission, AI risk and regulation, Max’s new book, and where neuroscience might inspire the next wave of ML.

    Timeline
    00:00 – Intro
    00:47 – Are we happy with the LLM era?
    03:14 – Embodiment and physical AI
    08:05 – Does "AGI" even matter as a term?
    11:34 – Verifiers, RL, and why math/coding are tractable
    13:17 – What actually shifted to make materials discovery work
    14:42 – From molecules to biology and wet labs
    16:26 – Working with real labs: timescales, friction, and the "Mira" agent
    20:29 – Balancing simulators vs. experiments: the exploration–exploitation trade-off
    23:44 – Active learning for experimental design
    24:23 – Why active learning hasn't been central to LLMs
    25:24 – A general loop for ML-for-science across domains
    27:10 – Foundation models for chemistry: a "mother ship" plus a zoo of fine-tuned models
    30:04 – Quantum mechanics, interpretation, and AI as a creative theorist
    31:54 – World models and Yann LeCun's view; priors vs. learning
    34:57 – Should world knowledge be explicit? (responding to Stefano Ermon)
    36:41 – Vision: equivariance vs. transformers, and the role of optimization
    40:32 – Best model for molecular properties in 10 years? Will transformers survive?
    43:16 – CuspAI's climate focus and what motivated it
    47:10 – One platform for every material class: what transfers and what doesn't
    48:42 – Where does the risk of human extinction really come from?
    51:06 – The "pause AI" debate and the arms-race reality
    52:40 – Regulating powerful models: government vs. self-regulation
    55:16 – Who should design AI regulation?
    56:29 – The new book
    1:00:31 – Compression, the information bottleneck, and renormalization
    1:03:30 – The role of foundational principles in modern AI
    1:04:06 – Waves in computing, the brain, and the next wave of innovation
    1:07:11 – Neuroscience and ML: are we in a better position now?
    1:09:17 – Conferences, the ICLR keynote, and finding the right people

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    1h 14m
  5. After Math Falls, What's Next? with Julia Kempe (NYU/Meta)

    May 25

    Julia Kempe on Why Math Will Fall Next, Superhuman Provers, and the Return of the Renaissance Researcher

    In this episode, we sit down with Julia Kempe, a professor at NYU's Center for Data Science and a researcher on Meta FAIR's Foundations of Reasoning team, for a wide-ranging conversation on the future of AI research. We dig into why verifiable domains like mathematics may be on track to "fall" the way Go did. With formal verification through Lean and the Mathlib infrastructure, LLM agents can now generate and check proofs at scale, and Julia makes the case that a new industry of automated mathematical discovery is closer than most mathematicians believe. We explore why Erdős problems are already falling, what's still missing for harder fields like analysis and physics, and how synthetic data, curation, and verification fit together. From there we get into the energy and scaling limits of frontier models, the case for academic research that big labs can't pursue, how to advise PhD students when Claude can already do their first-year work, the rise of AI safety and security as research priorities, and Julia's optimistic argument that AI tools are bringing back the Renaissance generalist: the researcher who can finally work fluently across math, biology, and beyond.

    Timeline
    00:00 – Introductions
    01:00 – Defining reasoning and verifiable domains
    04:00 – Lean, Mathlib, and the formalization of mathematics
    10:00 – Constructive proofs, Erdős problems, and the new wave of "AI mathematicians"
    14:00 – Will math be "solved"? Art, photography, and the changing nature of creative work
    18:00 – Why physics is harder than math
    22:00 – Moravec's paradox, evolution, and why robotics lags behind language
    27:00 – The Renaissance is back: generalist researchers in the age of AI
    29:00 – Advising students: math, programming, and what core education still matters
    32:00 – Teaching and assessment when GPT can do the homework
    35:00 – Anti-AI backlash, energy costs, and the security threat
    40:00 – Scaling vs. efficiency
    42:00 – Model collapse, synthetic data, and what's left to squeeze from the internet
    44:00 – What's exciting next: AI for science, safety, robotics, memory, and planning
    47:00 – Annotation costs as a proxy
    50:00 – Superhuman models and what security even means against them
    52:00 – AlphaGo as precedent for verifiable superhuman performance
    54:00 – Hallucination, the Mirage paper, and whether these are solvable problems
    56:00 – Why coding isn't fully solved yet
    58:00 – Agent security, prompt injection, and the Wild West of deployed agents
    1:01:00 – Regulation: what's needed and what's possible
    1:04:00 – Advice for PhD students and what research academia should pursue
    1:09:00 – Startup opportunities: robotics, security, and AI for finance
    1:12:00 – Closing thoughts: use the tools, and build grassroots AI for good

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    1h 15m
  6. Intelligence in an Open World - with Mengye Ren (NYU)

    May 20

    We talk with Mengye Ren, Assistant Professor at NYU's Center for Data Science, about what intelligence actually means once you step outside a benchmark, and why scaling a single centralized model isn't the whole story. We get into why intelligence has to be defined in open environments, not closed ones, and what that means for how we measure progress. We push on the creativity question: today's models sample bottom-up from a softmax or a Gaussian, with no internal loop of consideration, and as Mengye puts it, we haven't understood creativity yet and we're already prepared to hand it over. We also talk about what's missing for the next paradigm: continual learning, memory, embodied grounding, and smaller models that actually accumulate experience instead of re-deriving everything from scratch on each call. Along the way, we get into JEPA and latent variables, biology as inspiration vs. blueprint, why frontier labs don't lean on explicit latents, the limits of synthetic data and world models, agent-to-agent communication, model uncertainty and forecasting, and whether ML education still matters when AI writes the experiments. It's a grounded, contrarian conversation about where AI research should look next: beyond benchmarks and beyond scale.

    Timeline
    00:00 – Intro and welcome
    01:24 – What is intelligence? Defining it relative to objectives and open environments
    04:19 – Is intelligence really the path to human flourishing, or is it productivity?
    04:57 – Safety, scalable oversight, and whether stronger models help or hurt
    06:09 – What does "alignment" actually mean?
    07:18 – Centralized vs. decentralized models: objectivity vs. personal meaning
    08:50 – Hinton vs. LeCun: where Mengye stands on AI risk
    10:29 – Bottom-up vs. top-down architectures and feedback loops
    21:28 – Biology and AI: inspiration, not blueprint
    24:14 – Biological plausibility, spiking nets, and where the analogy breaks
    25:39 – JEPA, Mamba, and architectures beyond the transformer
    27:31 – Language as a special modality: abstraction built for communication
    29:04 – Are we too locked into the current paradigm? Risk of creativity collapse
    30:09 – Synthetic data, simulation, and the brain's own generative models
    31:43 – World models and physical AI: how babies actually learn
    33:03 – The case for smaller, continually learning models
    37:02 – The role of academic research in a frontier-lab world
    39:47 – Why LLMs aren't funny: the creativity gap
    40:35 – What research areas matter most: embodiment, continual learning, creativity
    42:05 – Creativity is bounded by experience, and why bottom-up sampling isn't enough
    45:35 – Agent-to-agent communication and the limits of sub-agents
    46:39 – Model confidence, epistemic uncertainty, and forecasting
    49:44 – Tokenization, static vs. dynamic worlds, and always-learning systems
    52:20 – Latent variables, JEPA, and why frontier models skip them
    53:40 – The future of ML education when AI writes the experiments

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    59 min
  7. Language, Cognition, and the Limits of LLMs - with Tal Linzen (NYU/Google)

    May 17

    We host Tal Linzen, Associate Professor at NYU and Research Scientist at Google, for a conversation on the intersection of cognitive science and large language models. We discuss why children can learn language from around 100 million words while LLMs need trillions, and the surprising finding that as models get better at predicting the next word, they become worse models of how humans actually process language. Tal walks us through how his lab uses eye-tracking and reading-time data to compare model behavior to human behavior, and what that reveals about prediction, working memory, and the limits of current architectures. We also get into nature versus nurture and how inductive biases can be instilled by pre-training on synthetic languages, world models and whether transformers actually use the geometric structure they encode, the BabyLM challenge and data-efficient language learning, and what mechanistic interpretability can offer cognitive science beyond just fixing model bugs. The conversation closes on academia versus industry, the role of PhDs in the current AI moment, and how AI coding tools are changing the way Tal teaches and evaluates students at NYU.

    Timeline
    00:13 – Intro and what cognitive science means
    02:16 – Using computational simulations to understand how humans learn language
    05:26 – How children learn language vs. how LLMs are pre-trained
    07:53 – Why mainstream LLMs are not good models of humans
    10:07 – Comparing humans and models with eye-tracking and reading behavior
    13:52 – Sensory modalities, smell, and how much you can learn from language alone
    16:03 – Animal cognition and decoding animal communication
    17:00 – Nature vs. nurture, inductive biases, and what transformers can and can't learn
    21:21 – Instilling inductive biases through synthetic languages
    27:34 – The bouba/kiki effect and cross-linguistic sound symbolism
    28:33 – Latent causal structure in language and whether models discover it
    31:13 – Does knowing linguistics help build better models?
    35:07 – World models: what they mean, and why transformers encode geometry but don't use it
    39:13 – Tokenization, and why Tal doesn't like it
    41:35 – Scaling laws and the inverse-U curve of model quality vs. human fit
    44:34 – Where the human–model mismatch comes from: architecture, memory, and data
    47:08 – Diffusion language models and sentence planning
    48:21 – Data quality, synthetic data, and curriculum effects
    50:54 – Comparing models at different training stages to human development; BabyLM
    54:40 – What level of the model should we actually probe? Representations vs. behavior
    1:01:04 – Mechanistic interpretability, Deep Dream, and human dreaming
    1:02:11 – Cognitive neuroscience, intracranial recordings, and working memory
    1:10:31 – Should you still do a PhD in 2026?
    1:12:31 – Will software engineers lose their jobs to AI?
    1:17:43 – Teaching in the age of coding agents: what changes in the classroom
    1:20:54 – What's next: human-like LLMs as user simulators, and recruiting

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    1h 23m
  8. The Principles of Diffusion Models - with Jesse Lai (Sony AI)

    May 10

    We host Chieh-Hsin (Jesse) Lai, Staff Research Scientist at Sony AI and visiting professor at National Yang Ming Chiao Tung University, Taiwan, for a conversation about diffusion models, the technology behind tools like Stable Diffusion and most of the AI image and video generators you've seen in the last few years. Jesse recently co-authored The Principles of Diffusion Models with Stefano Ermon, and the book is quickly becoming a go-to reference in the field. We start with what a generative model actually is, and what it means to "generate" an image or a sound. Jesse explains the core idea behind diffusion in plain terms: you start with pure noise, and a neural network gradually cleans it up, step by step, until a realistic image emerges. From there, we talk about why diffusion has come to dominate so much of generative AI. Because the model builds an image gradually, you can guide it along the way, nudging the output toward what you actually want, refining details, or combining it with other controls. We also discuss the common critique that diffusion is slow, and how the field has largely addressed it with techniques such as consistency models and the push for one-step generation. We zoom out to the bigger picture, too. Jesse shares his view on world models and whether diffusion is the right foundation for them. We talk about what makes a generative model genuinely good versus just good at gaming benchmarks, and why evaluating creativity and realism is so much harder than scoring a multiple-choice test.

    Timeline
    00:12 – Intro and welcoming Jesse
    00:47 – Why Jesse wrote the book, and who it's for
    03:29 – The three families of diffusion models, and why they're really one idea
    05:14 – What makes a good generative model
    07:39 – How do you even measure whether a generated image is good?
    08:59 – Why diffusion beats autoregressive models for images
    10:33 – Is diffusion still slow? How generation got fast
    11:12 – A simple intuition for what a "score" is
    14:12 – How the different flavors of diffusion connect under the hood
    14:42 – Diffusion for text and proteins
    17:12 – Consistency models and the push for one-step generation
    22:12 – Diffusion for world models: simulating reality in real time
    26:12 – Do world models need to understand language?
    35:12 – Is diffusion the right tool, or just a convenient one?
    38:12 – What benchmarks actually tell us, and what they miss
    46:12 – Closing thoughts and where to find the book

    Music: "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0. "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    56 min

Ratings & Reviews

5 out of 5 (6 ratings)
