Linear Digressions

Katie Malone

4.8 out of 5 (355 ratings)
TECHNOLOGY
UPDATED BIWEEKLY

Demystifying AI for the intelligently curious

5D AGO

ReAct and Tool Usage (The Agents Season, Episode 2)

Before 2022, there was a wall between AI and the real world — models could reason impressively, but couldn't look anything up, run code, or check whether anything they said was actually true. This episode traces the moment that wall came down, through two landmark papers: ReAct, which showed what happens when you interleave reasoning and action in a loop, and Toolformer, which taught models to decide *for themselves* when to reach for a tool. Plus: what MCP actually is, and why a hobbyist project called Open Claw became the fastest-growing open source project in history.
---
Website: https://lineardigressions.com
Apple Podcasts: https://podcasts.apple.com/us/podcast/linear-digressions/id941219323
Spotify: https://open.spotify.com/show/1JdkD0ZoZ52KjwdR0b1WoT
Substack: https://substack.com/@lineardigressions

24 min
APR 20

What's an AI Agent? And Why's That Hard to Define? (The Agents Season, Episode 1)

AI agents are having a moment — and unpacking them properly takes more than a single conversation. This episode kicks off a dedicated multi-part season exploring AI agents from every angle, building up a complete picture piece by piece rather than skimming the surface. Think of it as a structured deep dive into one of the most talked-about (and most misunderstood) topics in machine learning right now. Buckle up — ten more episodes to go.
---
Website: https://lineardigressions.com
Apple Podcasts: https://podcasts.apple.com/us/podcast/linear-digressions/id941219323
Spotify: https://open.spotify.com/show/1JdkD0ZoZ52KjwdR0b1WoT
Substack: https://substack.com/@lineardigressions

19 min
APR 13

Unfaithful Chain of Thought

What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.
- Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388
- Anthropic, "Reasoning Models Don't Always Say What They Think" (2025): https://www.anthropic.com/research/reasoning-models-dont-say-think

25 min
APR 6

Benchmark Bank Heist

What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation — reasoning that it was being tested, tracking down the encrypted eval dataset, decrypting it, and returning the answer it found inside. It's equal parts impressive and unsettling. This episode digs into what actually happened, why it matters for how we measure AI progress, and what this novel failure mode means for the already-tricky science of benchmarking language models.
Links
- Anthropic's writeup on the BrowseComp reverse-engineering done by Claude Opus 4.6: https://www.anthropic.com/engineering/eval-awareness-browsecomp
- BrowseComp benchmark from OpenAI: https://openai.com/index/browsecomp/

13 min
MAR 30

Benchmarking AI Models

How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks — the standardized tests used to compare models — exploring two canonical examples: MMLU, a 14,000-question multiple-choice gauntlet spanning medicine, law, and philosophy, and SWE-bench, which throws real GitHub bugs at models to see if they can fix them. Along the way: Goodhart's Law, data contamination, canary strings, and why acing a test isn't always the same as being smart.

30 min
MAR 23

The Hot Mess of AI (Mis-)Alignment

The paperclip maximizer — the classic AI doom scenario where a hyper-competent machine single-mindedly converts the universe into office supplies — might not be the AI risk we should actually lose sleep over. New research from Anthropic's AI safety division suggests misaligned AI looks less like an evil genius and more like a distracted wanderer who gets sidetracked reading French poetry instead of, say, managing a nuclear power plant. This week we dig into a fascinating paper reframing AI misalignment through the lens of bias-variance decomposition, and why longer reasoning chains might actually make things worse, not better.
- "The Hot Mess Theory of AI Misalignment: How Misalignment Scales with Model Intelligence and Task Complexity" — Anthropic AI Safety. https://arxiv.org/abs/2503.08941

23 min
MAR 15

The Bitter Lesson

Every AI builder knows the anxiety: you spend months engineering prompts, tuning pipelines, and chaining calls together — then a new model drops and half your work evaporates overnight. It turns out researchers have been wrestling with this exact dynamic for 30 years, and they keep arriving at the same uncomfortable answer. That answer is called the Bitter Lesson — and understanding it might be the most important thing you can do for whatever you're building right now. From Deep Blue to AlexNet to modern LLMs, scale keeps beating sophistication, and knowing which side of that line your work falls on makes all the difference.
Links
- Richard Sutton, "The Bitter Lesson"
- Alon Halevy, Peter Norvig, and Fernando Pereira, "The Unreasonable Effectiveness of Data"
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, "ImageNet Classification with Deep Convolutional Neural Networks"

19 min
MAR 9

From Atari to ChatGPT: How AI Learned to Follow Instructions

From Atari to ChatGPT: How AI Learned to Follow Instructions by Katie Malone

26 min


Reminiscing

04/12/2025

bob2457;654765

Used to really enjoy this podcast back in the day. I sometimes wonder what they think about all the changes of the last five years.

My favorite Data Science podcast

07/27/2021

Cruzisiah

Great mix of technical terms explained in a digestible way, without being too robotic.

Great Podcast!

05/01/2021

Tabeen Raoof

Katie & Ben, thank you for such a great podcast. I just finished listening to all of the episodes. I started in mid-2020, after the last episode was published! I’m not a data scientist and don’t have any specialty in the area, but it’s been informative and really fun to listen to Linear Digressions.

Thanks for the ride!

08/05/2020

flyeversheep

It’s been a great time!


Creator: Katie Malone
Years Active: 2014 - 2026
Episodes: 304
Rating: Clean
