The MAD Podcast with Matt Turck

Matt Turck

5.0 (28)
Technology

The MAD Podcast with Matt Turck is a series of conversations with leaders from across the Machine Learning, AI, & Data landscape, hosted by Matt Turck, a leading AI & data investor and Partner at FirstMark Capital.

2d ago

The Biggest Chip Ever Built — Why OpenAI Runs On It | Cerebras CEO Andrew Feldman

AI is no longer just a race to train smarter models. As AI moves into production, the bottleneck is increasingly inference: how fast models can generate tokens, use tools, reason, verify, and act. In this episode of the MAD Podcast, Matt Turck sits down with Andrew Feldman, co-founder and CEO of Cerebras, to explain why fast inference may define the next era of AI. Cerebras is known for building a chip the size of a silicon wafer. But this conversation is not just about one company or one chip. It is a deep dive into the AI infrastructure stack: GPUs, ASICs, memory, HBM, SRAM, data centers, power, TSMC, AWS, OpenAI, agents, reasoning models, and why speed changes what AI products can become. Andrew explains why “tokens per second per user” matters, why generating a single word can require moving the equivalent of 100 HD movies through memory, why agents amplify latency, why GPUs struggle with certain inference workloads, and why fast AI may eventually reshape SaaS itself. This is a reference conversation on fast inference, AI chips, and the next compute bottleneck.
(00:00) Cold open & Intro
(01:31) Why speed became the AI bottleneck
(02:32) Tokens per second per user, explained
(03:16) AI’s broadband moment and the Netflix analogy
(04:35) The AI chip landscape: GPUs, TPUs, Trainium, ASICs
(06:36) What is an ASIC?
(08:08) Nvidia, Groq, and the fast inference war
(09:16) OpenAI, Broadcom, and specialized silicon
(12:10) China, power, and sovereign AI infrastructure
(15:05) Is the AI infrastructure boom a bubble?
(18:56) The hidden bottlenecks: HBM, CoWoS, and 3nm
(22:57) Why agents are creating CPU demand
(25:36) Andrew Feldman’s path from SeaMicro to Cerebras
(26:13) Why Cerebras bet on AI in 2016
(31:14) SRAM vs. HBM: why inference is a memory problem
(33:19) What wafer-scale computing actually means
(34:28) The deep-tech “Everest” problem
(36:07) The moment the first Cerebras system worked
(36:49) Ringing the bell and surviving deep tech
(39:08) How a giant chip handles failure
(41:22) Why GPUs struggle with decode
(42:17) Prefill vs. decode explained
(44:01) The “100 HD movies” problem in AI inference
(45:04) How fast inference changes RL and training
(48:08) Reasoning models and why they cost more compute
(50:08) Verification, guardrails, and small models checking big models
(52:37) Multimodal AI and the path to video
(53:51) Cerebras’ business model: hardware, cloud, and API
(55:14) OpenAI’s 750MW inference deal
(55:36) Why data centers are measured in megawatts
(58:01) AWS Trainium + Cerebras decode
(59:29) Fast tokens as a cloud product
(01:00:52) Is CUDA still a moat?
(01:03:53) How TSMC helped Cerebras build the giant chip
(01:07:41) Why nobody cared in 2020
(01:08:15) Why chip supply chains are hard to diversify
(01:09:54) Why today’s AI models will be the worst you ever use
(01:10:38) What fast AI could do to SaaS
Jul 16

OpenAI’s Compute Chief: We Can’t Build Fast Enough | Sachin Katti

Is the AI industry actually overbuilding, or is the physical world moving too slowly to keep up? In this episode of the MAD Podcast, OpenAI's Head of Industrial Compute, Sachin Katti, takes us inside the "belly of the beast" of what may be the largest infrastructure project in human history. We explore the staggering physical reality of the AI boom, from $50 billion supercomputers and liquid-cooled data centers that "turn electrons into tokens," to overhauling the U.S. power grid and exploring nuclear energy. Sachin also pulls back the curtain on OpenAI's Stargate strategy, their move into custom silicon with Project Jalapeño, and the mind-bending reality that AI is now beginning to design the very chips that will power its own future.
(00:00) Cold open: “One of the largest things humanity has ever built”
(00:30) Welcome: Sachin Katti, Head of Industrial Compute at OpenAI
(01:44) Is this the biggest infrastructure buildout in history?
(03:41) Why OpenAI is building a new industrial muscle
(04:54) What an AI data center actually is
(05:27) “Factories turning electrons into tokens”
(06:35) Why AI data centers need liquid cooling everywhere
(08:10) The power problem: grids, generation, transmission, substations
(10:43) Behind-the-meter power and gas turbines
(11:02) Why nuclear “can’t come soon enough”
(11:49) Jalapeño: why OpenAI is designing its own AI chips
(13:19) Tokens per watt: the new metric that matters
(13:38) Why inference may now dominate AI compute
(14:58) Is OpenAI overbuilding compute?
(16:47) Why OpenAI thinks the bigger risk is not building fast enough
(17:55) Communities, jobs, water, and the local data-center debate
(21:16) How OpenAI chooses data-center sites
(22:25) What “industrial compute” means inside OpenAI
(25:59) Sachin’s path: Stanford, startups, Intel, OpenAI
(28:05) OpenAI’s compute portfolio: Microsoft, hyperscalers, neoclouds
(29:37) Stargate explained
(31:21) Abilene, Oracle, and the next wave of AI data centers
(32:48) How massive AI compute gets financed
(34:05) How OpenAI designed Jalapeño so quickly
(35:59) AI is starting to help design AI chips
(36:20) MRC: the networking problem behind 100,000 GPUs
(38:47) Bottlenecks: transformers, turbines, electricians, supply chains
(40:29) Guaranteed capacity: intelligence as a supply unit
(42:08) Will AI data centers move to space?
Jul 9

Stripe's AI Chief: How AI Agents Will Buy, Sell, and Pay

Is the internet ready for AI agents to take over our wallets and run their own businesses? In this episode of The MAD Podcast, Stripe's Emily Sands reveals how agentic commerce is rapidly shifting from a hypothetical concept to deployed financial infrastructure. From combating the rising existential threat of token theft to solving the bottleneck of "vibe deployment", Emily unpacks the shared payment tokens and real-time billing systems required to securely scale autonomous digital buyers and highlights a near future where agents operate as independent, end-to-end micro-firms.
(00:00) Cold open & Intro
(01:24) The rise of agentic e-commerce
(02:11) The spectrum of agent-led purchases
(03:16) How merchants adapt to AI-driven commerce
(05:50) Defining the levels of autonomy in AI shopping
(07:08) What is the Agent E-Commerce Protocol (AEP)?
(08:49) Shared payment tokens and secure AI transactions
(09:58) Who is adopting the Agent E-Commerce Protocol?
(11:38) Can agents negotiate and sell products?
(13:32) The macroeconomic impact of AI agents
(14:46) The boom of solopreneurs and AI-driven business creation
(16:56) Why building trust is the biggest roadblock for AI commerce
(20:19) How link wallets improve payment security
(21:21) Improving the user experience in AI shopping apps
(23:16) How the Link Wallet sets guardrails for AI agents
(25:40) One-time use virtual cards vs flexible AI wallets
(28:03) Unpacking the shared payment token primitive
(29:59) How stablecoins enable profitable AI microtransactions
(35:03) Managing liability: Who is at fault if an agent goes haywire?
(36:38) Why agent payments might be safer than human transactions
(37:41) What is Vibe Deployment?
(40:13) Why Stripe built Stripe Projects for agent deployment
(41:22) Why Stripe cares about orchestrating app deployments
(42:50) How tokens break the traditional SaaS billing model
(44:34) Why AI companies are moving to hybrid and usage-based billing
(47:15) Streaming payments and real-time token tracking
(48:42) The massive data challenge for AI company accountants
(50:41) Token theft: The fastest-growing fraud in the AI economy
(52:04) The cottage industry of free trial and multi-account abuse
(54:16) How fraudsters monetize stolen AI tokens on the dark web
(01:00:06) How Stripe Radar uses network density to fight AI fraud
(01:01:15) Tempo's role in the Agent E-Commerce Protocol
(01:04:12) The AI startup ecosystem is accelerating business creation
(01:09:01) The token cost shock: Are buyers getting carried away?
(01:11:19) 2026 Predictions: Agents running businesses end-to-end
Jul 2

Inside Nemotron & NVIDIA’s AI Lab | Bryan Catanzaro

NVIDIA is a chip company. So why does it put hundreds of researchers on building AI models and then give them away for free? Bryan Catanzaro is VP of Applied Deep Learning Research at NVIDIA and one of the people whose work quietly underpins modern AI: he helped create cuDNN (NVIDIA's first deep learning product), co-invented DLSS, and named and built Megatron, the framework behind how much of the industry trains large models. Today he leads Nemotron, NVIDIA's family of open models, and Nemotron 3 Ultra, released just weeks ago, is one of the strongest open-weights models to come out of the US. Matt Turck sits down with Bryan for a genuinely deep conversation: the real business logic behind a chip company building its own models, the state of open vs. closed AI, and whether the US is falling behind China in open models. Then they go inside Nemotron itself: four-bit (NVFP4) pretraining, hybrid Mamba-Transformer architecture, mixture-of-experts, multi-token prediction, and multi-teacher distillation, all explained in plain language. Plus a rare look at how a modern AI research org actually runs, what it was like working alongside Andrew Ng and Dario Amodei at Baidu, why Bryan doesn't believe in the singularity, and his contrarian case that open AI is safer than closed. A reference conversation for anyone trying to understand where AI is really headed.
(00:00) Cold open & Intro
(01:33) Is open source AI catching the frontier?
(05:29) Do closed labs blocking distillation slow open source down?
(07:42) Is the US falling behind China?
(10:30) Why companies actually choose open models
(12:39) A "crazy" 2008 bet: machine learning on GPUs
(15:33) Working with Andrew Ng and Dario Amodei at Baidu
(17:41) Coming back to NVIDIA: DLSS and the birth of Megatron
(21:55) The real reason NVIDIA builds its own models
(24:28) Is Moore's Law really dead?
(33:37) The Nemotron family: Nano, Super, Ultra
(35:09) Built for agents: why NVIDIA bets on speed
(36:02) How you train a 550B model in 4 bits
(39:25) Hybrid Mamba-Transformer, explained simply
(42:31) Mixture of experts, and why NVIDIA built NVL72 around it
(47:26) Why a 1-million-token context window matters
(49:26) Multi-token prediction: how the model predicts 5 tokens at once
(52:47) Multi-teacher distillation: teaching one model from many
(58:01) Where reinforcement learning goes next
(01:00:16) Inside NVIDIA's research org: "the mission is the boss"
(01:04:03) How NVIDIA decides who gets the GPUs
(01:10:53) Why NVIDIA still feels entrepreneurial after 33 years
(01:12:58) Why Bryan doesn't believe in the singularity
(01:17:50) The AI backlash
(01:19:18) The controversial case: open AI is safer than closed
Jun 25

Cloudflare CEO: The Internet's Business Model Is Dead

Cloudflare CEO and co-founder Matthew Prince joins Matt Turck for a wide-ranging and fascinating conversation about what happens when the Internet is no longer mostly used by humans, but by bots, AI agents and machines. Matthew explains why Cloudflare now sees automated traffic overtaking human traffic online, why agents could create a massive explosion in Internet traffic, and why the old web business model built around clicks, ads, and pageviews may be breaking. We also go deep on what Cloudflare actually does, how it built one of the world’s most important Internet networks, why products like Workers, AI Gateway, edge inference, Durable Objects, sandboxes, and agent security matter, and how Cloudflare is reorganizing itself for the AI era. Along the way, Matthew shares wild Cloudflare origin stories involving hacker kids, human rights groups, cricket in Pakistan, root servers, Eurovision, JPMorgan, and the strange paths that led Cloudflare from scrappy startup to critical Internet infrastructure. 
(00:00) Cold open
(00:34) Intro
(01:27) The moment bots passed humans online
(04:05) "Agent," "bot," "crawler": what they really mean
(05:28) Why your AI agent visits 5,000 sites to do one thing
(06:27) The internet's business model is breaking
(06:52) What happens to "brands" when machines do the buying
(08:11) What Cloudflare actually does, explained simply
(10:29) Hackers, human rights groups & an accidental product-market fit
(13:37) Building a global network (and the Telecom Pakistan cricket story)
(21:10) One hacker, from Turkish escort sites to Eurovision to JP Morgan
(30:54) Fundraising, VCs & an unlikely founding team
(37:06) How Cloudflare became an AI infrastructure company
(40:24) Cloudflare Workers and why the edge wins for inference
(44:30) AI Gateway: auditing, guardrails & runaway costs
(47:05) Why agents need a new kind of compute
(52:13) A "Log4j every week": security in the agentic era
(56:03) Inside Cloudflare: 241 billion tokens and "Cloudflare OS"
(01:05:02) Builders, sellers, and "measurers"
(01:06:30) The decision Matthew thinks every company will face
(01:11:09) What to do if AI is coming for your job
(01:13:56) Content Independence Day & the new economics of the web
(01:18:27) Pay-per-crawl, micropayments & out-scaling Visa
(01:20:20) A better internet: Spotify, local news & "holes in the cheese"
Jun 18

The GPU Myth: State of AI Compute 2026 | Stephen Balaban

Many people said GPU compute would become a commodity. The opposite happened, and a new category of "neoclouds" is now racing to build the physical backbone of the AI boom. Stephen Balaban, co-founder and CTO of Lambda, explains why the conventional wisdom was exactly wrong, why we're still massively underbuilding compute, and what it actually takes to stand up a gigawatt-scale AI factory: land, power, cooling, networking, and a financing stack most people have never heard of. We go deep on the physics of how energy becomes tokens, NVIDIA's real moat, why a 2023 GPU can lease for more today than the day it shipped, and Stephen's provocative vision of "neural software." Plus the wild Lambda origin story, from a facial recognition startup to a camera in a baseball cap to a near-billion-dollar cloud business. This is the state of AI compute in 2026, from inside one of the companies building it.
(00:00) Cold open
(01:21) Why GPU compute was never a commodity
(02:45) The H100 price index and what it gets wrong
(04:02) The real moat: technology or financing?
(05:57) Winner-take-all, or room for many neoclouds?
(06:48) Are we overbuilding or underbuilding AI compute?
(09:26) What if AI gets 10x more compute-efficient?
(10:44) The real bottleneck: land, power, and shell
(11:38) The backlash against data centers, and the misinformation
(15:00) Opening the hood: from photons to tokens
(17:11) Extracting more value from the same chip
(19:26) Frontier inference and distributed training, explained
(23:26) What actually drives compute cost
(25:21) Lambda's chip stack and the NVIDIA relationship
(26:17) A multi-silicon world? CUDA, cuDNN, and NVIDIA's real moat
(28:59) Networking, storage, and the one-click cluster
(34:46) Renting vs. owning, and full vertical integration
(36:24) How global is Lambda? Does location still matter?
(38:44) The financing stack: off-take agreements, SPVs, and credit
(41:16) Why a 2023 GPU leases for more today
(42:36) A futures market for compute?
(43:54) Origin story: facial recognition, Perceptio, and Apple
(47:03) The Lambda hat and Dream Scope
(48:59) The $60K bet that became a cloud business
(52:00) Holding the team together through the hard times
(54:30) Bringing on a new CEO; Stephen as CTO
(57:33) Matching xAI on high-velocity deployment
(59:29) "AI won't write software; it will become the software"
(01:01:30) Neural software vs. vibe coding
(01:04:25) Do agents change the compute layer?
(01:06:14) Self-assembling software inside Lambda
(01:08:18) Gigawatt-scale AI factories
(01:08:57) One person, one GPU
(01:12:04) Hot takes: overrated and underrated in AI
Jun 4

OpenAI's Dan Roberts: Why AI Can Now Make Discoveries

Are we witnessing the first real signs of AI becoming a scientist? In this episode of The MAD Podcast, Matt Turck sits down with Dan Roberts, lead of the Foundations of Reinforcement Learning team at OpenAI, to explore one of the biggest shifts happening in AI: the rise of reasoning models, test-time compute, and reinforcement learning as engines of scientific discovery. Dan brings a rare perspective - from theoretical physics, black holes, quantum information, and deep learning theory - to explain how models are learning to “think,” why language may be such a powerful foundation for intelligence, what recent AI math breakthroughs really mean, and whether we are beginning to see AI systems that can contribute to science itself.
(00:00) Intro: AI's wild week in mathematics
(01:21) What OpenAI's Foundations of RL team does
(03:08) Dan's journey: from black holes and quantum gravity to frontier AI
(07:04) Are AI systems becoming useful for real science?
(08:21) The AI math moment: Erdős, OpenAI, DeepMind, and Anthropic
(08:52) Why the OpenAI result was an act of exploration
(10:25) OpenAI vs. DeepMind: informal reasoning vs. formal proof
(12:13) RL 101: learning by doing, not just watching
(15:10) Why reinforcement learning works
(15:58) How RL breaks: sparse feedback and long-horizon tasks
(17:03) RLHF: how human feedback shaped early language models
(18:48) Move 37, self-play, and the search for novel strategies
(22:16) Explore vs. exploit in scientific discovery
(24:49) Why RL may now be "the cake," not the cherry on top
(25:46) Why RL started working with large language models
(27:29) Is RL "sucking supervision through a straw"?
(28:47) Why language may be the grounding layer for intelligence
(31:46) A contrarian take on the Bitter Lesson
(32:41) What test-time compute actually is
(34:50) How RL gives models the ability to think
(35:40) Verifiable rewards, math, coding, and the messy real world
(38:00) What physics can teach us about AI
(42:08) Is there a thermodynamics of AI?
(43:08) From Erdős problems to Einstein-level AI
(45:16) Is AI already doing original science?
(45:51) How far are we from AI automating AI research?
(47:41) Why Dan is excited about the future of science
May 28

State of Enterprise AI 2026: Aaron Levie on Tokenmaxxing, Rise of Headless, and AI-Proofing Your Job

Aaron Levie, co-founder and CEO of Box, returns to the MAD Podcast with the clearest read in tech on what AI is actually doing inside the world's largest enterprises right now - not the hype version, the real one. After hundreds of Fortune 500 CIO conversations this year, Aaron explains why we're still in "day one" of the agent era, why one badly written agent run can now cost $1,000 in compute, and why progress at the AI labs is paradoxically slowing enterprise deployment. We get into the token cost shock now reshaping IT budgets, why coding agents have reached escape velocity while the rest of knowledge work hasn't, the rise of headless software and what replaces per-seat pricing, the emergence of the forward-deployed engineer as the hottest job in tech, why Aaron thinks the AI doomers are wrong about jobs, and where startups can still win as the labs move up the stack.
(00:00) Intro
(01:18) Silicon Valley engineering vs. everyone else
(05:35) Are enterprise CIOs actually bullish on AI?
(08:51) Tokenmaxxing & why your AI bill is about to explode
(11:34) The myth of falling token costs and AI spend escaping IT budgets
(17:37) The $5B startup hiding in AI compute
(18:14) The mosaic of models inside every enterprise
(21:28) Why coding works and the rest of knowledge work doesn't
(25:53) The Bob and Sally problem: access control breaks agents
(30:31) Will enterprise AI really take 10 years to roll out?
(32:24) The capability overhang: why faster models slow diffusion
(34:23) Data is the bottleneck (it always was)
(39:02) The rise of internal forward-deployed engineers
(41:23) Why the AI doomers are wrong about jobs
(43:43) Headless software is inevitable
(46:14) What replaces per-seat pricing
(47:37) How Box itself is going headless
(49:42) How the org chart actually evolves
(01:00:33) Future-proofing yourself as an enterprise employee
(01:06:40) Are we all just going to work for OpenAI and Anthropic?
(01:07:11) Where startups can still win as the labs move up


Exactly the AI podcast you're looking for

Jan 15

John_Karly

High-quality interviews with genuine researchers, succinctly discussed at an approachable level

Keep sharing the latest and bringing amazing thinking!

06/08/2025

NishaPaliwal

Great show and a weekly must-follow

Global Investment Views

06/01/2025

Cnaloha

Recommend if interested in broader perspectives for investing in business + tech + government interdependencies globally.

Best way to keep up with AI

03/30/2025

Cmcrimer

I’m often asked how best to get up to speed on how AI is impacting business and I always recommend the MAD podcast. To get started, check out the landscape overview and investing overview in November 2024. With some foundational language, pick topics and people that interest you. Matt does an amazing job demystifying even the most technical topics. The MAD podcast is enlightening, entertaining and educational. 🫶

Creator: Matt Turck
Years Active: 2023 - 2026
Episodes: 125
Rating: Clean
Show Website: The MAD Podcast with Matt Turck

