Latent Space: The AI Engineer Podcast

Latent.Space

The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

  1. 29M AGO

    🔬Doing Vibe Physics — Alex Lupsasca, OpenAI

    Some people are going crazy over GPT-5.5. Some people. This is the story of the Jagged Frontier: people who use AI to write emails or even implementation code find the lift moderate, whereas people pushing the limits of the model are discovering that the limits just moved outward. Alex Lupsasca has been tracking this limit for a year and a half now. “When GPT-5 came out, it was able to reproduce one of my best papers (that took a very long time to come up with) in 30 minutes.” But Alex also notes that this shift was mostly invisible: “I remember when GPT-5 came out… on Twitter, the reception was lukewarm. A lot of people were like, well, we expected a lot more, and it’s not better at writing email. And I remember thinking, well, okay, GPT-3 could write email. How much better can it get at writing email? That’s not the point. But at the science frontier, the capabilities were really taking off.” We walk through his paper and more with him in today’s Science pod! Watch here.

    The “Oscar for physics”

    Alex made an early splash in his career with breakthroughs in our understanding of black holes. He’s also known for the Black Hole Explorer and an iPhone app that makes visualizing black holes fun and interactive for general audiences. Alex won the 2024 New Horizons in Fundamental Physics Breakthrough Prize. Known as the “Oscar for physics,” this is arguably the most prestigious prize an early-career theoretical physicist can win. Alex first saw promise for AI in theoretical physics after he asked o3 for help on his research. In the podcast, Alex recalls asking GPT for help with a calculation that would have taken days, and getting a result in eleven minutes. He immediately recognized how impactful AI would be for his work, even though his physicist colleagues and the larger community gave it a lukewarm or skeptical reception.

    The Move 37 Moment for AI x Physics

    GPT-5 had just been released, and Alex tried asking it to solve a problem from a just-published paper.
GPT-5 came up empty. But Mark Chen, Chief Research Officer of OpenAI, pushed a bit harder and had Alex prime the model with a textbook warmup problem, which it easily solved. After this “priming” trick, GPT-5 was able to reproduce his full result in eleven minutes (yes, the paper was released after the model’s training cutoff). “This changes everything.” Alex notes that we seem to be on the edge of a massive change in theoretical physics reasoning. A year prior, LLMs were just starting to do math correctly. Now ChatGPT could reproduce his hardest paper in the time it takes to get a coffee. Alex was on sabbatical from Vanderbilt, and he joined OpenAI to start pushing the boundary of AI’s ability to accelerate physics.

“AI solved the problem before the plane landed”

Alex began to put GPT through its paces, reaching out to colleagues for problems they were stuck on. His old PhD advisor (Prof. Andrew Strominger at Harvard) had an insight about certain physical quantities known as “single-minus gluon tree amplitudes”: in certain cases, these amplitudes may be nonzero where they had previously been shown to always vanish. The team pushed this intuition forward and came up with a formula for these quantities that appeared nonzero but was otherwise completely intractable. Despite over a year spent on the problem, no real progress was made. Prof. Strominger planned to visit OpenAI to work on the problem the week after the initial conversation started. In that one week, ChatGPT fully solved the problem, as Alex recalled, before Prof. Strominger’s plane even landed. What was interesting is not only that ChatGPT solved this problem, but how it solved it. The model quickly found a limiting case (known as the “half-collinear regime”) that in hindsight has a nice intuitive explanation. Taking this limit, the gnarly result collapsed down to a simple and intuitive formula! The last step was to prove this intuitive formula.
The team started a fresh session, gave it a prompt with the context of what they had previously learned, and let the model loose. Not only was ChatGPT able to reproduce the previous result, it was able to prove it using a technique unknown to the authors!

The Vibe Physics moment

With a concrete success in the bag, the team asked whether they could generate new physics from scratch using ChatGPT. They took on what they felt to be a harder problem, looking at the graviton, a proposed particle that should appear when one combines gravity and quantum mechanics. They wrote up a simple prompt asking ChatGPT to perform the same research as the gluon paper, but for gravitons, and hit go! What came next was truly “vibe physics,” with ChatGPT pushing out 110 pages of novel physics, new calculations, and novel techniques. This was over the course of a day, with most interactions following the now-familiar pattern for anyone who uses a coding agent:

GPT: Here’s your […]. Would you like me to do […]?
Alex: Yes, please do!
GPT: […]

And for those who look deeply, this really was not just a direct 1-to-1 mapping between gluons and gravitons. ChatGPT imported new techniques that were necessary due to the nature of gravitons, and used them flawlessly. They spent the next three weeks verifying all the results. And voilà! A new paper featuring novel results in quantum gravity, generated in less than three days total. Truly a “feel the AGI” moment. For those interested, there’s a blog post with the full transcript from initial prompt to final paper. Even if you know no physics, it’s crazy seeing pages of correct calculations fall out of simple prompts such as “Yes calculate outside of SD first. This is the first step.”

Out-of-domain = new knowledge

The thing that is qualitatively different between vibe physics and vibe coding is that vibe physics means actually extending the frontier of human knowledge.
Looking at the gluon and graviton results, they seem in retrospect, like many results in physics and math, like natural extensions of what we already know. That is in fact part of what makes them beautiful. But this was a problem that stumped experts in the domain for a year, and although it still has a bit of a recombinant flavor, it had never been done before. It may be that there are still large classes of problems that AI won’t do well on, and approaches that an AI might not think to take. This is the “taste” that everyone has been talking about. These capabilities, however, allow Alex to explore many possible avenues and map out much more ambitious problems to tackle. With AI able to output results basically as fast as we can conceive and validate them, the scope of what one theorist can hope to achieve has just gotten a lot, lot bigger. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
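The “priming” trick Alex describes (warming the model up on a textbook problem before posing the frontier one) amounts to seeding the chat history with a solved warmup exchange before the real question. A minimal sketch of the idea in OpenAI-style message-list form; the function name and message contents below are illustrative, not from the episode:

```python
def build_primed_conversation(warmup_problem, warmup_solution, hard_problem):
    """Seed the chat history with a solved warmup before the real question.

    The warmup exchange 'primes' the model with the relevant techniques
    before the research-level problem is posed.
    """
    return [
        {"role": "system", "content": "You are a careful theoretical physicist."},
        # Warmup: a textbook problem the model can solve easily.
        {"role": "user", "content": warmup_problem},
        {"role": "assistant", "content": warmup_solution},
        # The actual research-level question, asked in the primed context.
        {"role": "user", "content": hard_problem},
    ]

messages = build_primed_conversation(
    "Derive the photon sphere radius for a Schwarzschild black hole.",
    "r = 3GM/c^2, i.e. 1.5 Schwarzschild radii.",
    "Now compute the Lyapunov exponent of nearly bound photon orbits in Kerr.",
)
```

The resulting `messages` list would then be sent as the conversation history of a single chat request, so the model sees its own successful warmup answer before attempting the hard problem.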

    1h 32m
  2. APR 27

    Physical AI that Moves the World — Qasar Younis & Peter Ludwig, Applied Intuition

    Having built Applied Intuition from YC-era autonomy tooling into a $15B physical AI company, Qasar Younis and Peter Ludwig have spent the last decade living through the full arc of autonomy: from simulation and data infrastructure for robotaxi companies, to operating systems for safety-critical machines, to deploying AI onto cars, trucks, mining equipment, construction vehicles, agriculture, defense systems, and driverless L4 trucks running in Japan today. They join us to explain why “physical AI” is not just LLMs on wheels, why the real bottleneck is no longer model intelligence but deployment onto constrained hardware, and why the future of autonomy may look less like one-off demos and more like Android for every moving machine. We discuss: * Applied Intuition’s mission: building physical AI for a safer, more prosperous world, powering cars, trucks, construction and mining equipment, agriculture, defense, and other moving machines * Why physical AI is different from screen-based AI: learned systems can make mistakes in chat or coding, but safety-critical machines like driverless trucks, autonomous vehicles, and robots need much higher reliability * The evolution from autonomy tooling to a broad physical AI platform: starting with simulation and data infrastructure for robotaxi companies, then expanding into 30+ products across simulation, operating systems, autonomy, and AI models * Why tooling companies came back into fashion: Qasar on why developer tooling looked unfashionable in 2016, why Applied Intuition still bet on it, and how the AI boom made workflows and tools central again * The three core buckets of Applied Intuition’s technology: simulation and RL infrastructure, true operating systems for vehicles and machines, and fundamental AI models for autonomy and world understanding * Why vehicles need a real AI operating system: real-time control, sensor streaming, latency, memory management, fail-safes, reliable updates, and why “bricking a car” is 
much worse than bricking an iPad * Physical machines as “phones before Android and iOS”: Peter explains why today’s vehicle and machine software stack is fragmented across many operating systems, and why Applied Intuition wants to consolidate the platform layer * Coding agents inside Applied Intuition: Cursor, Claude Code, internal adoption leaderboards, and how AI tools are changing engineering workflows even in embedded systems and safety-critical software * Verification and validation for physical AI: why evals get harder as models improve, how end-to-end autonomy changes simulation requirements, and why neural simulation has to be fast and cheap enough to make RL practical * From deterministic tests to statistical safety: why autonomy validation is shifting from binary pass/fail requirements toward “how many nines” of reliability and mean time between failures * Cruise, Waymo, and public trust: Qasar and Peter discuss why autonomy failures are not just technical issues, how companies interact with regulators, and why Waymo is setting a high bar for the industry * Simulation vs. reality: why no simulator perfectly represents the real world, how sim-to-real validation works, and why real-world testing will never disappear * World models for physical AI: hydroplaning, construction equipment, visual cues, cause-and-effect learning, and where world models help versus where they are not enough * Onboard vs. offboard AI: why data-center models can be huge and slow, but onboard vehicle models need millisecond-level latency, low power, small size, and distillation-like efficiency * Why physical AI is not constrained by model intelligence alone: the hard part is deploying models onto real hardware, under safety, latency, power, cost, and reliability constraints * Legacy autonomy vs. 
intelligent autonomy: RTK GPS in mining and agriculture, why hand-coded path-following worked for decades, and why modern systems need perception and dynamic intelligence * Planning for physical systems: how “plan mode” applies to robotaxis, mining, defense, and multi-step physical tasks where actions change the state of the world * Why robotics demos are not production: the brittle last 1%, humanoid reliability, DARPA Grand Challenge-style prize policy, and the advanced engineering gap between research and deployment * Applied Intuition’s hard-earned lessons: after nearly a decade, Peter says they can look at a robotics demo and predict the next 20 problems the company will hit * Qasar’s advice to founders: constrain the commercial problem, avoid copying mature-company strategies too early, and remember that compounding technology only matters if you survive long enough to see it compound * Why 2014 YC advice may not apply in 2026: capital markets, AI company dynamics, and the difference between building in stealth with a deep network versus building as a new founder today * What Applied is hiring for: operating systems, autonomy, dev tooling, model performance, evals, safety-critical systems, hardware/software boundaries, and engineers with deep curiosity about how things work Applied Intuition: * YouTube: https://www.youtube.com/@AppliedIntuitionInc * X: https://x.com/AppliedInt * LinkedIn: https://www.linkedin.com/company/applied-intuition-inc Qasar Younis: * X: https://x.com/qasar * LinkedIn: https://www.linkedin.com/in/qasar/ Peter Ludwig: * LinkedIn: https://www.linkedin.com/in/peterwludwig/ Timestamps 00:00:00 Introduction: Applied Intuition, Physical AI, and 10 Years of Building 00:01:37 Physical AI vs. 
Screen AI: Why Safety-Critical Changes Everything 00:02:51 The Origin Story: Tooling, YC, and the Scale AI Comparison 00:05:41 The Three Buckets: Simulation, Operating Systems, and Autonomy Models 00:11:10 Hardware, Sensors, and the LiDAR Question 00:14:26 The Operating System Layer: Why Vehicles Are Like Pre-Android Phones 00:19:13 Customers, Licensing, and the Better-Together Stack 00:21:19 AI Coding Adoption: Cursor, Claude Code, and the Bimodal Engineer 00:26:41 Verifiable Rewards, Evals, and Neural Simulation 00:31:04 Statistical Validation, Regulators, and the Cruise Lesson 00:40:25 World Models, Hydroplaning, and Cause-Effect Learning 00:43:34 Onboard vs. Offboard: Latency, Embedded ML, and Distillation 00:50:57 Plan Mode for Physical Systems and Next-Token Prediction Universally 00:53:04 Productionization: The 20 Problems Every Robotics Demo Will Hit 00:58:00 Founder Advice: Constraints, Compounding Tech, and Mature-Company Mimicry 01:05:41 Hiring Philosophy: Hardware/Software Boundary and Engineering Mindset 01:08:50 General Motors Institute, Education, and the Curiosity Mindset Transcript Introduction: Applied Intuition, Physical AI, and 10 Years of Building Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I’m joined by Swyx, editor of Latent Space. Swyx [00:00:10]: And today we’re very honored to have the founders of Applied Intuition, Qasar and Peter. Welcome. Qasar [00:00:17]: You guys really know how to turn it on to podcast mode. That was, you guys are real pros at this. Qasar [00:00:23]: They were just joking around right before this, and then they flipped it pretty quick. Alessio [00:00:29]: Oh, yeah, it’s good to have you guys. Maybe you just wanna introduce yourself so people know the voice on the mic and they’ll know what they’re hearing. Peter [00:00:33]: Oh, sure. Yeah, I’m Peter Ludwig. I’m the co-founder and CTO of Applied Intuition. 
Qasar [00:00:38]: And my name is Qasar Younis. I am the CEO and co-founder with Peter. Alessio [00:00:42]: Nice. Can you guys give the high-level overview of what Applied Intuition is? And I was reading through some of the Congress files, when you went out there, Peter, and eighteen of the top twenty global non-Chinese automakers, you two guys, you have customers in agriculture, defense, construction. I think most people have heard of Applied Intuition tied to YC when it was first started, and then you were kinda in stealth for a long time, so maybe just give people the high-level overview of what it is today, and then we’ll dive into the different pieces. Peter [00:01:10]: Yeah. So at Applied Intuition, our mission is to build physical AI for a safer, more prosperous world. And so we work on physical AI for all different types of moving systems, everything from cars to trucks to construction and mining equipment, to defense technologies. And we’re a true technology company, so we build and sell the technology, and we sell it to the companies that make the machines. We sell it to the government, really anyone that wants to buy a technology to make machines smart. Physical AI vs. Screen AI: Why Safety-Critical Changes Everything Qasar [00:01:38]: Yeah. And I think in the broader AI landscape, a lot of the focus, rightfully so in the last, three years has been on large language models, and so everything fits in a screen. Like, whether it’s code complete products or things like that. And what’s different about us is we’re deploying intelligence onto a lot of things that don’t have screens. they’re physical machines. There are sometimes screens within the cabin or for example of a car or a truck or something like that, but most of the value we provide is putting intelligence that is in safety critical environments. 
So those two words are really important, because learned systems can make mistakes if you’re asking for, like, some, so something like, “Tell me about these podcast hosts Qasar [00:02:28]: that I’m about to go meet.” But you can’t do that obviously when you run, like, as an example, we run driverless trucks in Japan right now, as we speak. We can’t have errors. Those are L4 trucks. Yeah. Alessio [00:02:40]: Yeah. Was that always the mission? I remember initially, I think people put you and Scale AI very similarly for some things about being kinda like on the data infrastructur
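The shift “from deterministic tests to statistical safety” in the show notes above turns on two quantities, “how many nines” of reliability and mean time between failures, which are related by simple arithmetic. A minimal sketch with illustrative numbers (not Applied Intuition’s figures):

```python
import math

def nines_of_reliability(failure_prob):
    """Map a failure probability to 'how many nines': 1e-4 -> four nines (99.99%)."""
    return round(-math.log10(failure_prob))

def mean_time_between_failures(failure_prob_per_hour):
    """MTBF in hours, assuming a constant per-hour failure probability."""
    return 1.0 / failure_prob_per_hour

# An illustrative system that fails once per 10,000 operating hours:
p_fail = 1e-4
reliability_nines = nines_of_reliability(p_fail)          # four nines
mtbf_hours = mean_time_between_failures(p_fail)           # 10,000 hours
```

Each additional nine cuts the per-hour failure probability by 10x and multiplies MTBF by 10x, which is why validation shifts from binary pass/fail toward accumulating enough real and simulated miles to bound that probability statistically.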

    1h 12m
  3. APR 23

    AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

    Today, we check in a year after the first Unsupervised Learning x Latent Space Crossover special to discuss everything that has changed (there is a lot) in the world of AI. This episode was recorded just after AIE Europe, but before the Cursor-xAI deal. Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future and what it means for businesses and the world - helping builders, researchers and founders deconstruct and understand the biggest breakthroughs. Thanks to Jacob and the UL production team for hosting and editing this! Jacob Effron * LinkedIn: https://www.linkedin.com/in/jacobeffron/ * X: https://x.com/jacobeffron Full Episode on Their YouTube We discuss: * swyx’s view from the center of the AI engineering zeitgeist: OpenClaw, harness engineering, context engineering, evals, observability, GPUs, multimodality, and why conference tracks now reveal what matters most in AI * Whether AI infrastructure has finally stabilized: why “skills” may be the minimal viable packaging format for agents, why infra companies have had to reinvent themselves every year, and why application companies have had an easier time surviving model volatility * The vertical vs. 
horizontal AI startup debate: why application companies can act as the outsourced AI team for enterprises, why some horizontal companies still matter, and why sandboxes may be the clearest reinvention of classic cloud infrastructure for the AI era * The “agent lab” playbook: starting with frontier models, specializing for your domain, then training your own models once you have enough data, workload, and user behavior to justify the cost and latency savings * Why domain-specific model training is real, not just marketing: how companies like Cursor and Cognition can get users to choose their in-house models, and why search, domain specialization, and distillation are becoming more important * Open models, custom chips, and alternative inference infrastructure: why swyx has turned more bullish on open source, why non-NVIDIA hardware is suddenly getting real attention, and why every 10x speedup can unlock new product experiences * What it means to sell to agents instead of humans: why agent experience may mostly just be good developer experience by another name, why APIs and docs matter more than ever, and how pretraining-data incumbents are compounding advantages in an agent-first world * Why memory and personalization may become the next big wedge: today’s models mostly reward frequency of mentions, but in the future, swyx expects product choice to be shaped much more by personalized memory systems * The state of the AI coding wars: why coding has become one of the largest and fastest-growing categories in AI, how Anthropic, OpenAI, Cursor, and Cognition have all ridden the wave, and why the category may still have more room to run * Capability exploration vs. efficiency: why the industry is still in a token-maxing, experiment-heavy phase where people are rewarded for spending more rather than less * Claude Code vs. 
Codex and the strange stickiness of coding products: why first magical product experiences may matter more than expected, and why the bigger mystery may be why only a few names have emerged as real winners so far * What the end state of the coding market might look like: two major players, a longer tail of niche products, and possible disruption if Microsoft, Mistral, xAI, or the Chinese labs push harder into coding * Where application companies still have room against the labs: why frontier labs are trying to expand into verticals like finance and healthcare, but still leave space for focused companies that own the workflow and the last mile * Why coding may be a preview of every other AI market: the first category to truly go parabolic, the clearest example of foundation model companies colliding with application companies, and a template for how future vertical AI markets may develop * Why AI valuations now feel unbounded: from billion-dollar ARR products built in a year to trillion-dollar market caps, swyx and Jacob unpack how the AI market has broken traditional startup intuitions about scale and durability * Consumer AI vs. coding AI: why ChatGPT’s consumer category may have plateaued on frequency and product design, while coding continues to feel like a daily-use category with real momentum * The next product frontier beyond coding: consumer agents, computer use, and “coding agents breaking containment,” with swyx’s thesis that 2025 was the year of coding agents and 2026 may be the year they begin to do everything else * Whether foundation models are really killing startup categories: why swyx is less worried for early founders, more worried for mid-size startups and traditional SaaS, and why building something ambitious may now be the best job interview for a frontier lab * AI vs. 
SaaS and the internal culture war around adoption: the tension between AI-native employees who want to rip out expensive software and skeptics who think quick AI-built replacements create fragile systems * Why traditional SaaS may be under real pressure: swyx’s own experience spending six figures on event and sponsor management software, the temptation to rebuild it cheaply with AI, and the broader question of whether teams will trust custom AI-native replacements * Biosafety, security, and frontier model access: why swyx raised biosafety at a dinner with Anthropic’s Mike Krieger, why Krieger argued security is the bigger issue, and what restricted model releases reveal about Anthropic vs. OpenAI * The era of giant models: why 10T+ parameter systems may only be a temporary rationing phase before bigger clusters arrive, why labs may increasingly keep their most powerful models private for distillation, and why scale alone no longer feels like a complete answer * Memory as the slowest scaling factor in AI: why context windows have improved far more slowly than people hoped, why million-token context still has not changed most real workflows, and why memory may be the key bottleneck for the next generation of systems * What swyx changed his mind on in the past year: becoming more bullish on open models, more convinced that the top tier of agent startups behaves very differently from the median AI company, and more optimistic about fine-tuning and specialized model adaptation * “Dark factories” and zero-human-review coding: the next frontier after zero human-written code, where models not only write the code but ship it without human review, forcing companies to rethink testing and verification from first principles * Why RL and post-training may matter more than people assumed: even if the resulting models get thrown out every few months, the data, workflows, and domain-specific improvements persist * Synthetic rubrics, Dr. GRPO, and multi-turn RL: why 
reinforcement learning is becoming much more domain-specific and multi-step than many people realize, opening the door to much deeper customization * The next frontier after coding: memory, personalization, and world models, including why swyx thinks world models matter not just for robotics or gaming, but for giving AI something closer to lived understanding * Fei-Fei Li, spatial intelligence, and the Good Will Hunting analogy: the idea that today’s LLMs may know everything by reading it all, but still lack the lived experience that turns knowledge into a deeper kind of intelligence Timestamps * 00:00:00 Intro preview: AI coding wars, startup pressure, and market structure * 00:00:28 Welcome to the Latent Space × Unsupervised Learning crossover * 00:01:17 What AI builders are focused on now: OpenClaw, harnesses, and infra * 00:04:33 Why AI infra is harder than apps, and where startups can still win * 00:06:39 Should companies train their own models? * 00:09:28 Open models, custom chips, and the new inference race * 00:11:25 Designing products for agents, not just humans * 00:16:49 The state of the AI coding wars in 2026 * 00:19:27 Capability exploration, token-maxing, and why coding is going parabolic * 00:21:41 What the end state of the coding market could look like * 00:23:50 Where app companies still have room against the labs * 00:27:02 Why AI valuations and market swings feel unprecedented * 00:28:56 Consumer AI vs. coding AI, and why sticky products still matter * 00:32:28 What the next breakthrough product experience might be * 00:32:53 2026 thesis: coding agents break containment and eat the world * 00:35:27 Are foundation models wiping out startup categories? * 00:37:33 AI vs. 
SaaS, vibe coding, and internal team tensions * 00:40:01 Biosafety, security, and the politics of restricted model releases * 00:42:19 Giant models, compute constraints, and the limits of scale * 00:44:30 Memory as the real bottleneck in AI * 00:44:57 Why swyx changed his mind on open models * 00:47:44 Dark factories and the future of zero-human-review coding * 00:49:36 Why post-training and RL may matter more than people think * 00:51:50 Memory, world models, and the next frontier of intelligence * 00:53:54 The Good Will Hunting analogy for LLMs * 00:54:21 Outro Transcript [00:00:00] swyx: Isn’t that crazy? That number is just mind boggling. [00:00:03] Jacob Effron: What is the state of the AI coding wars today? [00:00:05] swyx: We’re in a phase of sort of like capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else. [00:00:16] Jacob Effron: Do you worry about the foundation models just getting into a bunch of these startup categories? [00:00:21] swyx: Mid-size startups. Yes. [00:00:23] Jacob Effron: What do you think the end state of this market is [00:00:25] swyx: for the market structure to, to significantly change? There would be [00:00:28] Jacob Effron: today on unsupervised lea

    55 min
  4. APR 22

    Shopify’s AI Phase Transition: 2026 Usage Explosion, Unlimited Opus-4.6 Token Budget, Tangle, Tangent, SimGym — with Mikhail Parakhin, Shopify CTO

    Early bird discounts for the San Francisco World’s Fair, the biggest AIE gathering of the year, end today - prices will go up by ~$500 tonight, so please lock in ASAP! From near-universal AI tool adoption inside Shopify to internal systems for ML experimentation, auto-research, customer simulation, and ultra-low-latency search, Mikhail Parakhin joins us for a deep dive into what it actually looks like when a 20-year-old, $200B software company goes all-in on AI. We cover why Shopify has become much more vocal about its internal stack, what changed after the December model-quality inflection, and why the real bottleneck in AI coding is no longer generation, but review, CI/CD, and deployment stability. We also go inside Tangle, Tangent, and SimGym: three major AI initiatives at Shopify that make experimentation reproducible, optimization automatic, customer behavior simulatable, and search and catalog intelligence faster and cheaper at scale. Along the way, Mikhail explains UCP, Liquid AI, and why token budgets are directionally right but often measured badly, why AI-written code can still increase bugs in production, what makes Shopify’s customer simulation defensible, and what he learned from the Sydney era at Bing. 
We discuss: * Mikhail’s path from running a major Microsoft business unit spanning Windows, Edge, Bing, and ads to becoming CTO of Shopify * Why Shopify is talking more publicly about AI now, and why staying at the frontier has become necessary for the company * Shopify’s internal AI adoption curve, the December inflection, and why CLI-style tools are rising faster than traditional IDE-based tools * Why Jensen Huang is directionally right on token budgets, but raw token count is still the wrong way to evaluate engineering output * Why the real unlock is not more agents in parallel, but better critique loops, stronger models, and spending more on review than generation * Why AI coding can still lead to more bugs in production even if models write cleaner code on average than humans * Why Shopify built its own PR review flow, and why Mikhail thinks most off-the-shelf review tools miss the point * How PR volume, test failures, and deployment rollback are becoming the real bottlenecks in the agent era * Why Git, pull requests, and CI/CD may need a new metaphor once code is written at machine speed * What Tangle is, and how Shopify uses it to make ML and data workflows reproducible, collaborative, and production-ready from the start * Why Tangle is different from Airflow, and why content-addressed caching creates network effects across teams * What Tangent is, and how Shopify is using auto-research loops to optimize search, themes, prompt compression, storage, and more * Why Tangent is becoming a democratizing tool for PMs and domain experts, not just ML engineers * Why AutoML finally feels real in the LLM era, and where auto-research still falls short today * Why Tangle, Tangent, and SimGym become much more powerful when combined into one system * What SimGym is, why simulated customers only work if you have real historical behavior, and why Shopify’s data gives it a moat * How SimGym evolved from comparing A/B variants to telling merchants what to change on a single 
live storefront to raise conversions * Why customer simulation is so expensive, from multimodal models to browser farms to serving and distillation costs * How Shopify models merchant and buyer trajectories, runs counterfactuals, and thinks about interventions like discounts, campaigns, and notifications * Why category-level behavior is so different across commerce, and why ideas like Chinese Restaurant Processes are showing up again in practice * Shopify’s new UCP and catalog work, including runtime product search, bulk lookups, and identity linking * Why Shopify is using Liquid AI, and why Mikhail sees it as the first genuinely competitive non-transformer architecture he has used in practice * Where Liquid already works inside Shopify today, from low-latency query understanding to large-scale catalog and Sidekick Pulse workloads * Whether Liquid could become frontier-scale with enough compute, and why Shopify remains pragmatic and merit-based about model choice * Who Shopify is hiring right now across ML, data science, and distributed databases * The Sydney story at Bing, why its personality was not an accident, and what Mikhail learned from deliberately shaping AI character early on Mikhail Parakhin * LinkedIn: https://www.linkedin.com/in/mikhail-parakhin/ * X: https://x.com/MParakhin Timestamps 00:00:00 Introduction: Mikhail Parakhin, Microsoft, and Shopify 00:01:16 Why Shopify Is Talking More About AI 00:02:29 Internal AI Adoption at Shopify and the December Inflection 00:06:54 Token Budgets, Jensen Huang, and Why Usage Metrics Can Mislead 00:10:55 Why Shopify Built Its Own AI PR Review System 00:12:38 AI Coding, More Bugs, and the Real Deployment Bottleneck 00:14:11 Why Git, PRs, and CI/CD May Need to Change for Agents 00:18:24 Tangle: Shopify’s Reproducible ML and Data Workflow Engine 00:21:19 Why Tangle Is Different from Airflow 00:26:14 Tangent: Auto Research for Optimization and Experimentation 00:30:07 How Tangent Democratizes Experimentation Beyond ML 
Engineers 00:33:06 The Limits of Auto Research 00:36:36 Why Tangle, Tangent, and SimGym Compound Together 00:37:20 SimGym: Simulating Customers with Shopify’s Historical Data 00:42:47 The Infra Behind SimGym 00:46:00 Why SimGym Gets Better with Real Customer History 00:47:30 Counterfactuals, HSTU, and Modeling Merchant Trajectories 00:51:55 CRPs, Clustering, and Category-Level Customer Behavior 00:53:30 UCP, Shopify Catalog, and Identity Linking 00:55:07 Liquid AI: Why Shopify Uses Non-Transformer Models 00:59:13 Real Shopify Use Cases for Liquid 01:03:00 Can Liquid Scale into a Frontier Model? 01:09:49 Hiring at Shopify: ML, Data Science, and Databases 01:10:43 Sydney at Bing: Personality Shaping and AI Character 01:13:32 Closing Thoughts Transcript [00:00:00] swyx: Okay. We’re here in the studio, a remote studio, with Mikhail Parakhin, CTO of Shopify. Welcome. [00:00:08] Mikhail Parakhin: Thank you. Welcome. [00:00:10] swyx: I don’t even know if I should introduce you as CTO of Shopify. I feel like you have many identities. Uh, you led sort of the, the Bing ML team, I guess, uh, uh, or ads team. I, I don’t know, I don’t know, uh, you know, it’s, uh, people va-variously refer you as like CEO or, or, uh, I don’t know what that, that, that said previous role at Microsoft was. [00:00:29] Mikhail Parakhin: Uh, that was... Yeah, my previous role w- at Microsoft was the-- I actually was the CEO of one of Microsoft’s business units, which included, as I, you know, as we discussed, all the things that people like to laugh about, uh, including Windows and Edge and Bing and ads and everything. [00:00:47] swyx: Yeah, yeah. What a, what a, what a wild time. You’ve obviously, uh, done a lot since you landed at Shopify. 
Uh, one of the reasons I reached out was because you started promoting more sort of internal tooling, uh, primarily Tangle, but also a lot of people have seen and adopted Tobi’s QMD, uh, and obviously, I think, uh, Shopify has always been sort of leading in terms of, uh, engineering. I think more-- it’s just more recent that you guys have been more vocal about your sort of AI adoption. Is that, is that true? [00:01:16] Mikhail Parakhin: Well, I think AI tools in general are fairly recent development, uh, and we’ve-- Shopify, you know, at this stage of its development, we’re developing AI in-in-house and other, uh, building tools that use AI and, you know, interfacing with the wider AI community, uh, you know, are on the sort of the, uh, runaway trajectory. So it just did by sort of natural byproduct. We, we talk about it more also. We just, uh, just even yesterday, Andrej Karpathy was famous in tweeting about, oh, are there some, uh, ways, uh, that, that you can organize your agents to store the data and then, uh, look up the data so that you don’t have to research or, or lose context every- Yes time. And a little bit tongue in cheek, I tweeted that, “Hey, we’ve, we’ve done it much earlier, and we even have different approaches, Tobi and I.” Tobi, of course, is a big fan of QMD, and I’m more of a SQL, SQLite fan. But, uh, yeah, very similar things that we’ve already done here. The point is, yeah, we’re very dynamic, you know, explosively growing company, and we have to be at the forefront of AI adoption, obviously. [00:02:29] swyx: Yeah. Yeah. Um, you, your team kindly prepared some slides actually that we were gonna bring up on to, uh, the screen. I think I can, I can screen share, and then we can kind of go through some of the shocking stats that maybe, maybe put some numbers to what exactly is going on. So here we have, uh- An internal AI tool adoption chart. What are we looking at here? What ? 
[00:02:54] Mikhail Parakhin: Yeah, this is very interesting statistics. Uh, this is number of daily active workers, you know, think of, uh, DAO, basically the active users of- [00:03:05] swyx: Yeah ... [00:03:05] Mikhail Parakhin: AI tool as a percentage of all the people in the company, right? And then- Yeah ... different AI tools. And, uh, you could see two things here is that one is the green is total. Uh, green is just total. So you could see that it approaches really % by now. It’s hard not to do your job now without interacting deeply, at least with one tool. You could see another interesting thing is just as many people commented in December was the phase transition when suddenly models gotten good enough that, that everything took off and started growing. Uh, it, it was many people noticed that the thing is that small improvements accumulated into this big change in Sep- December roughly timeframe. [00:03:52] swyx: Yeah. [00:03:52] Mikhail Parakhin: The other thing I would claim you could see is tha

    1h 12m
  5. APR 20

    🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik

Today, we explain this piece of “clickbait” from our guest! TL;DR: 95% of cancer treatments fail to pass clinical trials, but it may be a matching problem — if we better understood which patients have which tumors that will respond to which treatments, success rates would improve dramatically and millions of lives could be saved — with the treatments we ALREADY have. See our full episode dropping today: Why Big Pharma is licensing AI Models Tolstoy famously wrote, ‘All healthy cells are alike; each cancer cell is unhappy in its own way.’ Or something like that. Cancer might be the most misunderstood disease out there. It’s not one disease, it’s a family of diseases. Hundreds, maybe thousands, of unique diseases, each with its own underlying biology. With this lens, saying you’ll “cure cancer” is like saying you’ll solve Legos. We keep hearing AI will cure cancer, but sadly it may not be so easy. Today’s guests — Ron Alfa and Daniel Bear from Noetik — think they can use AI to break through a core bottleneck in the treatment development process. GSK recently signed a $50M deal for their technology that also includes an (undisclosed) long-term licensing deal for Noetik’s models like the recently announced TARIO-2, an autoregressive transformer trained on one of the largest tumor spatial transcriptomics datasets in the world. Whole-plex spatial transcriptomics is the richest way to read a tumor, and ~0% of cancer patients going through standard care ever get one — and TARIO-2 can now predict an ~19,000-gene spatial map from the H&E assay every patient already has. Most big AI plays in BioTech have focused on discovery, and usually result in an in-house development effort (meaning tools companies usually become drug companies). This deal stands out in that it is a software licensing deal, and represents a commitment to a platform rather than a drug.
With attention on other software tools for drug development (see the Boltz episode and Isomorphic for example), it is starting to look like the appetite of Pharma for biotech tools has finally started to grow. Why the sudden interest? Cancer is hard Biology is hard, cancer is harder. But despite this, we’ve made incredible progress. So many cancers that would have been death sentences twenty years ago are routinely survivable. It used to be our main strategy was just chemotherapy — poison you and hope the tumor dies before you do. Now, there are many treatments that actually kill a tumor and leave the rest of you intact! Immune checkpoint inhibitors like Keytruda and Opdivo target the defenses of dozens of tumor types. CAR-T therapy adds modified T-cells to your blood that can target B-cell malignancies very accurately. Antibody Drug Conjugates such as trastuzumab emtansine combine a drug with an antibody, allowing it to target very specific (cancer) cells. We truly live in marvelous times. With that said, we still have a long way to go. For every type of cancer with a miracle treatment, we have many more that are still death sentences. The world spends $20-30 billion a year trying to cure cancers, with hundreds of clinical trials yearly. Yet, progress is slow, with a 95% failure rate in clinical trials. The lab doesn’t translate to the clinic Are we leaving something on the table? Enter Noetik and Ron Alfa. Ron’s core thesis is that many of these “failed” treatments actually work! But we’re not looking at the right patients with the right tumors. If only we had a way to really understand the unique types of cancer biologies and which patients will respond to which treatments, we might be able to show a much higher success rate. Millions of lives (and billions of dollars) may ride on this. The Hard part: Blind Faith in Data Collection Ron and Noetik had the conviction to spend almost two years just collecting data. Lots, and lots, and lots, of data.
Noetik has acquired thousands of actual human tumors, and collects a large multimodal dataset of hundreds of millions of images that allows them to create a detailed map of the cell makeup in the local environment. These are real human tumors, not frankenstein mouse models or immortal cell lines. This data is then fed into a massive self-supervised model, creating a “virtual cell”. This model has a deep understanding of cancer biology — Noetik has worked carefully to show it can distinguish different types of tumors. Maybe even tumors we didn’t identify as distinct previously! More recently they figured out how to scale up their model and data, and see no limit in their scaling laws! Noetik’s models can simulate how a patient will respond to experimental treatments. They are working with partners to test promising drugs that were demonstrated to be safe, but not effective. If these models work as hoped, Noetik will bring new cancer treatments to patients without developing a new drug! Their models will also guide the discovery process towards drugs that are more likely to make it through clinical trials. You can imagine why this is so attractive to GSK. We’ll see… Ron and Dan make pretty persuasive arguments that their models will truly assist in cohort selection in useful ways and this seems valuable. And we think it’s pretty clear that * Translation from lab to clinic is the biggest bottleneck for drug development. * Better cohort selection using biomarkers is likely to improve translation from lab to clinic. Noetik has already had some success here. We’ll see if they’re able to translate that into a reliable advantage. Stepping back a bit from the technology, curing cancer is a pretty unambiguously positive application of AI. It is also a very hard problem to solve. Our guess is that most people have been impacted by cancer or will be at some point soon. 
And we hope that learning about the amazing work that companies like Noetik are doing will inspire a generation of AI engineers to work on the hardest and most exciting problems that society faces. Full Video Pod: This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

    1h 25m
  6. APR 15

    Notion’s Token Town: 5 Rebuilds, 100+ Tools, MCP vs CLIs and the Software Factory Future — Simon Last & Sarah Sachs of Notion

For all those who missed out on London, see you in Miami next week! Notion, the knowledge work decacorn, has been building AI tooling since before ChatGPT, with many hits, from Q&A in 2023 to unified AI in 2024 and Meeting Notes in 2025. At the end of their last Make user conference, Ryan Nystrom teased Notion 3.0’s Custom Agents - and they are finally embracing the Agent Lab playbook! Sarah Sachs and Simon Last of Notion join us for a deep dive into how Notion built Custom Agents, why it took years and multiple rebuilds to get right, and what it means to turn a productivity tool into an agent-native system of record for enterprise work. We go inside the product, engineering, evals, pricing, and org design decisions behind one of the most ambitious AI product efforts in software today — from early failed tool-calling experiments in 2022 to agent harnesses, progressive tool disclosure, meeting notes as data capture, and the long-term vision for software factories and agentic work. We discuss: * Sarah and Simon’s path to launching Notion Custom Agents, and why the feature was rebuilt four or five times before it was ready for production * Why early agent attempts failed: no tool-calling standard, short context windows, unreliable models, and too much complexity exposed to the model * The “Agent Lab” thesis: not just wrapping a model, but understanding how people collaborate and building the right product system around frontier capabilities * How Notion thinks about roadmap timing: not swimming upstream against model limitations, but also building early enough that the product is ready when the models are * Why coding agents feel like the kernel of AGI, and how Notion is thinking about “software factories” made up of agents that spec, code, test, debug, review, and maintain codebases together * How Sarah runs AI engineering at Notion (“notes from Token Town”): objective-setting over idea ownership, low-ego teams comfortable deleting their own work, and a culture
designed to swarm around fast-changing opportunities * The “Simon Vortex,” company hackathons, and why security gets pulled in early rather than late * How Notion organizes AI: core AI capabilities and infrastructure, product packaging teams, and a broader company mandate that every product surface must increasingly work for both humans and agents * Why prototypes have become much easier to build internally, and how “demos over memos” changes product development inside a tool the whole company already uses every day * Notion’s eval philosophy: regression tests, launch-quality evals, and “frontier/headroom” evals that intentionally only pass ~30% of the time so the company can see where model capabilities are going * What a “Model Behavior Engineer” is, and why Notion treats eval writing, failure analysis, and model understanding as a distinct function rather than just software engineering * The changing role of software engineers in the age of coding agents, and why the new job looks less like typing code and more like supervising a rigorous outer system of agents, PRs, and verification loops * How the “software factory” should work: specs, self-verification, bug flows, subagents, and minimizing human intervention while preserving the invariants that matter * A live walkthrough of a Notion Custom Agent handling coworking space tenant applications by triaging email, enriching applicants with web search, and writing structured data into a Notion database * How agents compose inside Notion: shared databases as primitives, agents invoking other agents, “manager agents” supervising dozens of specialized agents, and memory implemented simply as pages and databases * Notion’s take on MCP vs CLI: why Simon is bullish on CLI’s self-debugging nature, where MCP still makes sense, and how Sarah thinks about capability, determinism, permissioning, and pricing alignment * The evolution of Notion’s internal agent harness: from early JavaScript coding agents, to custom XML, to 
Markdown and SQL-like abstractions, to tool definitions, progressive disclosure, and a much shorter system prompt * Why Notion cares about teaching “the top of the class,” building for sophisticated operators rather than abstracting away too much capability for everyone * How agent setup works today: agents that can configure themselves, inspect their own failures, and edit their own instructions — with guardrails around permissions * How Notion prices Custom Agents: credits as an abstraction over tokens, model type, serving tier, web search, and future sandbox costs; why usage-based pricing was necessary; and how “auto” tries to match the right model to the right task * Why Notion is not eager to train a foundation model, where they do fine-tune and optimize today, and why retrieval/ranking is one of the most important investment areas as more searches come from agents rather than humans * Why Meeting Notes became one of Notion’s strongest growth loops: not just as transcription, but as high-signal data capture that powers search, custom agents, follow-up workflows, and the broader system of record for company collaboration * Why Notion is more interested in being the place where collaboration data lives than in building hardware themselves — and how wearables or other capture devices may eventually feed into that system Sarah SachsLinkedIn: https://www.linkedin.com/in/sarahmsachsX: https://x.com/sarahmsachs Simon LastLinkedIn: https://www.linkedin.com/in/simon-last-41404140X: https://x.com/simonlast Full Video Episode Timestamps * 00:00:00 Introduction and launching Notion Custom Agents * 00:01:17 Why Notion rebuilt agents four or five times * 00:03:35 Building for where models are going, not just where they are * 00:05:32 The Agent Lab thesis, wrappers, and product intuition * 00:08:07 User journeys, leadership, and low-ego AI teams * 00:13:16 The Simon Vortex, hackathons, and bringing security in early * 00:16:39 Team structure, demos over memos, and building 
for agents * 00:20:25 Evals, Notion’s Last Exam, and the Model Behavior Engineer role * 00:27:37 Evals as an agent harness and the changing role of software engineers * 00:30:42 The software factory: specs, verification, and agent workflows * 00:32:18 Live demo: a custom agent for coworking space applications * 00:35:08 Composing agents, manager agents, and memory as pages * 00:38:15 Notion Mail, Gmail, native integrations, and tools * 00:39:43 MCP vs CLI and the cost of capability * 00:44:13 When Notion uses MCP vs building its own integrations * 00:47:43 The history of Notion’s agent harness rebuilds * 00:55:35 Power users, public tools, and the setup agent * 00:58:01 Self-fixing agents, permissions, and “flippy” * 01:01:13 Pricing, credits, and choosing the right model automatically * 01:09:01 Why Notion isn’t training its own frontier model * 01:14:07 Retrieval, ranking, and search built for agents * 01:17:27 Meeting Notes as data capture and workflow automation * 01:21:18 Wearables, hardware, and Notion as the system of record * 01:23:45 Outro Transcript [00:00:00] Alessio: Hey everyone. Welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I’m joined by swyx, editor of the Latent Space. [00:00:11] swyx: Hello. Hello. We’re back in the beautiful studio that, uh, Alessio has set up for us with Simon and Sarah from Notion. Welcome. [00:00:18] Sarah Sachs: Thanks for having us. [00:00:19] Alessio: Thanks for having us. Yeah. [00:00:20] swyx: Congrats on the launch recently, the custom agents, finally it’s here. How’s it feel? [00:00:26] Sarah Sachs: We ship things slowly. So it had been in alpha for a little bit, and at the point at which it’s in alpha, um, there’s a group of people that are making sure it’s ready for prod, and then there’s a group of people working on the next thing.
So sometimes some of these launches are a bit delayed satisfaction, so it’s quite nice to remind yourself of all the work you did, because we do have a habit of, like, being two or three milestones ahead. Uh, just ‘cause you have to be, you know, you can’t get complacent. Um, but it’s been great that people understood how this is helpful. And I think that’s just easier in general building AI tools today than it was two, three years ago. People kind of get it, and so that user education, um, there’s just, it was our most successful launch in terms of free trials and converting people and things like that. It was really successful, so yeah. But there’s a lot to build. [00:01:12] swyx: Making it free for three months helps. [00:01:16] Sarah Sachs: Yep. [00:01:17] Simon Last: It was definitely super exciting for me because it’s probably the fourth or fifth time that we rebuilt that. [00:01:22] swyx: Yes. [00:01:23] Simon Last: And I mean, [00:01:24] swyx: you’ve been building this since like 2022. [00:01:26] Simon Last: Yeah, I mean, like, it was even right when we got access to, like, GPT-4 in late 2022, one of the first ideas we had is like, oh, okay, let’s make an agent that, I, we used the word assistant at the time, there wasn’t really the word, the word agent yet, but, oh, we’ll give it access to all the tools that Notion can do, and then it, we run in the background, like, like do work for us. And then we just tried that many times and it just was too early. Um, [00:01:48] swyx: I need to force you to like double click on that. What is too early? What didn’t work? [00:01:52] Sarah Sachs: We were fine-tuning, like, before function calling came out, we were trying to fine-tune with the Frontier Labs and with Fireworks, like, a function-calling model on Notion functions. This is right when I joined. I joined because, um, we needed a manager, as Simon needed to be able to go on vacation. So, uh, that’s, that’s around when I joined, so you can speak much more to it.
[00:02:11] Simon Last: Yeah, we did partnerships wit

    1h 17m
  7. APR 7

    Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony

    We’re proud to release this ahead of Ryan’s keynote at AIE Europe. Hit the bell, get notified when it is live! Attendees: come prepped for Ryan’s AMA with Vibhu after. Move over, context engineering. Now it’s time for Harness engineering and the age of the token billionaires. Ryan Lopopolo of OpenAI is leading that charge, recently publishing a lengthy essay on Harness Eng that has become the talk of the town: In it, Ryan peeled back the curtains on how the recently announced OpenAI Frontier team have become OpenAI’s top Codex users, running a >1m LOC codebase with 0 human written code and, crucially for the Dark Factory fans, no human REVIEWED code before merge. Ryan is admirably evangelical about this, calling it borderline “negligent” if you aren’t using >1B tokens a day (roughly $2-3k/day in token spend based on market rates and caching assumptions): Over the past five months, they ran an extreme experiment: building and shipping an internal beta product with zero manually written code. Through the experiment, they adopted a different model of engineering work: when the agent failed, instead of prompting it better or to “try harder,” the team would look at “what capability, context, or structure is missing?” The result was Symphony, “a ghost library” and reference Elixir implementation (by Alex Kotliarskyi) that sets up a massive system of Codex agents all extensively prompted with the specificity of a proper PRD spec, but without full implementation: The future starts taking shape as one where coding agents stop being copilots and start becoming real teammates anyone can use and Codex is doubling down on that mission with their Superbowl messaging of “you can just build things”. Across Codex, internal observability stacks, and the multi-agent orchestration system his team calls Symphony, Ryan has been pushing what happens when you optimize an entire codebase, workflow, and organization around agent legibility instead of human habit. 
We sat down with Ryan to dig into how OpenAI’s internal teams actually use Codex, why the real bottleneck in AI-native software development is now human attention rather than tokens, how fast build loops, observability, specs, and skills let agents operate autonomously, why software increasingly needs to be written for the model as much as for the engineer, and how Frontier points toward a future where agents can safely do economically valuable work across the enterprise. We discuss: * Ryan’s background from Snowflake, Brex, Stripe, and Citadel to OpenAI Frontier Product Exploration, where he works on new product development for deploying agents safely at enterprise scale * The origin of “harness engineering” and the constraint that kicked off the whole experiment: Ryan deliberately refused to write code himself so the agent had to do the job end to end * Building an internal product over five months with zero lines of human-written code, more than a million lines in the repo, and thousands of PRs across multiple Codex model generations * Why early Codex was painfully slow at first, and how the team learned to decompose tasks, build better primitives, and gradually turn the agent into a much faster engineer than any individual human * The obsession with fast build times: why one minute became the upper bound for the inner loop, and how the team repeatedly retooled the build system to keep agents productive * Why humans became the bottleneck, and how Ryan’s team shifted from reviewing code directly to building systems, observability, and context that let agents review, fix, and merge work autonomously * Skills, docs, tests, markdown trackers, and quality scores as ways of encoding engineering taste and non-functional requirements directly into context the agent can use * The shift from predefined scaffolds to reasoning-model-led workflows, where the harness becomes the box and the model chooses how to proceed * Symphony, OpenAI’s internal Elixir-based orchestration 
layer for spinning up, supervising, reworking, and coordinating large numbers of coding agents across tickets and repos * Why code is increasingly disposable, why worktrees and merge conflicts matter less when agents can resolve them, and what it really means to fully delegate the PR lifecycle * “Ghost libraries”, spec-driven software, and the idea that a coding agent can reproduce complex systems from a high-fidelity specification rather than shared source code * The broader future of Frontier: safely deploying observable, governable agents into enterprises, and building the collaboration, security, and control layers needed for real-world agentic work Ryan Lopopolo * X: https://x.com/_lopopolo * LinkedIn: https://www.linkedin.com/in/ryanlopopolo/ * Website: https://hyperbo.la/contact/ Timestamps 00:00:00 Introduction: Harness Engineering and OpenAI Frontier 00:02:20 Ryan’s background and the “no human-written code” experiment 00:08:48 Humans as the bottleneck: systems thinking, observability, and agent workflows 00:12:24 Skills, scaffolds, and encoding engineering taste into context 00:17:17 What humans still do, what agents already own, and why software must be agent-legible 00:24:27 Delegating the PR lifecycle: worktrees, merge conflicts, and non-functional requirements 00:31:57 Spec-driven software, “ghost libraries,” and the path to Symphony 00:35:20 Symphony: orchestrating large numbers of coding agents 00:43:42 Skill distillation, self-improving workflows, and team-wide learning 00:50:04 CLI design, policy layers, and building token-efficient tools for agents 00:59:43 What current models still struggle with: zero-to-one products and gnarly refactors 01:02:05 Frontier’s vision for enterprise AI deployment 01:08:15 Culture, humor, and teaching agents how the company works 01:12:29 Harness vs.
training, Codex model progress, and “you can just do things” 01:15:09 Bellevue, hiring, and OpenAI’s expansion beyond San Francisco Transcript Ryan Lopopolo: I do think that there is an interesting space to explore here with Codex, the harness, as part of building AI products, right? There’s a ton of momentum around getting the models to be good at coding. We’ve seen big leaps in like the task complexity with each incremental model release, where if you can figure out how to collapse a product that you’re trying to build, a user journey that you’re trying to solve, into code, it’s pretty natural to use the Codex harness to solve that problem for you. It’s done all the wiring and lets you just communicate in prompts. To let the model cook, you have to step back, right? Like you need to take a systems thinking mindset to things and constantly be asking, where is the agent making mistakes? Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I’m putting in place. So I have solved this part of the SDLC. swyx: [00:01:00] All right. [00:01:03] Meet Ryan swyx: We’re in the studio with Ryan from OpenAI. Welcome. Ryan Lopopolo: Hi. swyx: Thanks for visiting San Francisco and thanks for spending some time with us. Ryan Lopopolo: Yeah, thank you. I’m super excited to be here. swyx: You wrote a blockbuster article on harness engineering. It’s probably going to be the defining piece of this emerging discipline, huh? Ryan Lopopolo: Thank you. It is, it’s been fun to feel like we’ve defined the discourse in some sense. swyx: Let’s contextualize a little bit, this first podcast you’ve ever done. Yes. And thank you for spending time with us. What is, where is this coming from? What team are you in, all that jazz? Ryan Lopopolo: Sure, sure.
Ryan Lopopolo: I work on Frontier Product Exploration, new product development in the space of OpenAI Frontier, which is our enterprise platform for deploying agents safely at scale, with good governance in any business. And the role of my team has been to figure out novel ways to deploy our models into packaged products that we can sell as solutions to enterprises. swyx: And you have a background, I’ll just squeeze it in there. Snowflake, Brex, [00:02:00] Stripe, Citadel. Ryan Lopopolo: Yes. Yes. Same. Any kind of customer swyx: entire life. Yes. The exact kind of customer that you want to, Vibhu: so I’ll say, I was actually, I didn’t expect the background when I looked at your Twitter, I’m seeing the opposite. Stuff like this. So you’ve got the mindset of like full send AI, coding stuff about slop, like buckling in your laptop on your Waymo’s. Yes. And then I look at your profile, I’m like, oh, you’re just like, you’re in the other end too. Oh, perfect. Makes perfect sense. Ryan Lopopolo: It’s quite fun to be an AI maximalist, and if you’re gonna live that persona, OpenAI is the place to do it. swyx: Token billionaire is what you say. Ryan Lopopolo: Yeah. Certainly helps that we have no rate limits internally. And I can go, like you said, full send at this stage. swyx: Yeah. Yeah. So the Frontier, and you’re a special team within OpenAI Frontier. Ryan Lopopolo: We had been given some space to cook, which has been super, super exciting. [00:02:47] Zero Code Experiment Ryan Lopopolo: And this is why I started with kind of an out-there constraint to not write any of the code myself. I was figuring if we’re trying to make agents that can be deployed into enterprises, they should be [00:03:00] able to do all the things that I do. And having worked with these coding models, these coding harnesses over 6, 7, 8 months, I do feel like the models are there enough, the harnesses are there enough where they’re isomorphic to me in capability and the ability to do the job.
So starting with this constraint of I can’t write the code meant that the only way I could do my job was to get the agent to do my job. Vibhu: And like a, just a bit of background before that. This is basically the article. So what you guys did is five months of working on an in

    1h 13m
  8. APR 3

    Marc Andreessen introspects on The Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"

Fresh off raising a monster $15B, Marc Andreessen has lived through multiple computing platform shifts firsthand, from Mosaic and Netscape to cofounding a16z. In this episode, Marc joins swyx and Alessio in a16z's legendary Sand Hill Road office to argue that AI is not just another hype cycle, but the payoff of an "80-year overnight success": from neural nets and expert systems to transformers, reasoning models, coding, agents, and recursive self-improvement. He lays out why he thinks this moment is different, why AI is finally escaping the old boom-bust pattern, and why the real bottleneck may be less about models than about the messy institutions, incentives, and social systems that struggle to absorb technological change. This episode was a dream come true for us, and many thanks to Erik Torenberg for the assist in setting this up. Full episode on YouTube!

We discuss:

* Marc's long view on AI: from the 1980s AI boom and expert systems to AlexNet, transformers, and why he sees today's moment as the culmination of decades of compounding technical progress
* Why "this time is different": the jump from LLMs to reasoning, coding, agents, and recursive self-improvement, and why Marc thinks these breakthroughs make AI real in a way prior cycles were not
* AI winters vs. "80-year overnight success": why the field repeatedly swings between utopianism and doom, and why Marc thinks the underlying researchers were mostly right even when the timelines were wrong
* Scaling laws, Moore's Law, and what to build: why he believes AI scaling laws will continue, why the outside world is messier than lab purists assume, and how startups can still create durable value on top of rapidly improving models
* The dot-com crash and AI infrastructure risk: Marc's comparison between today's AI capex boom and the fiber/data-center overbuild of 2000, plus why he thinks this cycle is different because the buyers are huge cash-rich incumbents and demand is already here
* Why old NVIDIA chips may be getting more valuable: the pace of software progress, chronic capacity shortages, and the idea that even current models are "sandbagged" by supply constraints
* Open source, edge inference, and the chip bottleneck: why Marc thinks local models, Apple Silicon, privacy, trust, and economics all point toward a major role for edge AI
* American vs. Chinese open source AI: DeepSeek as a "gift to the world," why open models matter not just because they're free but because they teach the world how things work, and how open source strategies may shift as the market consolidates
* Why Pi and OpenClaw matter so much: Marc's claim that the combination of LLM + shell + filesystem + markdown + cron loop is one of the biggest software architecture breakthroughs in decades
* Agents as the new "Unix": how agent state living in files allows portability across models and runtimes, and why self-modifying agents that can extend themselves may redefine what software even is
* The future of coding and programming languages: why Marc thinks software becomes abundant, why bots may translate freely across languages, and why "programming language" itself may stop being a salient concept
* Browsers, protocols, and human readability: lessons from Mosaic and the web, why text protocols and "view source" mattered, and how similar principles may shape AI-native systems
* Real-world OpenClaw use: health dashboards, sleep monitoring, smart homes, rewriting firmware on robot dogs, and why the most aggressive users are discovering both the power and danger of agents first
* Proof of human vs. proof of bot: why Marc thinks the internet's bot problem is now unsolvable via detection alone, and why biometric + cryptographic proof of human becomes necessary

Timestamps

* 00:00 Marc on AI's "80-Year Overnight Success"
* 00:01 A Quick Message From swyx
* 01:44 Inside a16z With Marc Andreessen
* 02:13 The Truth About a16z's AI Pivot
* 03:29 Why This AI Boom Is Not Like 2016
* 06:33 Marc on AI Winters, Hype Cycles, and What's Different Now
* 10:09 Reasoning, Coding, Agents, and the New AI Breakthroughs
* 12:13 What Founders Should Build as Models Keep Improving
* 16:33 AI Capex, GPU Shortages, and the Dot-Com Crash Analogy
* 24:54 Open Source AI, Edge Inference, and Why It Matters
* 33:03 Why OpenClaw and PI Could Change Software Forever
* 41:37 Agents, the End of Interfaces, and Software for Bots
* 46:47 Do Programming Languages Even Have a Future?
* 54:19 AI Agents Need Money: Payments, Crypto, and Stablecoins
* 56:59 Proof of Human, Internet Bots, and the Drone Problem
* 01:06:12 AI, Management, and the Return of Founder-Led Companies
* 01:12:23 Why the Real Economy May Resist AI Longer Than Expected
* 01:15:53 Closing Thoughts

Transcript

Marc: There's something about AI that causes the people in the field, I would say, to become both excessively utopian and excessively apocalyptic. Having said that, I think what's actually happened is an enormous amount of technical progress that built up over time. For example, we now know that the neural network is the correct architecture. And I will tell you, there was a 60-year run, or even 70 years, where that was controversial.
And so the way I think about what's happening is basically: the period we're in right now, I call it the 80-year overnight success, right? It's an overnight success because bam, ChatGPT hits, and then o1 hits, and then OpenClaw hits. These are overnight, radical, transformative successes, but they're drawing on an 80-year wellspring, a backlog of ideas and thinking. It's not that it's all brand new; it's an unlock of all these decades of very serious, hardcore research. If I were 18, this is what I would be spending all of my time on. This is such an incredible conceptual breakthrough.

swyx: Before we get into today's episode, I just have a small message for listeners. Thank you. We would not be able to bring you the AI engineering, science, and entertainment content that you so clearly want if you didn't choose to also click in and tune in to our content. We've been approached by sponsors on an almost daily basis, but fortunately enough of you actually subscribed to us to keep all this sustainable without ads, and we wanna keep it that way. But I just have one favor to ask of you: the single most powerful, completely free thing you can do is to click that subscribe button. It's the only thing I'll ever ask of you, and it means absolutely everything to me and my team that works so hard to bring Latent Space to you each and every week. If you do it, I promise we will never stop working to make the show even better. Now, let's get into it.

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, editor of Latent Space.

swyx: Hello. And we're in a16z with, uh, Marc Andreessen. Welcome.

Marc: Yes, yes. And what, half of a16z? Something like that. The "a" one. Exactly.

swyx: Exactly, exactly. Uh, apparently these are the final few days in your current office. You're moving across the road.

Marc: Uh, yeah, we have some projects underway, but this is actually the original office. We're in the whole thing.

swyx: It's beautiful. Yeah. Great.

Marc: Thank you.

swyx: So I have to come out with, you know, a spicy start. In October 2022 I had just made friends with Roon, and, uh, I wanted to give him something to be spicy about. And I said, it'll never not be funny that a16z was constantly saying "the future is where the smart people choose to spend their time" and then going deep into crypto and not into AI. That was in October 2022. And Roon says there was an internal meeting at a16z to reorient around gen AI. Obviously you have, but was there a meeting? What was that?

Marc: I mean, look, I've been doing AI since the late eighties.

swyx: Yeah.

Marc: So as far as I'm concerned, this stuff is all Johnny-come-lately. I mean, look, we've been doing AI our entire existence. We've been doing AI, machine learning, deep learning, from the beginning. AI is just core to computer science; I actually view them as quite continuous. Um, you know, Ben and I both have computer science degrees, and Ben and I are actually both old enough to remember the actual AI boom in the 1980s. There was a big AI boom at the time, with names like expert systems, and the era of Lisp and Lisp machines. I coded in Lisp, I was coding in Lisp in 1989, when that was the language of the AI future. So this is something that we're completely comfortable with, that I've been doing the whole time, and that we're very enthusiastic about.

swyx: Is there a strong "this time is different"? Because, uh, my closest analog was 2016-17. There was an AI boom, and it petered out very, very quickly, just in terms of investing.

Marc: Sort of, sort of.

swyx: Yeah. Investment excitement.

Marc: Although that's really when the NVIDIA phenomenon started. I would say it was in that period, when the vocabulary was more "machine learning," but it was very clear at that time that machine learning was hitting some sort of takeoff p

    1h 16m

