Latent Space: The AI Engineer Podcast

Latent.Space

The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and share exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

  1. 3h ago

    GitHub's plan for Agents — Kyle Daigle, GitHub

    I’m excited to work with Microsoft once again as the presenting sponsor of the AI Engineer World’s Fair! We’ll be streaming live from MS Build today for a special crossover pod with our friends at No Priors and the one and only Satya Nadella. However, we did not hold back with this interview - we asked all the burning questions about uptime and Copilot that we know you have in your minds. Let’s go! For almost two decades, GitHub has been the home of software, where both open and closed source flow through commits, pull requests, reviews, actions, etc. This ecosystem flourished as open-source maintainers and contributors continued shipping code for the benefit of the community. However, as coding agents began to ship mass quantities of code (growing 1400% in 2026), it marked a new era that was both extremely exciting and challenging for GitHub. While these agents help more people ship more projects, they also significantly increase the floor of how much code is shipped, how often it is shipped, how many people commit code, and basically orders of magnitude multiples in every dimension of GitHub infrastructure. GitHub now inevitably experiences more pressure on its infrastructure, which was originally designed around human developers moving at human speed. This has resulted in a very publicly notable uptime story. So it raises the question of whether current systems around code can absorb what AI produces. Can CI/CD keep up when every idea becomes a build? Can open source maintainers survive floods of AI-generated slop contributions? Can GitHub preserve the human social contract of software while becoming the operating layer for agents? Which brings us to the perfect person to answer these questions: GitHub COO Kyle Daigle. In this episode, he joins swyx to unpack what happens when AI doesn’t just autocomplete code, but starts changing how companies operate, how open source works, how pull requests get reviewed, and how GitHub itself has to scale. 
We go deep on GitHub’s internal AI workflows: micro-skills, WorkIQ, MCP, Slack, Teams, email, Copilot workflows, the new Copilot desktop app, CLI, cloud agents, and how Kyle uses agents to look backwards across company context before deciding what to do next. Kyle also reflects on GitHub’s history building webhooks, APIs, Actions, npm, Dependabot, and Semmle, why the AI era is breaking GitHub in new ways, how Actions became a general-purpose compute layer, and what Copilot becomes after code completion. Full Video Pod We discuss: * Kyle’s expanded role across GitHub * How AI got Kyle coding again after years in leadership * Why GitHub rolls out AI through existing workflows instead of forcing new tools * WorkIQ, MCP, Slack, Teams, email, and GitHub as company context * Why massive “mega-skills” are giving way to small, atomic micro-skills * How AI changes summarization, communications, marketing, and analyst work * Why former developers in leadership may have a unique advantage in the AI era * Kyle’s “15 agents on Saturday” workflow * How Kyle built an AI-generated executive presentation for CRO/CFO teams * Why AI changes the chief of staff role without removing the human work * GitHub Actions, webhooks, arbitrary code execution, and secure agent compute * The npm acquisition, supply-chain security, 2FA, and token invalidation * Slop forks, vendoring, and whether AI agents change dependency management * What pull requests become when most PRs come from agents * Prompt requests, vouching, AI review, and trust in open source * What counts as a “developer” when AI lowers the barrier to building * GitHub Spark, low-code, and why GitHub refuses to hide the code * 14x commit growth, Actions load, databases, monorepos, and availability * Copilot’s evolution from completion to CLI, desktop app, cloud agents, and SDK * Context, memory, rules, and making GitHub “act like Kyle wants it to act” * Ambient AI, OpenClaw, enterprise security, and the new operating system for 
agents * What swyx should ask Satya Nadella about Microsoft’s AI future Kyle Daigle * LinkedIn: https://www.linkedin.com/in/kyledaigle * X: https://x.com/kdaigle Timestamps 00:00:00 Introduction 00:03:36 Why AI Got Kyle Coding Again 00:07:04 Running GitHub with AI: WorkIQ, MCP, Slack, Teams, and Skills 00:15:39 The Golden Age for Former Developers in Leadership 00:17:31 15 Agents on Saturday and AI-Generated Executive Work 00:20:20 How AI Changes the Chief of Staff Role 00:21:45 GitHub’s History: Actions, npm, Webhooks, and Open Source 00:28:45 Slop Forks, Vendoring, and AI Dependency Management 00:33:57 Pull Requests, Prompt Requests, and Trust in Agent-Generated Code 00:41:21 GitHub Stars, 200M+ Developers, and the New AI Builder Wave 00:45:15 GitHub Spark, Low-Code, and Why GitHub Still Shows the Code 00:47:38 GitHub’s Hardest Era: 14x Growth, Reliability, and Scale 00:59:21 Actions as the Compute Layer for CI/CD and Automation 01:02:04 The State and Future of GitHub Copilot 01:08:24 Ambient AI, Background Agents, and the Future of the SDLC 01:13:09 OpenClaw, Enterprise Security, and the New OS for Agents 01:18:03 Build Announcements, WorkIQ, FoundryIQ, and Microsoft Context 01:21:41 What Should swyx Ask Satya? Transcript Introduction: Kyle Daigle’s Expanded Role at GitHub and Microsoft Swyx [00:00:00]: We’re here with Kyle Daigle, COO of GitHub. Welcome. Kyle [00:00:07]: Hey, thanks for having me. Swyx [00:00:08]: You’re not just COO of GitHub. People know you as that. You have a new role. Kyle [00:00:11]: So I have an expanded role now. I’ve been working at GitHub for thirteen years and doing all things developer. Joined as a developer myself. And now, I’m also responsible as the CMO of Developer for Microsoft. 
And so all the kind of learnings and passion for developers and how we work with them and how we communicate and how we bring our products to market, we’re also bringing that expertise to the broader Microsoft ecosystem and helping every developer that uses a Microsoft product or would like to have a sort of similar experience that they’ve had with GitHub over the years. So it’s a different role in some ways, but it’s also just building on the experience that I’ve had at GitHub of just sort of tell the truth, be authentic, show people how to use it and then let the products speak for themselves. Now just doing that with, all of Microsoft. Swyx [00:01:09]: We’ll be releasing this in conjunction with Build. You got lots of stuff planned, and we can sort of touch on that whenever it’s appropriate. I think one of the interesting things is I rarely meet a COO who’s also a CMO. I think you’re a very outward facing and you’re very confident publicly. That’s rare. Do you actually view yourself as COO? What’s What is your thing? From GitHub Developer to COO/CMO: Building the Platform and Operating GitHub Kyle [00:01:33]: I think for me, it’s been funny. The titles have always been, a— have always felt a little strange to me. I joined GitHub as a developer? I wrote so much of the Swyx [00:01:46]: Let’s bring that up. You wrote the back ends? Kyle [00:01:48]: I was going through, I was going through, some old photos, when folks were talking about how things were being built or how there was a build GitHub. I built, webhooks and worked with teams building the API, built the platform layer. Anything that integrated with GitHub, up until really twenty eighteen, I built or ran the engineering teams. And that’s kind of where my the beginning of my passion always was helping people build things, deliver them to, their customers. And so being a developer, building for developers was always super unique. 
In a— I think as my role expanded, it became my ability to talk to not just developers, but also enterprise customers or business leaders and have this translation layer. And then through all those years, GitHub has always operated pretty uniquely. Post-pandemic, working remotely was not as novel as it was when GitHub started in two thousand and eight. But all that expertise of running remote teams, doing it well, became this sort of bigger role, ultimately turning into the COO role of how do we operate GitHub in the way that GitHub’s always operated after the Microsoft acquisition. And kind of so on from there. So like for me, I think the— I’ve, I still code. I love coding but the problem has always been, people. It’s a much harder problem to both support our own employees, a harder problem to communicate to developers and enterprise buyers what we’re building why it matters, ‘cause those are two very different messages. And so getting to work in the mix of COO, CMO, also just being a dev, I think is what’s kept me at GitHub for so long. AI Workflows for Leadership: Commits, Retrospectives, and Context Swyx [00:03:40]: Apparently, you have— your commits have gone up. What’s this? What’s going on? Kyle [00:03:45]: Rui’s called me out pretty aggressively. So I think— as you can imagine, right, you can see my normal era of being a dev In the twenty thirteen, twenty fourteen era, and then moving into management, and then ultimately the COO role. I think what you see there is me, really getting back to coding thanks to AI. I— similar to, attaching problems between how to market and how to operate a business and how to code, I find, building agents and workflows that are connecting very disparate problems to be what’s driving this. So that’s, some of it’s writing software. A lot of it is, connecting a ton of a different data sources to, help me out. 
But that is completely me really diving in on the AI side in trying out our tools, trying out everyone’s tools, But building for me, building for the non-technical leader, though I’m technical and how we’re, able to use these tools more than just the simple, call and response that I think a lot of the non-technical, your employers, you have to get— y

    1h 23m
  2. 1d ago

    Why Video Agent models are next — Ethan He, xAI Grok Imagine

    We’re announcing AIEWF speakers this week! Take the AI Engineering Survey! Today’s guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months: He comes back on Latent Space with some nuclear hot takes: that Video Models primarily get their intelligence from LLMs, not from training on video data, and that the next frontier for truly interactive, realtime, long-horizon world models is to work on LLMs (perhaps Interaction Models as well…) Put it this way: In the near term, the next Sora won’t be a better video model, but a video agent. Generative Media may more closely follow the evolution of AI coding which went from focusing on one-shot output performance and cost, to multiturn reasoning and planning models for agents and systems that can plan, edit, test, debug, and submit PRs. At a certain point, coding models got so good that the only significant next step to improve performance was handling the orchestration of these models. Now as the performance of video models increases significantly across realism, consistency, & prompt adherence while becoming more cost efficient, the next evolution of video generation may also be systems that can plan, generate, edit, critique, and iterate across an entire creative task. In this episode, Ethan joins swyx and Vibhu to unpack what it actually takes to build frontier image and video systems: data, VAEs, diffusion transformers, audio-video alignment, inference speedups, and the hidden cost of storing and moving massive video datasets. From building NVIDIA’s Cosmos world model to joining xAI as Grok Imagine was being built from zero to one, Ethan He has been at the center of some of the most important work in video generation, multimodal models, and real-time world models. 
We go deep on Grok Imagine, how a small xAI team shipped its first multimodal video model in three months, why iteration speed matters more than almost anything in model development, and why many of the biggest gains come from fixing tiny bugs in data and training pipelines. Flipbook: The future of Videomaxxing Video agents are almost a sure bet to be the trend in the coming year. We end with a glance at what’s beyond video agents: Flipbook caused a minor sensation this year when it was released, but most treat it as a fun demo. Ethan takes it very seriously — with the speed and cost of inference coming down every year, the future of custom video JIT UI is closer than you think. We talked about why videogen models may become the front end of AI, how generative UI could replace traditional HTML/CSS, why world models need to be real-time, interactive, and long-horizon, and why the future of video generation may depend more on language models and agents than on diffusion alone. We discuss: * Why fast iteration mattered more than meetings * Why small training bugs can drive huge model quality gains * Why coding models may make compute the bottleneck again * How image and video models are trained with synthetic captions * The role of VAEs and latent space in frontier video models * Why image models are the foundation for video models * The tradeoff between temporal compression and real-time interactivity * Flipbook, Neural OS, and the future of generative UI * Why future interfaces may go from user intent to pixels * The hidden cost of training video models: storage, egress, and GPU hours * How step distillation and consistency models (like OpenAI sCM) makes video inference orders of magnitude faster * Grok Imagine 0.9 and large-scale audio-video generation * Why audio-video alignment is harder than text-video alignment * Ethan’s definition of world models * Reference-to-video, video extension, and long-context video generation * Why xAI’s research communication 
undersells Grok Imagine * How xAI culture shaped the speed of development * AI watermarking, SynthID, and detecting generated media * Why prompt rewriting matters for video models * Grok Imagine Agent and the rise of video agents * Why language models may unlock better video generation * Robotics, physical AI, and embodied world models * Why Ethan left xAI and shifted focus toward LLMs * Self-managed context, memory, and the next frontier for language models Ethan He * LinkedIn: https://www.linkedin.com/in/ethanhe42 * X: https://x.com/EthanHe_42 Timestamps 00:00:00 Introduction 00:01:25 From NVIDIA Cosmos to xAI 00:03:24 Building Grok Imagine from Zero to One 00:10:07 How Image and Video Models Are Trained 00:18:53 Video Compression, VAEs, and Real-Time Tradeoffs 00:22:10 Generative UI, Flipbook, and Neural OS 00:32:10 The Cost of Training Large Video Models 00:37:04 Distillation, GANs, and Fast Video Inference 00:41:21 Audio-Video Generation and Grok Imagine 0.9 00:48:34 What Makes a World Model? 00:55:51 Reference Videos, Long Context, and Video Memory 01:00:11 xAI Culture, Research, and First-Principles Building 01:09:45 AI Safety, Watermarking, and Prompt Rewriting 01:13:10 Video Agents and AI-Assisted Creation 01:27:32 Why Language Models Unlock Better Video 01:31:15 Robotics, Physical AI, and Embodied World Models 01:32:38 Why Ethan Left xAI 01:34:16 Self-Managed Context and the Future of LLMs 01:38:43 Ethan’s Career Path and Closing Thoughts Transcript Introduction: Ethan He, Latent Space, and the Path to xAI Swyx [00:00:00]: We’re here in the studio with Ethan He, most recently of xAI. Welcome. Ethan [00:00:10]: Thank you. Glad being here. Swyx [00:00:11]: We’re also here with Vibhu. you were first coming to us or joining the latent space world because you were working on Kosmos at NVIDIA, and you did a paper. We loved it. you presented it as well, so thank you for doing that. Ethan [00:00:23]: I’ve actually, I also presented the MoEs twice at latent space. 
Swyx [00:00:29]: How did you actually hear about us? Did we reach out to you? Is that how it worked? Ethan [00:00:33]: No, actually, I-- the community. Like I realized, oh, there is this online community that people talk about AI and also learn from each other through papers every week through the Paper Club. It’s very nice. Ethan [00:00:49]: I learned a lot. Swyx [00:00:49]: I think three years nonstop. We haven’t stopped even on Christmas and New Years. Many weeks I want to stop but it keeps going. Vibhu [00:00:58]: No, that was good. I think you had posted that you worked on a paper, and I was “Oh, very cool. We have Paper Club. Present then.” Vibhu [00:01:04]: But I might have reached out to you after. Swyx [00:01:05]: You-- because it’s an amateur club, right? Swyx [00:01:08]: So it’s very unusual, but we sometimes have paper authors come by and actually explain the paper. Today we just did the Poolside paper, which was apparently very good. Vibhu [00:01:18]: Came out yesterday. Vibhu [00:01:19]: Pretty interesting, right? Fully open. They talk about everything, systems. So it’s a good one. We’ll, we’ll recommend people to read it. Swyx [00:01:25]: Bring us up to speed on your transition to xAI, ‘cause I actually don’t even know when you joined. Just like tell the, tell the story about the sort of transition. From NVIDIA Cosmos to xAI: Scaling Video and World Models Ethan [00:01:34]: Before xAI, I was working on Cosmos world model as in-- at NVIDIA. So Cosmos is, it’s a giant video foundation models that can-- that aims to simulate the world and for-- it serves as a foundation of-- for all of the roboticists to build on top of. There, once I built the Cosmos one, I realized as this thing also has a scaling law similar to language model, we need to scale up the video models further. That’s, that’s why I realized I need to move to somewhere with much more compute resources. That’s how I Swyx [00:02:13]: Than NVIDIA? Vibhu [00:02:14]: The GPU rich came themselves. 
Vibhu [00:02:19]: And timeline-wise, when was Kosmo? It was pretty early, right? It was open world model, open paper, everything. Ethan [00:02:25]: It was end of twenty-four. Vibhu [00:02:28]: End of twenty-four. Ethan [00:02:30]: Then at mid twenty-five, I moved to xAI. At that time-- I joined about the time when xAI was about to build video models and in multi-model models. There were no infra, no data, and no model, and it just-- as a few engineers, we built it in three months and released the first model, Grok Imagine zero point nine. Ethan [00:02:55]: And since then, I keep working on video models and move more from training and to post-training of the video models. For example, like a reference to videos, kind of like the cameo feature and, video extensions. And, before I left, I worked on a world model, leading a small team to focus on the real-time long horizon video generation. Building Grok Imagine From Scratch in Three Months Swyx [00:03:24]: Can you give like a rough roadmap of okay, you’re on a brand-new team. Grok previously was only text, or they partnered with BFL for their image gen stuff. What do you-- what are the building blocks, right? You have compute, data you can procure somewhere. Like just what are like the sequence of things that people should think about when you’re setting up a new team? Vibhu [00:03:43]: actually even deeper, not just data you can procure. You guys had to go through getting the data too, right? So you shipped it pretty fast, but yeah Swyx [00:03:51]: three months is like Vibhu [00:03:52]: From everything Swyx [00:03:52]: actually like very surprisingly fast. Ethan [00:03:55]: One thing I say like thanks to my experience at NVIDIA, ‘cause first time when we were building Kosmos together, we built it, for about a year. So this is like the second time I do it. Roughly have an idea, what to do. I say the most important thing is the talent. 
Everyone were very strong and clever, very close with each other towards a common goal. So that speed up things a lot. So you reduce the communication bandwidth among people, and everyone can work

    1h 43m
  3. 5d ago

    The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

    The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend; get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets! One of the central tensions in the agents industry is that even while there are major decacorn agent labs like Sierra, Decagon, Notion and Cursor being built up, it is also true that it has never been easier to DIY agents, with a plethora of agent frameworks like LangGraph and Pydantic and Flue, and managed agents from Anthropic and Gemini and Amazon. There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal. You’d think Cognition might feel a bit threatened, but they’re not - even after all this, they were way oversubscribed for the $1B Series D they just announced. Walden Yan, coiner of context engineering and Chief Product Officer/Cofounder of Cognition, invited OpenInspect’s Cole Murray to talk about why the Devin is in the Details. Full conversation live on the pod today. In retrospect, async agents were the most AGI pilled bet you could make in 2024 - the models weren’t good enough yet to vibecode, people didn’t trust AI enough to let it rip, and nobody (including early Cognition) was sure about the form factors. Now it is obvious: * The first wave of AI coding tools made the developer faster but kept them heavily in the loop. Copilot and Cursor’s tab autocomplete are prime examples. However, the workflow was still heavily centered around and bottlenecked by the developer’s local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time. * The second wave was local agents: Claude Code, Windsurf, Cursor’s agents pane: first one, then increasingly many terminals all running concurrently. 
* The current Age of Async Agents points to a different future focused more on agent orchestration which drives end-to-end development. According to previous guest Steve Yegge, there are finer-grained 8 levels to agent adoption, but we have collapsed it into three. As Cursor’s Michael Truell put it in The third era of AI software development: Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work. The agent should not sit solely inside the developer’s flow. It should be setup to work in the background so that you can give it a task, a repo, a machine, a shell, a browser, tests, memory, and review loops to go do the work somewhere else. In less than a year, the sentiment has shifted from avoiding multi-agent systems: to suggesting approaches that actually work: From coining “context engineering” to building the infrastructure behind Devin’s 7x PR growth and jump from 16% to 80% of commits across Cognition repos, Walden Yan has had a front-row seat to the background-agent shift. In this episode, Cognition co-founder and CPO Walden Yan joins swyx alongside Cole Murray, creator of OpenInspect, to unpack why everyone is building their own Devin, what changed after the December 2025 model inflection, and why “spec to pull request” is now becoming a real production workflow. We go deep on the architecture of background agents: harness-in-the-box vs out-of-the-box, why Devin separates the “brain” from the machine, why repo setup is still one of the hardest problems, why Docker is not always enough, and how full VMs, snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. 
Walden and Cole also dig into memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, PMs shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer. And as agents eat software… and software eats the world… you can draw the conclusion on what is next: We discuss: * Why the engineering world is waking up to background agents and cloud agents * The December 2025 model inflection that made spec-to-PR workflows practical * Devin’s 7x merged PR growth and rise from 16% to 80% of commits * Why Cole built OpenInspect as an open-source background-agent system * The economics of $20/seat agent products and why monetization is tricky * What Cognition actually sells beyond Devin: infra, onboarding, integrations, and adoption * Harness in the box vs out of the box, and why architecture matters * Why Devin separates the brain from the machine for security and permissions * Repo setup, scoped secrets, Docker Compose, and agent-ready dev environments * Why full VMs matter when agents need to run real applications and test them * Android, macOS, Windows, nested virtualization, and machine-specific agent work * Why testing is much harder than “computer use” * Screenshots, video verification, and the “I know it works” merge moment * GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments * Why MCP alone is not enough for first-class Slack and enterprise integrations * Memory, Knowledge, skills, Claude.md, and why retrieval is still unsolved * Devin’s auto-generated memories and the challenge of memory pruning * Always-on agents as permanent PMs for issues, tickets, and product areas * Sub-agents, meta-Devin management, and what multi-agent systems actually add * Why pure auto-merge vibe coding breaks down after about two weeks * AI code smells, lint rules, reward hacking, and Semgrep for agent-written code * GitAI, 
inline context, and preserving the “why” behind code changes * Local testing, mock servers, older codebases, and preparing companies for agents * Windsurf 2.0 and the handoff between local foreground agents and cloud background agents * SRE auto-triage, support workflows, and agents as first responders * PMs, marketing, and non-engineers creating pull requests from Slack * AI agent budgets, $1k-$5k per engineer spend, and hybrid frontier/sub-frontier systems * The rise of autonomous coding factories and who Cognition is hiring Walden Yan * X: https://x.com/walden_yan * LinkedIn: https://www.linkedin.com/in/waldenyan/ Cole Murray * X: https://x.com/_colemurray * LinkedIn: https://www.linkedin.com/in/colemurray/ * OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents Timestamps 00:00:00 Introduction 00:00:43 Why Everyone Is Building Their Own Devin 00:01:57 Devin’s 2025 Ramp: 7x PR Growth and 80% of Commits 00:03:49 OpenInspect and the Rise of Open-Source Background Agents 00:07:59 What Cognition Actually Sells Beyond Devin 00:09:56 Background Agent Architecture: Harness In vs Out of the Box 00:12:08 Separating the Brain from the Machine 00:14:07 Repo Setup, Secrets, Docker, and Full VMs 00:19:13 Why Testing Is Harder Than Computer Use 00:22:40 Video Verification and the “I Know It Works” Merge Moment 00:23:19 GitHub UX, Devin Review, and AI Code Review 00:25:42 MCP, Slack, and Enterprise Agent Integrations 00:28:59 Memory, Knowledge, and Always-On Agents 00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin 00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay 00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore 00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems 00:56:10 Making Codebases Agent-Ready 00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff 01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases 01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories 01:06:51 Hiring at Cognition and 
OpenInspect Consulting 01:07:45 Outro Transcript Introduction: Walden Yan, Cole Murray, and Context Engineering Swyx [00:00:00]: All right, we’re in the studio with Walden Yan, co-founder of Cognition, CPO. Walden [00:00:08]: Happy to be here. Swyx [00:00:09]: Which is a cool title. And coiner of context engineering. Walden [00:00:15]: Although I think there are many people who’d used the terms in various ways beforehand, but I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering or model wrapping into maybe a more thoughtful way to build agents. Swyx [00:00:33]: For those who haven’t caught up on that, I have on screen the Don’t Build Multi-Agents post, which you should go read on and we might refer to, and Cole Murray, who created OpenInspect. Cole [00:00:43]: Great to be here. Swyx [00:00:43]: So let’s talk about it. Everyone is building their own Devins. What’s going on? The December Shift: From Handholding Models to Autonomous PRs Cole [00:00:51]: So I think the engineering world is waking up to this idea of background agents, cloud agents, whatever you’d like to call it. And I think we saw a shift around the December timeframe of 2025, where the models Opus 4.5 and GPT 5.2, they reached a capability where we moved away from handholding the model and being able to actually more or less autonomously drive the model. And what I mean by that is that we could pretty much go from a specification to a completed pull request, assuming the spec was good enough, with very little friction. And that paradigm alone, I think, changed a lot of how we interact with agents, and opened this world where background agents became more practical. Swyx [00:01:41]: I think for Cole, everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was this moment which was, I think, Sonnet 3.7, where, You guys rewrote Devin in one night or something. So describe 2025 or how it felt from your side. 
Walden [00:02:01]: In retrospect, we alw

    1h 8m
  4. 6d ago

    🔬ESM: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

    Editor’s note: In our first BioHub pod with Priscilla and Mark they discussed their acquisition of EvoScale, led by Alex Rives, who is now Head of Science at BioHub. With ESM-1 they trained language models on millions of protein sequences drawn from across life, with a simple “next token” objective: predict the amino acids that have been randomly masked out, based on the context of the rest of the sequence. But they soon found that these models also learned biological structure and function, including properties the model had never been explicitly shown AND that this ability scales predictably with compute, leading to ESM2 and ESM3. Today, Alex announced ESMFold 2, an open scientific engine to power prediction, design, and discovery across protein biology. Building on Cryo-EM data (discussed in the CZI pod), ESMFold2 reports state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics, and evidence that inference time scaling is also working across five targets in cancer and immunology. In a nod to that other famous AI x protein folding project, they are also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures, which you can play around with on their website. We are honored to work with them for this huge release! One of the refrains we’ve heard on the Science pod has been that protein folding, materials design, cellular biology, etc. are very different problems from Language Modeling. They definitely are. Yet Alex Rives and the ESM team at BioHub just released a preprint and model, demonstrating that vanilla BERT-like transformer models trained on sufficiently large and diverse data sets can beat specialized models like AlphaFold3 on some of the hardest protein-related problems. 
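    For readers who want to see the masked-token objective described above in concrete terms, here is a minimal sketch of the data side of that setup: hide a random subset of amino acids and keep the hidden residues as prediction targets. This is not the ESM training code. The 15% mask rate is the BERT convention, assumed here for illustration, and the names `mask_sequence`, `MASK`, and the example sequence are hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not ESM's actual vocabulary entry


def mask_sequence(sequence, mask_rate=0.15, seed=0):
    """Hide a random subset of residues; return (masked tokens, targets).

    mask_rate=0.15 is an assumption borrowed from BERT-style training.
    targets maps each masked position to the residue the model must predict.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return tokens, targets


# Toy example: a made-up 22-residue sequence.
example = "MKTAYIAKQRQISFVKSHFSRQ"
masked_tokens, targets = mask_sequence(example)
print(" ".join(masked_tokens))
print(targets)
```

    A model trained this way sees only `masked_tokens` and is scored on how well it recovers `targets` from the surrounding context, which is the sense in which no structural labels are ever shown to it.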
Andrew White had a great segment in our first LS-Science episode that explained how mind-blowing AlphaFold2 was when it was released in 2020: it suddenly solved problems on a GPU on your desktop that DESRes had built custom-ASIC supercomputer clusters to solve. John Jumper and Demis Hassabis received the Nobel Prize in Chemistry for this work. AlphaFold2 took advantage of a very clever observation: if multiple species co-evolve pairs of mutations, this implies that the mutations correspond to parts of the protein that are close in 3D space. This is usually shorthanded as MSAs (multiple sequence alignments), and is the key insight which makes AlphaFold2 so effective. Like other inductive biases, however, it hurts generalization. Scale-pilled before it was cool If you take a look at the timeline for scaling laws for LLMs and release of structure prediction models, the ESM team notably doubled down on their MSAs-be-damned approach after AlphaFold2 released. This obviously requires a great deal of belief in the scale hypothesis. Why the conviction? ESM developed at a time when many of the scaling laws and the “Bitter Lesson” were proving increasingly correct. AlphaFold2’s wild success must have been both exciting and bitterly disappointing. But using MSAs means that the model is dependent on training data that contains MSAs in order to be accurate in a given domain. For things like antibodies that don’t have MSAs to train on, AlphaFold tends to do poorly. ESM takes a different approach: learn the relationship between different proteins by unsupervised training on as much diversity as you can find (sound familiar?) and then correlate that back to structures known from the Protein Data Bank (PDB) and other sources. In other words, a World Model. 
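The masked-residue objective described in the editor's note (hide some amino acids, then predict them from the rest of the sequence) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the ESM training code: the 15% mask rate is borrowed from the common BERT convention, and the `<mask>` token name, the function names, and the example sequence are all made up for illustration. The "model" here is a simple frequency baseline standing in for a transformer.

```python
import random

MASK = "<mask>"  # hypothetical mask token name, not taken from the ESM codebase


def mask_sequence(sequence, mask_rate=0.15, rng=None):
    """Hide a random subset of residues.

    Returns (tokens, targets): tokens is the sequence as a list with some
    positions replaced by MASK; targets maps each masked position to the
    residue that was hidden there. mask_rate=0.15 is an assumed default.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    rng = rng or random.Random()
    tokens = list(sequence)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return tokens, targets


def masked_accuracy(predict, tokens, targets):
    """Fraction of masked positions where predict(tokens, i) recovers the hidden residue."""
    hits = sum(predict(tokens, i) == residue for i, residue in targets.items())
    return hits / len(targets)


def most_common_baseline(tokens, position):
    """Toy stand-in for a model: always guess the most frequent visible residue."""
    visible = [t for t in tokens if t != MASK]
    # sorted() makes tie-breaking deterministic
    return max(sorted(set(visible)), key=visible.count)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example sequence
    tokens, targets = mask_sequence(seq, rng=random.Random(0))
    print("".join("_" if t == MASK else t for t in tokens))
    print(masked_accuracy(most_common_baseline, tokens, targets))
```

A real model would replace `most_common_baseline` with a network that outputs a distribution over the 20 amino acids at each masked position and is trained with cross-entropy against `targets`; the point of the sketch is only that the training signal comes from the sequence itself, with no MSAs or structure labels.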
World Model for proteins “World Model” is a hype term that I define like this: Use unsupervised training to learn abstract patterns from the data: * The abstraction should be semantic - novel constructions represent things that obey the rules of the real world * The abstraction should be compositional - recombining different patterns leads to novel and often valid constructions * The abstraction should support generalization - it predicts things in the real world it wasn’t trained on Once you have a world model, you can attach “heads” to it for downstream tasks: predict properties of a protein, decompose its functional features, or search the representation for proteins that meet design criteria. The two big models BioHub just released under MIT license map directly onto this: * World model → ESMC (a model trained on 2.8 billion sequences) * Structure-prediction head → ESMFold2 One of the interesting ways the world model can “predict things” is to generate protein sequences and then measure the predicted properties, such as binding affinity, in the lab. Alex talks in the episode about validating some of the harder molecules they predicted in the wet-lab. Very cool! Another way is to use mech-interp techniques such as Sparse Auto Encoders (SAEs) to extract semantic features from your model, and then find novel features that predict unknown biology. I won’t spoil this part for you: it was one of the highlights of the episode for me! A cell is a computer We have all heard that genes are like computer programs, but usually the analogy fizzles after that. 
Of course genes are transcribed into RNA and RNA is translated into proteins, so genes are programs for building proteins, but that carries the analogy only to “binary digits are programs.” Here’s a better analogy: you can think of the cell nucleus as a storage device / storage controller, the ribosome as a JIT-compiler and runtime, and the semantic features that we learn from our world model via SAEs as functions, proteins as processes that interact together in workflows (signalling pathways) to produce behaviors and outputs (phenotypes). Like functions, the SAE features have a hierarchical composition from local, secondary and tertiary structures (mimicking protein structure), but also motifs that are conceptual, such as membrane integrations, disordered regions and disulfide bonds. As we learn to compose these features into novel protein designs, we move further towards programmable biology. Alex goes into much more detail about this in the episode, as well as: * Principles for new data collection * BioHub’s vision * Modeling the cell Enjoy! Full Video podcast please like and subscribe! * X: https://x.com/alexrives * LinkedIn: This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

    1h 10m
  5. May 21

    Giving Agents Computers — Ivan Burazin, Daytona

    Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets! On the product side, everyone is getting Computer - Perplexity, Manus, Cursor, and so on. Meanwhile on the research side, agentic evals like TerminalBench and GDPVal are also assuming computer (Harbor). On both ends, the consolidating LLM OS stack has become a standard toolkit, and Daytona is one of a small set of AI Infra companies that are booming because of it. “The end of localhost” has been Ivan Burazin’s obsession for more than a decade. Something that is all too familiar… Long before agents became the default way people talked about software development, Ivan was already chasing the idea that development should not depend on a fragile local machine. CodeAnywhere, one of the first browser-based IDEs, was an early attempt at that future: move the development environment into the cloud, make setup reproducible, and free developers from the endless “works on my machine” tax. The thesis was directionally right, but the market wasn’t ready yet. However, agents changed that. They do not care about a laptop, desk setup, or favorite editor. They need a computer they can access through an API: something stateful enough to keep working, fast enough to spin up instantly, flexible enough to resize, isolated enough to be safe, and composable enough to run the messy real-world workflows that real software engineering actually requires. Daytona isn’t just selling “sandboxes” in the narrow code-execution sense. It is the latest version of Ivan’s original localhost thesis. In this episode, Daytona’s CEO joins swyx to explain why AI agents need more than code execution boxes: they need composable computers, stateful sandboxes, instant startup, dynamic resources, and infrastructure that can survive workloads going from zero to 100,000 CPUs. 
We go deep on the new agent compute market: Daytona’s hard pivot from human dev environments to AI sandboxes, the New Year’s Eve MVP that customers begged for, why Daytona runs on bare metal with its own scheduler, how one customer runs almost 850,000 sandboxes a day, and why RL/eval workloads went from 0% to roughly 50% of usage in just months. Ivan also explains why agents need Windows and macOS machines, why CLI may matter more than MCP, why Kubernetes is painful for this workload, and why the future AI cloud may look more like Stripe than AWS. We discuss: * How Daytona grew out of CodeAnywhere, Shift, and the “end of localhost” thesis * Why Daytona pivoted from human dev environments to AI sandboxes * Why agents need composable computers instead of disposable code execution boxes * The New Year’s Eve MVP that customers chased API keys for * Why Daytona chose bare metal, stateful snapshots, and its own scheduler * How Daytona spins up one sandbox in ~60ms and 50,000 sandboxes in ~75 seconds * Why Daytona’s biggest customer runs ~850,000 sandboxes a day * How RL/eval workloads create zero-to-100,000 CPU spikes * Why RL workloads went from 0% to roughly 50% of Daytona usage * Why customers compare Daytona against EKS/GKE and say they’re “never going back” * Why every AI agent may need a computer, including Windows and macOS environments * The Apple licensing constraints that make macOS sandboxes hard * Why CLI gives agents more power than MCP * How open source helps agents integrate Daytona * Why agent-generated PRs may break today’s CI/CD assumptions * Why AI SaaS companies reselling tokens may face a cold shower * Why the AI cloud may look more like Stripe than AWS Ivan Burazin * LinkedIn: https://www.linkedin.com/in/ivanburazin * X: https://x.com/ivanburazin Daytona * Website: https://www.daytona.io * X: https://x.com/daytonaio Timestamps * 00:00:00 Hook * 00:01:12 Introduction * 00:03:15 CodeAnywhere, Shift, and the end of localhost * 00:05:58 What Daytona is: 
composable computers for AI agents * 00:08:07 The pivot from dev environments to AI sandboxes * 00:10:17 The New Year’s Eve MVP and customers begging for API keys * 00:12:56 Bare metal, stateful sandboxes, and Daytona’s scheduler * 00:17:28 60ms startup, 50,000 sandboxes, and 850K daily runs * 00:21:53 Spiky RL/eval workloads and the new agent infra problem * 00:28:12 RL workloads, Kubernetes pain, and dynamic resizing * 00:33:31 Why every AI agent needs a computer * 00:38:48 macOS sandboxes and Apple’s licensing problem * 00:44:28 Why CLI may matter more than MCP * 00:48:11 Open source, GitHub stars, and agent integration * 00:53:11 Git, CI/CD, and agent collaboration bottlenecks * 00:58:15 Founder life and building a 25-person infra company * 01:02:44 AI SaaS, token resale, and API-first business models * 01:06:10 GPU sandboxes, data centers, and compute growth * 01:09:48 Why the AI cloud may look more like Stripe than AWS * 01:11:26 Closing thoughts Transcript Introduction: Daytona, CodeAnywhere, and the End of Localhost Swyx [00:00:02]: Okay, we’re in the studio with Ivan Burazin, CEO of Daytona. Welcome. Ivan [00:00:07]: Thanks for having me, man. Swyx [00:00:08]: Ivan, you and I go back. Ivan [00:00:10]: Way back. Swyx [00:00:11]: How I don’t even know how, you found, did you reach out or, for Shift. Ivan [00:00:17]: I reached out to you. The reason was you - we were just - we were thinking about I was one of the co-founders of CodeAnywhere, the first browser-based IDE, and so we were thinking a long time of, localhost should die. And you had this article. Swyx [00:00:29]: End of localhost. Ivan [00:00:30]: Then I reached out to you because of that, and then we talked, and I was actually at a different job and learning about I was the head of, developer experience, and you were quite well-versed in that, and I actually reached out to you, among other people, how do we go about that? What are the key things and whatnot at this point in time? 
And you were nice enough to take the call, and I remember I was late on your call with you. Swyx [00:00:51]: I don’t remember. Ivan [00:00:52]: I remember because I was with my then I’m thinking of a girlfriend or wife at that point in time, I’m not sure. It’s the same person, so that’s great, and I was late ‘cause we were, in, Italy on, vacation, and then I was late for something. I felt so bad, and you were so nice to be, good about. Swyx [00:01:10]: The reason I’m nice is because I’m also late to other people, so it’s like, who’s, who’s without sin here, yeah, so I have to, for those who don’t know, InfoBip Shift, there’s this whole thing that, you did in the past, and, and that was basically one of the inspirations for me starting AI Engineer, which is like, I have to thank you for giving me that push to be like, “Oh, you can, you can build and sell conferences?” Ivan [00:01:34]: I remember you asked you asked me at the beginning to give me advisory shares, and I was so focused on what we were doing, I said no, and I should’ve took the advisory shares. So I’m sorry, dude. But anyway. Swyx [00:01:43]: We’re not, we’re not venture backed. Ivan [00:01:44]: No, it doesn’t matter. Swyx [00:01:45]: It’s Yeah, anyway, so I think what’s impressive about you is that CodeAnywhere is the thing that you’ve been trying to build, and, you kind of put it on hold and then came back after InfoBip. Just give us the story, do you - the story and the origin story, going into Daytona. From CodeAnywhere and Shift to Daytona Ivan [00:02:05]: Sure. Like, really way back, me and my co-founder have been together. I say this, I’ve said this multiple times, it’s like we were married and divorced and married. Some people actually ask me is my co-founder my partner. they thought it literally. It’s not literally, but we have done multiple companies together, and to your point, we had this shift where we went from the CodeAnywhere to the conference called Shift, and then back to, Daytona. 
We originally started stacking servers, doing like virtualization in the early 2000s and, routers and doing basically all these things, at a foundational level, and that was a services company which we sold to focus on what my co-founder actually invented, which was the very first browser-based IDE, right, I say the first. Before us was actually Heroku. They did it for a very short time until they became Heroku. But outside of them, we were the only one, and it was called. Swyx [00:02:55]: There was Cloud9. Ivan [00:02:57]: Cloud9 came out slightly after us. There was Replit, which came out when we stopped doing it, Replit came out, and they have been successful since then, which is great. There was Nitrous.io. There was quite a few that existed at the time, but it was like too early. But the interesting part is that we, at that point in time, because there was no VS Code, there was no Kubernetes, and Docker had just started when we Or I’m not sure if it was even public at that point in time. And so we had to build everything to the whole stack ourselves and that was the key learning that we brought into and that we’ve been using in Daytona today. So it was super early. There’s about 3 million people used CodeAnywhere. It was slightly, it was angel-backed more than venture-backed. We ended up paying everyone back because it didn’t have that sort of scale. But, three years ago, we started something similar with Daytona, which is not what we are today, but it was automating dev environments for human engineers, the basically the underlying stack of CodeAnywhere. And then we did a hard pivot last January to sandboxes. And so here we are. Swyx [00:04:01]: Historic pivot, yeah, and, it’s one of those things where, I had independently invested in CodeAnywhere, but also in E2B, and then both of you pivoted into the same thing, and I’m like, “F**k.” Ivan [00:04:12]: You invested, you invested in Daytona. You invested in Daytona. 
But you were the first If we had not got your check, we wouldn’t have done it. Swyx [00:04:18]: No way. I

    1h 10m
  6. May 20

    Railway: The Agent-Native Cloud — Jake Cooper

    Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets! This was recorded before Railway suffered a major GCP outage on May 19, despite being a multi-AZ, multi-zone mesh ring, with HA fiber interconnects between their Metal > GCP > AWS, because workload discoverability was unintentionally still tied to GCP. All has been resolved with a post-mortem. Railway did not start as an AI infrastructure company. It was founded in 2020, years before agents became the default way people thought about deploying software. Jake Cooper, formerly at Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Docker files, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts. For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users, with Jake personally greeting every Discord signup on a second monitor. Today, Railway has raised $124m and is growing very fast. A 35-person team supports 3 million users, adding roughly 100,000 signups a week. Their bare metal data centers have a 3-month payback period vs. renting in the cloud, with 70% margins funding aggressive cloud bursting when needed. The servers they own have actually appreciated in value as RAM prices have climbed, basically meaning the value of their hardware now exceeds the capital they've raised. From rebuilding Railway’s network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world. In this episode, Railway’s founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite. 
We go deep on Railway’s infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying. We discuss: * How Railway went from a slow six-year grind to adding 100,000 users a week * How Railway thinks about agents as the next dominant software species * Why agents need version control, observability, compute, storage, and orchestration at 1000x scale * The economics of Railway’s own-metal data centers and three-month payback * How Railway uses cloud bursting while scaling its own infrastructure * Why data center debt can be a better tool than venture debt for infra startups * Central Station, Railway’s internal system for clustering customer feedback and incidents * Why responsible disclosure and over-communication matter for platforms * Why feature flags, progressive rollouts, and shadow traffic are essential for agents * Temporal’s strengths, pain points, and why workflows matter for agents * Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems * Why “cattle, not pets” may change if you can clone the pets * Why Railway is building a new cloud from scratch instead of copying hyperscalers * The solo founder path, focus, writing, and how Jake thinks about company building Railway: * Website: https://railway.com/ * X: https://x.com/Railway Jake Cooper: * LinkedIn: https://www.linkedin.com/in/thejakecooper/ * X: https://x.com/JustJake Timestamps 00:00:00 Introduction: What Is Railway? 00:02:07 Jake’s Path to Railway 00:06:13 Railway’s Six-Year Growth Story 00:08:52 Rebuilding the Business After the Free 
Tier 00:11:17 Agents as the Next Software Platform 00:13:29 Railway’s Infrastructure Philosophy 00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch 00:17:22 Cloud Bursting and Five-Cloud Networking 00:20:20 Data Center Debt and Infra Financing 00:23:31 Data Centers in Space 00:25:24 What Agents Need From Infrastructure 00:28:24 CLIs, Canvas, and Agent-Native UX 00:35:15 Central Station, Incidents, and Responsible Disclosure 00:40:30 Safe Rollouts, SRE Agents, and Production Forks 00:45:00 AI SRE, Specs, Code, and Tests 00:48:24 Self-Replicating Infrastructure and the New Serverless 00:53:18 Heroku, Temporal, and Workflow Engines 01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems 01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration 01:10:56 The Pull Request Is Dying 01:12:28 Feature Flags and the Agent-Era SDLC 01:16:15 Cattle, Pets, and Cloning Machines 01:19:29 Solo Founder Lessons 01:24:12 Focus, GPUs, and Building a New Cloud 01:28:20 Closing Thoughts Transcript Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I’m joined by Swyx, editor of Latent Space. Swyx [00:00:10]: Hey, hey, hey. Today we’re in the studio with Jake Cooper of Railway. Alessio [00:00:14]: Conductor of Railway. Swyx [00:00:15]: Conductor at Railway. Yeah. Alessio [00:00:16]: Choo-choo. Swyx [00:00:17]: Do you actually have that anywhere, like on your business card? Jake [00:00:20]: We call some of our volunteer moderators conductors. I don’t have a business card. We’re not that big yet. At some point I will. I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.” Swyx [00:00:30]: Business cards are coming back. Jake [00:00:32]: They’re cool. They’re hip. The conductor thing is good. We’re trying to figure out what we want to call each other internally. 
Some people think it’s super cringe and say, “You don’t need a name for people internally.” Some people want to call each other something. We still don’t have a really good one. Jake [00:00:55]: We’ve got New Railcrews, Trainiacs. Nothing has stuck yet. Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don’t know, what is Railway? Let’s give people a crisp definition up front. Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you’re off to the races. Swyx [00:01:22]: You’ve got a nice animation on the landing page. Jake [00:01:24]: Thank you. None of my work, by the way. They don’t let me touch the design stuff anymore. Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment. The Railway Origin Story: From Uber Systems to a New Cloud Swyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway? Jake [00:02:21]: It was curiosity to keep going deeper. I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal. 
Swyx [00:02:44]: Which, by the way, I’m happy to talk about, pros and cons. Jake [00:02:48]: Totally. Swyx [00:02:51]: But let’s do the Railway story. Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it’s walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don’t care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience. Jake [00:03:17]: I don’t have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That’s what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be. Swyx [00:03:49]: Other patches to the Linux kernel this week? Jake [00:03:51]: Yeah. Not upstream. Our fork. Swyx [00:03:52]: That’s a flex. Railpack? No, this is different. This is the OS on top of Railpack? Jake [00:03:57]: No, this is an actual kernel patch. It’s always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable. Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases? Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we’re building for some of the agentic stuff. Maybe it’ll be useful upstream, but it’s deeply useful for us internally. Open Source, Forks, and Non-Deterministic Versioning Swyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it? Jake [00:04:38]: GitHub’s original sin is that it’s almost a series of broken pointers. 
You have this thing, then you clone it, and now you’ve lost the whole upstream. How do we make i

    1h 29m
  7. May 18

    The Autonomous Drone Tech Stack & Economics of Drones — Yaroslav Azhnyuk, The Fourth Law & Guest Host Noah Smith, Noahpinion

    The future of war has been evolving before our eyes in Ukraine, yet the west still plans to fight the last war. In this special episode, guest host Noah Smith (@noahpinion) and Brandon Anderson sit down with Yaroslav Azhnyuk (@YaroslavAzhnyuk), a serial tech founder who went from building PetCube to founding The Fourth Law, one of the world’s most advanced AI-guided drone companies. Over two hours we cover the technology, tactics, and geopolitics of drone warfare, and why the modern battlefield has already left the West behind: * Yaroslav’s personal history and the Ukraine war [00:01:04 – 00:14:01] * The modern drone tech stack: why FPV drones are the new god of war, the future of the rifleman, fiber optic vs. AI, five levels of autonomy, and the eight dimensions of the autonomous battlefield [00:14:01 – 01:05:13] * The geopolitics and economics of drones: China’s manufacturing advantage, the drone race, Western defense readiness, countermeasures, and why the gap is widening [01:05:13 – 01:58:57] For those looking for Noah Smith’s commentary, it really gets going around the 00:51:31 mark. Yaroslav Azhnyuk / The Fourth Law: * X: https://x.com/YaroslavAzhnyuk * LinkedIn: https://www.linkedin.com/in/yaroslavazhnyuk/ * The Fourth Law: https://thefourthlaw.ai Noah Smith: * Substack: Noah Smith * X: https://x.com/noahpinion Timestamps 00:00:00 Cold Open: China’s 4 Billion Drones and the Cameras-to-Explosives Pipeline 00:01:04 Introduction: Brandon, Noah Smith, and Yaroslav Azhnyuk 00:05:41 From Tech Entrepreneur to Defense: PetCube, Brave One, and the D3 Fund 00:10:42 The Ethics of Building Weapons: Dual-Use Technology and the Wolf at the Door 00:14:01 The Tech Stack: Cameras, Autonomy Modules, Interceptors, and a Semiconductor Fab 00:18:47 Fiber Optic vs. 
AI: The Radio Horizon Problem and $32/km Cable 00:25:32 FPV Drones: The New God of War — 70–80% of Frontline Casualties 00:28:28 The Five Levels of Drone Autonomy: From Terminal Guidance to Full Autonomy 00:41:37 The Eight Dimensions of the Autonomous Battlefield 00:45:32 AI Safety and the Morality of Autonomous Weapons 00:51:31 The End of the Rifleman? Noah’s 2013 Prediction vs. Battlefield Reality 01:05:13 China’s Manufacturing Advantage and Western Vulnerabilities 01:24:21 Policy Advice for Western Defense: Defense Valley and the Widening Gap 01:32:54 The Drone Race: Who’s Ahead, Category by Category 01:41:57 Countermeasures: Shotguns, Jammers, Lasers, and Fishnets 01:58:19 The Wedding and Final Takeaway: Be Prepared for War Transcript Cold Open: China, FPV Drones, and the New Warning Sign Yaroslav [00:00:00]: Think about this. Last year, Ukraine produced 4 million FPV drones. Ukraine is not the most industrious nation in the world. China can produce 4 billion of these FPV drones. Noah [00:00:10]: Would you say that right now China is now the supreme conventional military power on Earth, given its ability to manufacture and deploy drones in the quantity and quality that you just described? Yaroslav [00:00:20]: I don’t think we have all the information to claim that but we cannot count it out, and that alone should be a big warning sign. As I say, at some point in my life I went from making cameras that fling treats to pets to cameras that fling explosives to the occupiers. So that’s the short story. And when you think about what your nation, what your patriots are going through, you realize that’s the only morally right thing to do is to fight back, and it is immoral not to fight back, and then the choice becomes very clear. Introduction: Yaroslav Azhnyuk, Petcube, and the Last Flight into Kyiv Brandon [00:01:04]: Welcome to Latent Space. I’m Brandon. I normally do science podcasts, but today we’re going to do something a little bit different. 
I’m joined by Noah Smith of Noahpinion on Substack and Twitter. And he has lots of interesting things to say about drones. And as a guest, we have Yaroslav Azhnyuk, founder of The Fourth Law and several other, drone-related startups. To get started, it is February 23rd, 2022. You are running a pet startup. You’re connecting pets with their owners. Let’s go in just a little bit of background. How did you get started in tech, and what were you working on before the Ukrainian war started? Yaroslav [00:01:50]: Good to be here. Thank you. On February 23rd, late in the evening, 11:00 PM Kyiv time, my wife and I landed in Kyiv. Actually, then she was a fiance. We came from Lviv, where we were looking at a church, where our wedding should have taken place. And we got into this cab ride from the airport to our home, and the driver was like, “You crazy. Like, everyone’s leaving Kyiv. Why do you come?” We’re like, “What? Nothing’s going to happen. Dude, chill.” And then obviously, eight minutes later, or eight hours later, the bombs fell in the city. It was quite surreal. We probably landed on the last flight that landed in Kyiv, or one of those last flights. My background, I’m a tech guy. Studied applied mathematics in Kyiv Polytechnics, born and raised in Kyiv. My parents are old PhDs from academia, and grandparents too. Like, everything, from linguistics to nuclear physics. And I’m an entrepreneur, so I’ve built a bunch of companies. Petcube is the one you were referencing. So I lived in San Francisco 2014 to 2020, building Petcube, which is one of the leading, pet device companies in the world, selling lots of pet cameras. And then, yeah, as I say, at some point in my life I went from making cameras that fling treats to pets to cameras that fling explosives to the occupiers. So that’s the short story. February 24th: Leaving Kyiv as the Invasion Begins Noah [00:03:28]: February 24th, I guess a few hours after you, go to check out your wedding chapel, what do you do? 
Yaroslav [00:03:37]: We had a plan for this situation. So my parents and family live in Kyiv, and we’re like, “Okay, this has actually started. The worst has, come true.” And so we basically packed our belongings and got in the car and spent 17 hours driving west. And that was pretty sure most people in our audience watched at least one apocalyptic movie in their life, so that was exactly like that. Like, felt exactly like that. Missiles are falling. Like, there was smoke in Kyiv. Like, my dad and I went, like, to central part of the cities. It’s probably, like Yaroslav [00:04:20]: 800 meters from presidential office, to pick some stuff up at his workplace. Because he’s, like, the head of an academic institution, so he had to get some of the things with him. And super surreal. Like, the streets are empty. Like, the gas stations are out of gas. Like, we found some gas station. We didn’t have, like, spare canisters with us, so we’re like, We figured out, like, the car was diesel, so like, we figured out, if it’s diesel, you can actually store it in plastic, canisters, and we bought some window wash for the cars. We poured it out of the canisters, and we poured the diesel into that. Yeah, so it was like that. And then, like, helping friends get out, like my friend and his dog. Like, we found Like, my brother was also, like, riding in a separate car. We found a place for my friend who didn’t have a car. It was like, yeah, it was like, totally surreal. And we didn’t know of course, and you didn’t know this will last for so long. You didn’t know whether Ukraine will be able to defend Kyiv. And it was like, yeah, very little information and very little insight into future. From Pet Cameras to Defense Tech: Building for Ukraine and the Free World Noah [00:05:42]: What are your thoughts with regards to how do you, defend, Ukraine? 
So you eventually start building drones. What is the process to get there from where you were, building devices that connect owners with pets, to building drones, and what other things did you do to help the war effort in the process?

Yaroslav [00:06:07]: It’s definitely non-trivial, right? I didn’t get any military education when I was a student. Normally, in Ukraine, you would go to this military school even if you’re getting higher education in any other sphere. I decided to skip that, which is an unusual way to go. And I never thought that I would be somehow engaged in a war effort. Like, what is war? Of course, wars are over. It’s the end of history. So one thing you got to understand about many Ukrainians, and I guess it’s also true about most of the people I met here in the US, is that who you are in terms of your nationality is a big part of your identity. So when that gets under attack, it’s something deeper than just the country you live in getting under attack, right? Day one, I figured I’m going to fight back with everything I can, right? But I didn’t think on day one that I’m actually going to do weapons. We did a bunch of things. We were reaching out to a number of American congresspeople and senators, and basically advocating for support of Ukraine, for voting for Lend-Lease, which happened in May 2022, but didn’t actually work as expected. We helped start Brave1, which is now a very important defense innovation cluster, sort of like the DIU here in the US. We helped start a fund called D3. It was started, or co-started, by Eric Schmidt, former CEO of Google. So a bunch of these odd things, but then eventually, by 2023, I was like, “Okay.” It was obvious that this thing, A, is going to last a lot more time, and B, that the whole world is shifting and that there’s going to be a new arms race, that warfare is being redefined by drones as platforms.
And for the first time in history, you have a platform that is software-defined, that can increase your battlefield capabilities in a step change, just overnight. So it’s like if you were able to push a software update and get all of your Roman legionnaires a new h

    1h 59m
  8. May 14

    AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge

    Special discounts up for AIE Melbourne (LS discount) and AIE World’s Fair (group discounts up to 25% - CFPs still open for Autoresearch and Vertical AI) Cya there!

    Abridge did not start as a “GPT wrapper”. It was founded in 2018, years before the Cambrian explosion of AI application layer companies. OpenAI launched ChatGPT publicly on November 30, 2022, and by then, Abridge had already spent years doing the unglamorous work of building trust for one of the highest-context, most important workflows in healthcare: the conversation between a patient and a clinician. Abridge’s original wedge was clinical documentation. Listen to the visit, generate the note, reduce the clerical burden, and let clinicians spend more time with patients instead of the EHR. Because Abridge had focused on how doctors actually document, how health systems actually buy, how EHR integration actually works, how clinicians verify outputs, and how missing context during a visit turns into downstream friction across billing, prior authorization, quality, and follow-up, the adoption of LLMs became a force multiplier on a workflow already optimized for sensitive context gathering. The company has scaled fast: Abridge says it is projected to support 80M+ patient-clinician conversations this year across 250 large and complex U.S. health systems, with support for 28+ languages and 50+ specialties. It raised $300M at a $5.3B valuation in June 2025, after a $250M round earlier that year. Today, Janie Lee and Chaitanya “Chai” Asawa of Abridge join us for another crossover pod with Redpoint’s Jacob Effron (who is on the board of Abridge) to dive into how Abridge is building the clinical intelligence layer for healthcare, starting with ambient documentation, then expanding into clinical decision support, prior authorization, payer/provider/pharma workflows, and eventually real-time agents that act before, during, and after the patient conversation.
We go inside the product, data, infra, evals, workflow, privacy, and org design choices behind bringing AI into one of the highest-stakes enterprise environments, from 100M+ medical conversations and specialty-specific evals to real-time alerts, EHR integration, de-identification, clinician-scientist teams, and why healthcare may solve some of the hardest AI problems first.

We discuss:
* Why Abridge started with clinical documentation, “pajama time,” and saving clinicians 10–20 hours a week
* The transition from ambient scribe to clinical intelligence layer: save time, save money, and save lives
* Why conversations between patients and clinicians may be the most important workflow in healthcare (patient visit summary feature)
* Chai’s “healthcare-coded Glean” framing: context is king, but healthcare raises the stakes on safety, evals, and rollout
* Why Abridge wants AI to feel like “air conditioning”: always in the background, but only interrupting when it truly matters
* The prior authorization example: turning a denied MRI weeks later into real-time guidance while the patient is still in the room
* Why payer policies, EHR data, medical literature, and hospital-specific guidelines make the problem hard, and also create the moat
* How Abridge thinks about ambient form factors: mobile, desktop, in-room devices, nursing workflows, multimodality, and future AR
* The multi-sided healthcare customer: CMIOs, CFOs, CIOs, clinicians, patients, payers, and pharma
* The hardest AI problem at Abridge: high-quality, low-latency, low-cost real-time support in a high-stakes clinical setting
* When Abridge uses frontier models vs proprietary models, and why its unique data from medical conversations matters
* Why “every agent is a coding agent underneath,” and how the EHR can be thought of as a filesystem for healthcare agents
* How Abridge approaches personalization across individual doctors, specialties, and health systems
* Why “AI slop” is AI without context, and how edits, memories, and clinician preferences create a data flywheel
* Abridge’s eval stack: LFDs, LLM judges, in-house clinicians, third-party evaluators, specialty-specific evals, and progressive rollout
* HIPAA, PHI, de-identification, one-way anonymization, customer contracts, and learning from healthcare data safely
* What changes when you operate at 100M+ conversations: reliability, cost, post-training, model routing, and infrastructure optimization
* Why the same clinical conversation can serve doctors, patients, payers, pharma, and future clinical-trial workflows
* How Abridge works with EHRs, and why deep interoperability is table stakes for clinician adoption
* Why healthcare AI has regulatory tailwinds, why 80/20 does not work here, and why high-stakes domains may drive AI forward
* Why Abridge embeds “clinician scientists” into product and eval teams
* What Chai learned from Glean about search, quality, and durable AI infrastructure
* Why the future of AI infra may look like context layers, event-driven systems, Kafka, Temporal, sockets, CRDTs, and tools built for humans
* Why Janie changed her mind on “PRDs are dead,” and why crisp written clarity matters more in complex AI products
* How Abridge uses Claude Code, Cursor, and coding agents internally

Abridge:
* Website: https://www.abridge.com/
* X: https://x.com/AbridgeHQ

Janie Lee:
* LinkedIn: https://www.linkedin.com/in/janiejlee

Chaitanya “Chai” Asawa:
* LinkedIn: https://www.linkedin.com/in/casawa

Timestamps
00:00:00 Introduction and what Abridge does
00:02:05 From ambient documentation to clinical intelligence
00:04:04 Clinical decision support and context as king
00:06:57 Alert fatigue, proactive intelligence, and prior authorization
00:12:36 Ambient AI form factors and healthcare customers
00:16:59 The hardest AI problems in healthcare
00:18:26 Frontier models, proprietary data, and model strategy
00:21:07 The EHR as a filesystem for agents
00:24:03 Personalization, memory, and clinician preferences
00:30:40 Evals, LLM judges, and progressive rollout
00:36:47 HIPAA, de-identification, and privacy
00:39:21 100M conversations and operating at scale
00:44:10 EHR integration and the clinical intelligence layer
00:46:39 Healthcare regulation, latency, and high-stakes AI
00:50:11 Clinician scientists and long-tail quality
00:53:04 Lessons from Glean and durable AI infrastructure
00:57:03 The future of agentic healthcare workflows
00:57:34 PRDs, product clarity, and building serious AI products
01:03:11 AI coding tools at Abridge
01:04:06 Outro

Transcript

Introduction: Abridge, Clinical Intelligence, and the Latent Space x Unsupervised Learning Crossover

Swyx [00:00:00]: Okay. This is a special crossover Latent Space Unsupervised Learning pod.

Jacob [00:00:07]: Very excited to do this.

Jacob [00:00:08]: At this point, we get together once a year.

Swyx [00:00:10]: Once a year.

Jacob [00:00:11]: And this is a fun occasion to get to do it on.

Swyx [00:00:13]: I really wanted to talk to Abridge, but I felt very underqualified because healthcare is not something we cover very intensely. It just so happens that Redpoint are big investors and supporters of Abridge.

Jacob [00:00:27]: Anytime you want to have a portfolio company on your podcast

Jacob [00:00:29]: Please, by all means.

Swyx [00:00:31]: So we’ll introduce our guests. Chai and Janie, welcome to the pod.

Janie [00:00:34]: Thanks for having us.

Chai [00:00:35]: Thank you.

Janie [00:00:35]: We’re excited to be here.

Chai [00:00:36]: Thank you.

Swyx [00:00:36]: So for listeners, what do you guys do, just to situate you guys in the company?

Janie [00:00:42]: Abridge is a clinical intelligence layer for health systems. We really started with documentation and building for clinicians, and as we think about reducing the burden that clinicians have, they’re spending 10 to 20 hours a week on documentation. There’s a massive doctor shortage in the country.
We also think that conversations between patients and clinicians are probably the most important workflow in healthcare. It’s where care is given and received, but if you think about the 20% of our GDP that goes towards healthcare, almost everything is a derivative of that conversation, whether it’s the claim, the payment, the actual diagnosis given, the treatment. And we’ve started with a conversation to reduce the burden for doctors on documentation, but we’re really excited about the path ahead as we become this broader clinical intelligence layer.

Chai [00:01:34]: I’m Chai. I work on clinical decision support at Abridge.

Swyx [00:01:37]: Yes.

Chai [00:01:37]: And so as Janie said, we’re uniquely situated where we started off with the clinical note. What I’m really excited about, and where we’re expanding towards, is: what are all the things you can do before the conversation, during the conversation, and after the conversation if you did have access to all the context about patients, payer guidelines, and medical literature, and put that together to serve it? Healthcare could look fundamentally different.

Swyx [00:02:01]: And that’s the context engine that you guys have?

Chai [00:02:04]: Yes.

Swyx [00:02:04]: Is that what it’s called? Okay.

Swyx [00:02:05]: So historically, as I understand it, the company started in 2018. A lot of people would be familiar with the AI voice notes form factor, where doctors would be like, “Well, do you consent to being recorded?” It replaces handwriting and what have you. But it sounds like more recently there’s been a big transition in the company. Tell me about the broader transition.

From Documentation to Clinical Intelligence: Save Time, Save Money, Save Lives

Janie [00:02:26]: So from a transition perspective, we really think about our journey as a series of acts. The first act was: how do we help save time? And that’s where a lot of that original product was.
Swyx [00:02:37]: By the way, one of those interesting stats Swyx [00:02:39]: On your landing page was, doctors spend time after hours. Janie [00:02:43]: They c

    1h 5m
