Latent Space: The AI Engineer Podcast

Latent.Space

The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We bring you breaking news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

  1. 5D AGO

    🔬Why There Is No "AlphaFold for Materials" — AI for Materials Discovery with Heather Kulik

    Materials science is the unsung hero of the science world. Behind every physical product you interact with are decades of research into getting the properties of materials just right. Your gym clothes contain synthetic fibers developed over decades. The glass screen, diodes, and chip substrate technology needed to read this blog post were only viable due to many teams of materials scientists. Our guest Prof. Heather Kulik was one of the first materials scientists to realize that there was alpha in combining computational tools with data-driven modeling; she did AI for science before it was cool. She has a hard-won perspective on how to succeed in this field. Yes, she believes the wins are real. To get there you must work hard to deeply integrate domain expertise with AI techniques, and also maintain a discriminating mind. Ultimately, what matters is that you succeed in the lab, and nature doesn’t care about how hyped a model is. These lessons personally resonated with the Latent.Space Science team and our own experience. This episode is a must-watch for all aspiring AI for science practitioners. A few highlights: Designing new polymers with AI: Heather’s group recently used AI to design new polymers that are significantly stronger. These materials were created and tested in the lab, and the scientists who built them were surprised by the designs. The AI had figured out that certain building blocks could break in a novel way. The AI discovered a purely quantum mechanical effect, and after convincing their lab collaborators to actually synthesize it, the material turned out to be four times tougher! The twenty-two-atom ligand challenge: When asked about the role and need of human scientists, Heather points out that AI has a strong understanding of academic chemistry, but is still lacking intuition. Every time an LLM is updated, Heather asks it to design a ligand that contains exactly twenty-two heavy atoms.
She has yet to find one that can succeed at this seemingly simple task that any expert could do in a second! Is this the chemistry counterpart to counting ‘r’s in strawberry? Side note: Heather joked that this comment would date itself immediately, so we decided to see if this was still true three months after recording. We found some interesting results! We asked both Claude and ChatGPT to design a 22-atom ligand for both a metal-organic framework (MOF) and a kinase protein.
* For the kinase, both models got it right: Claude pulled out RDKit in a Python script and iterated on several designs, whereas ChatGPT just one-shotted it.
* For MOFs, both models got it wrong, generating ligands with 21, 23, or 24 atoms, yet stubbornly not getting 22 atoms.
Is there something different about how LLMs reason in the materials and bio domains? Materials vs biology: The two biggest domains of AI in science have been biology and materials. We asked Heather if there could be an AlphaFold moment for materials. Her answer reframes how we should think about the field:
* First, the datasets in materials science are woefully lacking in comparison to the bio world. The closest to ground truth in most cases are noisy DFT datasets. These are just approximations to the real world! The datasets that are accurate are all boring; as Heather quipped, “We have really good datasets for really boring chemistry.” Furthermore, good experimental structures are hard to come by and require interpretation. So generating high-quality, novel datasets at scale would really drive the field forward.
* More philosophically, AlphaFold is making predictions in a fairly limited space: there are just twenty amino acids. Sure, even here AlphaFold doesn’t get everything right, but it seems plausible that one could learn the entire design space. For materials, each element is a new set of interactions and chemistry, with little to no transferability.
This is a massive open problem in materials science that we hope some of the smartest AI scientists will want to work on! The difficulties of trusting the literature: Heather’s team has spent the last few years using NLP and later LLMs to extract data from literature. Even a few thousand data points from these papers can be valuable for guiding her group’s work. One surprising result: sometimes the reported values for a property (say temperature) do not match up with the graphs in the papers! So there’s lots of potential in using LLMs to mine data from the literature; just do it with care. The role of academia in an ever-changing world: One theme that has been running through many of our conversations has been the changing role of the academic, and the scientist, in science. When startups are raising hundreds of millions of dollars and hyperscalers and Big Pharma are all ramping up AI-for-science efforts, the academic researcher needs both resources and judgement about which problems to chase more than ever. Resources include data that is organized for machine learning, access to high-throughput experimentation labs, and compute resources. These are all things that academics can build together. More importantly, Heather emphasizes curiosity about problems that haven’t hit the radar of the heavily capitalized AI companies. After so many years on the forefront of AI for Science, Heather’s judgement that Chemical Engineering and Materials Science still need curious people asking questions with no clear path to money is a welcome beacon in the AI fog. Full video podcast is on YouTube! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
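
The 22-heavy-atom check from the ligand challenge in these notes is easy to automate. The notes mention Claude reaching for RDKit; the sketch below is a dependency-free stand-in that counts non-hydrogen atoms in a SMILES string with regular expressions. The function names (`count_heavy_atoms`, `meets_target`) are hypothetical, and this is a simplified tokenizer rather than a full SMILES parser, so a real cheminformatics toolkit is the safer choice for actual work.

```python
import re

# One SMILES atom token: a bracket atom, a two-letter halogen, or a
# single-letter organic-subset atom (aliphatic or aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Inside a bracket atom: optional isotope digits, then the element symbol.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def count_heavy_atoms(smiles: str) -> int:
    """Count non-hydrogen atoms in a SMILES string (simplified sketch)."""
    count = 0
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            # Skip explicit hydrogens such as [H] or [2H], and anything
            # this simplified pattern cannot interpret.
            if match is None or match.group(1) == "H":
                continue
        count += 1
    return count


def meets_target(smiles: str, target: int = 22) -> bool:
    """Return True when the candidate has exactly `target` heavy atoms."""
    return count_heavy_atoms(smiles) == target


if __name__ == "__main__":
    # Terephthalic acid, a common MOF linker (C8H6O4).
    linker = "OC(=O)c1ccc(cc1)C(=O)O"
    print(count_heavy_atoms(linker), meets_target(linker))
```

A check like this gives a model (or a human) an unambiguous pass/fail signal to iterate against, which is the kind of loop the notes describe Claude running for the kinase case.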

    35 min
  2. MAR 20

    Dreamer: the Personal Agent OS — David Singleton

    Mar 23 update for Latent Spacenauts: this episode was recorded before the Dreamer team announced they were joining Meta Superintelligence Labs, and it turned out to be the last interview they did before the news became public. Consider this a snapshot from just before the transition! In 2024, David Singleton left Stripe and joined forces with Hugo Barra for a buzzy stealth startup named /dev/agents. This month they emerged as Dreamer, a consumer-first platform to discover, build, and use AI agents and agentic apps, centered on a personal “Sidekick” that helps users customize experiences via natural language. Sidekick is nothing less than an “agent that builds agents”, with all the complexity that that entails: You’ve seen many, many website builder, app builder, and even agent builder startups by now, but our favorite detail is the sheer amount of work that has gone into the “full stack” nature of the platform, including shipping their own SDK, logging, database, prompt management, serverless functions, and so on. Most platforms restrict the tech stack you can use just to get off the ground. Dreamer does it “right” by letting you push whatever arbitrary code you want to their VMs. Paying the Builders: Of course former leaders of Stripe and Android would not stop at just building the tools; they are also building the ecosystem. Dreamer is deeply aware of the four-sided network effect it has going on and is ready to fund all of it. It’s time to Dream! Full Video Episode on YouTube. Transcript [00:00:00] Meet Dreamer Purple [00:00:00] swyx: Okay, we’re here in the studio with David Singleton. Welcome. [00:00:08] David Singleton: Hey, swyx. It’s great to be here. [00:00:09] swyx: It’s great to have you. Uh, we’re very simpatico, in that your company color is the same as Latent Space’s color. [00:00:15] David Singleton: That’s right. Dreamer Purple. [00:00:17] swyx: It used to be /dev/agents, which I thought was very cool. It’s like a callback to /dev/payments.
[00:00:22] David Singleton: Yeah. [00:00:22] swyx: And you were obviously CTO of Stripe. And talk to me about just the origin or thinking process behind Dreamer. Yeah. And maybe, maybe start with like, what, what is Dreamer? [00:00:31] David Singleton: Yeah. [00:00:31] What Is Dreamer [00:00:31] David Singleton: So Dreamer is a new product, uh, which everyone can come and play with today. Um, it’s a place where everyone, literally, everyone can discover, build, and enjoy and use AI agents and agentic apps. [00:00:45] And we really did design it for consumers, for folks who don’t necessarily, uh, have any kind of technical background. It’s really aimed at everyone. I think often of my sister, she’s very smart. She’s not in the slightest bit technical. She has lots of problems in her life that [00:01:00] she would like to be able to have great software and intelligent software to solve. [00:01:04] But you know, even with the rise of tools like Claude Code and so forth, she’s got no way to get started. And Dreamer is a place where she can come in, grab some intelligent apps that other people in the community have built, start using them right away, and solve real problems in her life. [00:01:19] Sidekick And Waitlist [00:01:19] David Singleton: And at the core, we have a personal agent called the Sidekick. [00:01:24] Um, you can give your sidekick a name, you can give it its own personality, and it really helps you across your entire day, your life. It helps you use all of the agents on the platform, and it also helps you build anything you want. And we’ve been working on this for a little while. We recently launched in beta. [00:01:41] So anyone can go to dreamer.com, join the wait list. Um, and we have many, many, many people in the community now who are building really fun, really powerful, really useful agents and agentic apps for themselves. [00:01:54] swyx: I think we’re gonna go right into a demo. Yeah.
I just wanna make an observation that, uh, you, you, [00:02:00] you put discover first before build. [00:02:02] Mm-hmm. But actually, at least for the engineers in the audience. ‘cause we are primarily engineers and you’re primarily targeting consumers, right? [00:02:08] David Singleton: Yeah. [00:02:08] swyx: For engineers. Like, there’s a huge full stack of stuff, which we’re gonna dive into. Let’s write. It’s so impressive. I’m like, holy s**t, this, this is what I’ve always wanted. [00:02:16] Cool. Uh, so, so I think that’s really good and I’ve, in some ways, I think given your background given, uh, Hugo’s, is it Hugo? Hugo. [00:02:24] David Singleton: Hugo. Hugo Barra. Yeah. [00:02:25] swyx: Hugo, it’s not surprising that you can basically kind of build an app store Yeah. For agents. [00:02:30] David Singleton: Yeah. So Hugo was my co-founder. Yeah. Um, Hugo and I met with our other co-founder Nicholas Jitkoff in the very early days of Android at Google, where we were building Google’s first mobile apps. [00:02:41] Uh, we then contributed to very core pieces of Android itself. And you’re right, we were really excited about building two things. One, solving a bunch of problems that this breakthrough technology (here I’m talking about mobile) needed to have solved in order to make it work for real people at scale. And then secondly, building this ecosystem, um, [00:03:00] of third party developers using the Play Store, um, and able to deliver way more value on the platform than we could have delivered on our own. [00:03:08] And we think about Dreamer in exactly the same way. So I was working at Stripe, as you mentioned, and we had the opportunity to put some of the very first AI agent systems in the world into production.
And from the moment we did the first of those, I was just struck with a strong sense of conviction that this is breakthrough technology that’s gonna change how all of us work with computers and phones and so forth, all of the, the technology in our lives, but. [00:03:34] There’s a lot of problems to be solved, for real people to be able to make this approachable. Um, and it really is kind of a direct analog for what we were solving back in the early days of mobile apps at Google and, and Android. So it’s, it’s been fun to bring that to life. [00:03:47] swyx: Yeah. Uh, let’s look at it. [00:03:48] David Singleton: Yeah, let’s take a look. [00:03:49] Dashboard And Daily Briefing [00:03:49] David Singleton: So, uh, dreamer.com, this is our homepage. This is where you can come and, uh, watch some videos about what is here and sign up for the wait list. Once [00:03:57] swyx: you, I, I just wanna say for those listening, ‘cause we have a lot, you [00:04:00] know, switch to YouTube, look at the animations. So much care. [00:04:03] David Singleton: We, we really care about, uh, this product being fun. [00:04:07] Uh, and, and interesting to use. Obviously a lot of people are using it to do real important stuff. You can do real work, uh, here, uh, but also you can build fun things too. Once you get off of our wait list, you’ll come into the product. The first thing that happens is you’ll have a conversation with your Sidekick, which is this little friendly, uh, character here. [00:04:27] And Sidekick will seek to get to know you and understand you. What do you care about? And will help you discover and build your first AI agents or agentic apps. After that, you’re, you’re gonna have a dashboard. This is my dashboard. Everyone’s is different. Um, you can see I have a few things here. I have a feed. [00:04:42] So a lot of our agents do things in the background when you’re not looking and the feed is how they let you know what they’ve been up to.
I have, uh, some widgets, uh, from apps that I have built. Uh, this one is called Calendar Hero. Uh, this is something that I installed from the gallery. Uh, so built by someone in our community. [00:04:59] It’s a [00:05:00] really powerful calendar app because for each of my meetings, if it’s with someone I don’t already know well, it’ll actually go off and research them, um, and give me both a history of my interactions with those people and also a bunch of, you know, public useful information to, to get started. One of the things I love about this particular app is that every day it generates a podcast, um, a daily briefing. [00:05:24] And one of the things that we’ve done with the platform is we’ve made it possible for all the things that agents do to show up in places that you care about. So if you look over here, this is the screen in my phone, and if I go ahead and open my Apple Podcasts, you can see right here. Your Daily briefing podcast is ready. [00:05:39] This was produced by an agent running in my Dreamer account, and it was very easy by scanning a QR code to connect it to my Apple Podcasts. That’s what I listen to in the car now every morning. Yeah. On my way to work. [00:05:50] swyx: It, it [00:05:50] David Singleton: preps me for, for my day. [00:05:52] swyx: So one additional bit of context. I asked you immediately after seeing this was like, what, what about, I wanna talk back to my agent and you said you actually started with voice and then you went to [00:06:00] podcasts. [00:06:00] ‘cause it’s nice to have it pre-downloaded [00:06:02] David Singleton: that, right? That’s right. Um, yeah, we, you, you can talk to your sidekick. So, you know, on mobile we have, uh, a Dreamer app and you can talk to the sidekick right here. Um, but we’ve actually found that making things, uh, show up in the other apps that you already use in your life is incredibly powerful. [00:06:19] So let’s take a look at what’s kind of under the hood here.
[00:06:21] Gallery Tools And Payouts [00:06:21] David Singleton: So I already mentioned that we have a gallery, so this is where you’ll find a lot of agents from our community. Uh, there’s. Many at this point, hundreds. And they ar

    1h 4m
  3. MAR 17

    Why Anthropic Thinks AI Should Have Its Own Computer — Felix Rieseberg of Claude Cowork & Claude Code Desktop

    Claude Cowork came out of an accident. Felix and the Anthropic team noticed something interesting with Claude Code: many users were using it primarily for all kinds of messy knowledge work instead of coding. Even technical builders would use it for lots of non-technical work. Even more shocking, Claude Cowork wrote itself. With a team of humans simply orchestrating multiple Claude Code instances, the tool was ready after a brief week and a half. This isn’t Felix’s first rodeo with impactful and playful desktop apps. He’s helped ship the Slack desktop app and is a core maintainer of Electron, the open-source software framework used for building cross-platform desktop applications, even putting Windows 95 into an Electron app that runs on macOS, Windows, and Linux. In this episode, Felix joins us to unpack why execution has suddenly become cheap enough that teams can “just build all the candidates” and why the real frontier in AI products is no longer better chat, but trusted task execution. He also shares why Anthropic is betting on local-first agent workflows, why skills may matter more than most people realize, and how the hardest questions ahead are about autonomy, safety, portability, and the changing shape of knowledge work itself.
We discuss * Felix’s path: Slack desktop app, Electron, Windows 95 in JavaScript, and now building Claude Cowork at Anthropic * What Claude Cowork actually is: a more user-friendly, VM-based version of Claude Code designed to bring agentic workflows to non-terminal-native users * Why “user-friendly” does not mean “less powerful”: Cowork as a superset product, much like how VS Code initially looked simpler than Visual Studio but became more hackable and extensible * Anthropic’s prototype-first culture: why Cowork was built in 10 days using many pre-existing internal pieces, and how internal prototypes shaped the final product * Why execution is getting cheap: the shift from long memos, specs, and debate toward rapidly building multiple candidates and choosing based on reality instead of theory * The local debate: why Felix thinks Silicon Valley is undervaluing the local computer, and why putting Claude “where you work” is often more powerful * Why Claude gets its own computer: the VM as both a safety boundary and a capability unlock, letting Claude install tools, run scripts, and work more independently without constant approval * Safety through sandboxing: why “approve every command” is not a real long-term UX, and how virtual machines create a middle ground between uselessly safe and dangerously autonomous * How Cowork differs from Claude Code: coding evals vs. knowledge-work evals, different system-prompt tradeoffs, longer planning horizons, and heavier use of planning and clarification tools * Why skills matter: simple markdown-based instructions as a lightweight abstraction layer for reusable workflows, personalized automation, and portable agent behavior * Skills vs. 
MCPs: why Felix is increasingly interested in file-based, text-native interfaces that tell the model what to do, rather than forcing everything through rigid tool schemas * The portability problem: why personal skills should move across agent products, and the unresolved tension between public reusable workflows and private user-specific context * Real use cases already happening today: uploading videos, organizing files, handling taxes, managing calendars, debugging internal crashes, analyzing finances, and automating repetitive browser workflows * Why AI products should work with your existing stack: Anthropic’s bias toward integrating with Chrome, Office, and existing workflows instead of rebuilding every app from scratch * Computer use one year later: how much better it has gotten, why vision plus browser context is such a superpower, and why letting Claude see the thing it is working on changes everything * Why many “AI verticals” may get compressed: specialized wrappers may matter in the short term, but better general models and stronger primitives could absorb a lot of narrow use cases * The future of junior work: Felix’s concerns about entry-level roles, labor-market disruption, and whether AI can compress early-career learning into denser simulated experience * Why Waterloo grads stand out: internships, shipping experience, and learning how real teams build products versus purely theoretical academic preparation * The agentic future of the desktop: what it means for Claude to have its own computer, whether AI should act on your machine or a remote one, and how intimacy with personal data changes the product design space * Why Electron still mattered: shipping Chromium as a controlled rendering stack, the limits of OS-native webviews, and why browser engines remain one of the great software abstractions * Anthropic’s Labs mentality: wild internal experiments, half-broken future-looking prototypes, and the broader effort to move users from asking questions 
to delegating increasingly long and valuable tasks
* Why the endgame is not just more capability, but more independence: teaching users to trust AI with bigger scopes of work, for longer durations, with fewer interventions

Felix Rieseberg
* X: https://x.com/felixrieseberg
* LinkedIn: https://www.linkedin.com/in/felixrieseberg
* Website: https://felixrieseberg.com/

Anthropic
* Website: http://anthropic.com

Full Video Pod

Timestamps
00:00 Cheap execution and building all the candidates
00:44 Intro in the new Kernel studio
02:47 What Claude Cowork is
04:18 Why user-friendly can be more powerful
05:33 How Anthropic built Cowork
07:09 Prototype-first product development
08:00 Why local computers still matter
09:20 Skills, primitives, and platform leverage
12:13 Cowork’s architecture: VM + Chrome + system prompt
15:38 Felix’s own bug-fixing Cowork workflows
17:38 Local-first agents
20:16 Evals, planning, and knowledge-work optimization
23:14 What Anthropic means by evals
24:21 Scaffolding, tools, and why skills matter
27:44 Demo: YouTube uploads and self-generated skills
31:03 Calendar automation and cleaning your desktop
34:47 Browser context and why DOM access matters
37:47 Skills portability and plugins
44:36 Which AI categories survive?
46:19 Junior jobs, simulated work, and labor disruption
52:00 Gradual takeoff vs big-bang takeoff
53:42 Finance, taxes, and enterprise verticals
56:24 Vision and the improvement in computer use
57:31 Why Claude writes its own scripts
58:06 Should Claude have its own computer?
1:01:26 Windows 95 in JavaScript
1:03:19 VM tradeoffs and sandbox design
1:07:23 Approval fatigue and safe delegation
1:11:18 The future of Cowork
1:12:27 What comes next for agentic knowledge work
1:15:13 Electron, Chromium, and desktop software lessons
1:22:16 Multiplayer agents and coworker-to-coworker workflows
1:26:05 Anthropic Labs and closing thoughts

Transcript
Alessio: Hey everyone.
Welcome to the Latent Space Podcast, our first one in the new studio. This is Alessio, founder of Kernel Labs, and I’m joined by swyx, editor of Latent Space. swyx: Yeah, so nice to be here. Thanks to, uh, TJ, Alessio, Allen helping to set everything up. It looks beautiful. We even have the logo outside. Yeah, kind. Felix: It’s like really nice, right? When you walk in here as a guest, you’re like, ah, this is a serious production. You’re like, feel it immediately. swyx: Yeah. Felix, you’ve been, you’re, you’re currently a product manager of Cowork or, Felix: uh, really Technic swyx: Eng. Yeah. The, the identities are kind of vague, member of technical staff. Felix: I know, member of technical staff is, like, the official title we’ll carry around forever. swyx: Yeah. I basically kind of wanted, like we’ve been. Kinda obsessed. I, I’ve been using it a lot, even for managing Latent Space. Like, uh, Cowork helps me upload videos and like title things and like edit and everything. It’s, it’s like really amazing. Alessio: Cool. He said multiple times Cowork is AGI in the group chat. swyx: Yeah, yeah, yeah. So, so we have a second, uh, we have a second channel, uh, for Latent Space TV. Uh, and I, uh, and uh, we basically, this is our Discord meetup. Um, and I I, we have like Claude Cowork, it might be AGI, I don’t know if we, we have, uh, uploaded it yet, but one of the sessions was like a, like a Claude Cowork thing. Felix: I, you have to see, I would love to see it. Like, I’m so curious, like one of the most fun parts of my job is like constantly seeing the weird things people use Cowork for because it’s obviously like very hard for us to actually design for specific use cases we do. But like every single person who’s like most amazed is usually amazed about a thing that I didn’t even expect Cowork would be good at. Um, we have a new designer and it’s one of the first small tasks. I was like, Hey, we need like a new emoji for Cowork for our internal Slack.
It’s like a pretty small thing. I like, can you please do it? And he drew an SVG and just gave it to coworker was like, can you animate this emoji? And now it has like this beautiful loopy animation. Um, and I mean, I think obviously this goes down to like, it turns out you can do more things with code than you expected, but it, it’s like that kind of stuff that is really fun to me. So, long story short, I would love to see like, the kind of things you’re doing. swyx: I’ll pull it up. I’ll pull it up. Felix: Yeah. Yeah. swyx: Uh, but before we get into it, I, I think always wanna start with like a top level. What is Claude Cowork for people who haven’t heard of it? Haven’t tried it out. Felix: Okay. Uh, real quick, Claude Cowork is a user friendly version of Claude Code. So the way it basically works is we have Claude Code and for us, fairly impressive agent harness that over December we noticed more and more people are using either, eve

    1h 27m
  4. MAR 12

    Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

    Turbopuffer came out of a reading app. In 2022, Simon was helping his friends at Readwise scale their infra for a highly requested feature: article recommendations and semantic search. Readwise was paying ~$5k/month for their relational database, and vector search would cost ~$20k/month, making the feature too expensive to ship. In 2023, after mulling over the problem from Readwise, Simon decided he wanted to “build a search engine”, which became Turbopuffer. We discuss:
• Simon’s path: Denmark → Shopify infra for nearly a decade → “angel engineering” across startups like Readwise, Replicate, and Causal → turbopuffer almost accidentally becoming a company
• The Readwise origin story: building an early recommendation engine right after the ChatGPT moment, seeing it work, then realizing it would cost ~$30k/month for a company spending ~$5k/month total on infra, and getting obsessed with fixing that cost structure
• Why turbopuffer is “a search engine for unstructured data”: Simon’s belief that models can learn to reason, but can’t compress the world’s knowledge into a few terabytes of weights, so they need to connect to systems that hold truth in full fidelity
• The three ingredients for building a great database company: a new workload, a new storage architecture, and the ability to eventually support every query plan customers will want on their data
• The architecture bet behind turbopuffer: going all in on object storage and NVMe, avoiding a traditional consensus layer, and building around the cloud primitives that only became possible in the last few years
• Why Simon hated operating Elasticsearch at Shopify: years of painful on-call experience shaped his obsession with simplicity, performance, and eliminating state spread across multiple systems
• The Cursor story: launching turbopuffer as a scrappy side project, getting an email from Cursor the next day, flying out after a 4am call, and helping cut Cursor’s costs by 95% while fixing their per-user economics
• The Notion story: buying dark fiber, tuning TCP windows, and eating cross-cloud costs because Simon refused to compromise on architecture just to close a deal faster
• Why AI changes the build-vs-buy equation: it’s less about whether a company can build search infra internally, and more about whether they have time, especially if an external team can feel like an extension of their own
• Why RAG isn’t dead: coding companies still rely heavily on search, and Simon sees hybrid retrieval (semantic, text, regex, SQL-style patterns) becoming more important, not less
• How agentic workloads are changing search: the old pattern was one retrieval call up front; the new pattern is one agent firing many parallel queries at once, turning search into a highly concurrent tool call
• Why turbopuffer is reducing query pricing: agentic systems are dramatically increasing query volume, and Simon expects retrieval infra to adapt to huge bursts of concurrent search rather than a small number of carefully chosen calls
• The philosophy of “playing with open cards”: Simon’s habit of being radically honest with investors, including telling Lachy Groom he’d return the money if turbopuffer didn’t hit PMF by year-end
• The “P99 engineer”: Simon’s framework for building a talent-dense company, rejecting by default unless someone on the team feels strongly enough to fight for the candidate

Simon Hørup Eskildsen
• LinkedIn: https://www.linkedin.com/in/sirupsen
• X: https://x.com/Sirupsen
• https://sirupsen.com/about

turbopuffer
• https://turbopuffer.com/

Full Video Pod

Timestamps
00:00:00 The PMF promise to Lachy Groom
00:00:25 Intro and Simon's background
00:02:19 What turbopuffer actually is
00:06:26 Shopify, Elasticsearch, and the pain behind the company
00:10:07 The Readwise experiment that sparked turbopuffer
00:12:00 The insight Simon couldn’t stop thinking about
00:17:00 S3 consistency, NVMe, and the architecture bet
00:20:12 The Notion story: latency, dark fiber, and conviction
00:25:03 Build vs. buy in the age of AI
00:26:00 The Cursor story: early launch to breakout customer
00:29:00 Why code search still matters
00:32:00 Search in the age of agents
00:34:22 Pricing turbopuffer in the AI era
00:38:17 Why Simon chose Lachy Groom
00:41:28 Becoming a founder on purpose
00:44:00 The “P99 engineer” philosophy
00:49:30 Bending software to your will
00:51:13 The future of turbopuffer
00:57:05 Simon’s tea obsession
00:59:03 Tea kits, X Live, and P99 Live

Transcript
Simon Hørup Eskildsen: I don’t think I’ve said this publicly before, but I just called Lachy and was like, look, Lachy, like if this doesn’t have PMF by the end of the year, like we’ll just like return all the money to you. But it’s just like, I don’t really, we, Justine and I don’t wanna work on this unless it’s really working. So we want to give it the best shot this year and like we’re really gonna go for it. We’re gonna hire a bunch of people. We’re just gonna be honest with everyone. Like when I don’t know how to play a game, I just play with open cards. Lachy was the only person that didn’t, that didn’t freak out. He was like, I’ve never heard anyone say that before. Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I’m joined by swyx, editor of Latent Space. swyx: Hello. Hello, uh, we’re still, uh, recording in the Kernel studio for the first time. Very excited. And today we are joined by Simon Eskildsen of turbopuffer. Welcome. Simon Hørup Eskildsen: Thank you so much for having me. swyx: Turbopuffer has like really gone on a huge tear, and I, I do have to mention that like you’re one of, you’re now my newest member of the Danish Aarhus mafia, where like there’s a lot of legendary programmers that have come out of it, like, uh, Bjarne Stroustrup, Rasmus, lado Berg and the V8 team and, and Google Maps team. Uh, you’re mostly a Canadian now, but isn’t that interesting? There’s so many, so much like strong Danish presence.
Simon Hørup Eskildsen: Yeah, I was writing a post not that long ago about sort of the influences. So I grew up in Denmark, right? I left when I was 18 to go to Canada to work at Shopify. And so I would still say that I feel more Danish than Canadian. This is also the weird accent. I can’t say “th” because, I don’t, you know. My wife is also Canadian. And I think one of the things in Denmark is that there’s just such a ruthless pragmatism, and there’s also a big focus on just aesthetics. People really care about what things look like. Canada has a lot of attributes, the US has a lot of attributes, but I think there’s been lots of great things to carry. I don’t know what’s in the water in Aarhus though. And I don’t know that I could be considered part of the mafia quite yet, compared to the phenomenal individuals we just mentioned. Rasmus Lerdorf is also Danish Canadian. Okay. Yeah. I don’t know where he lives now, but he’s the PHP creator.
swyx: Yeah. And obviously Tobi is German, but moved to Canada as well. Yes. Like, this import, that is an interesting talent move.
Alessio: I would love to get from you a definition of turbopuffer, because I think you could be a vector DB, which is maybe a bad word now in some circles; you could be a search engine. Let’s just start there and then we’ll maybe run through the history of how you got to this point.
Simon Hørup Eskildsen: For sure. Yeah. So turbopuffer is, at this point in time, a search engine, right? We do full-text search and we do vector search, and that’s really what we’re specialized in. If you’re trying to do much more than that, then this might not be the right place yet, but turbopuffer is all about search.
The other way that I think about it is that we can take all of the world’s knowledge, all of the exabytes and exabytes of data that there is, and we can use those tokens to train a model, but we can’t compress all of that into a few terabytes of weights, right? We can compress into a few terabytes of weights how to reason with the world, how to make sense of the knowledge. But we have to somehow connect it to something external that actually holds that in full fidelity and truth. And that’s the thing that we intend to become. Right? That’s a very holier-than-thou kind of phrasing, right? But being the search engine for unstructured data is the focus of turbopuffer at this point in time.
Alessio: And let’s break down. So people might say, well, didn’t Elasticsearch already do this? And then some other people might say, is this search on my data, is this closer to RAG than to an Exa, like a public search thing? How do you segment the different types of search?
Simon Hørup Eskildsen: The way that I generally think about this is that there’s a lot of database companies, and I think if you wanna build a really big database company, you need a couple of ingredients to be in the air, which only happens roughly every 15 years. You need a new workload. You basically need the ambition that every single company on earth is gonna have data in your database, multiple times. You look at a company like Oracle, right? I don’t think you can find a company on earth with a digital presence that doesn’t somehow have some data in an Oracle database. Right? And I think at this point, that’s also true for Snowflake and Databricks, right? 15 years later, or even more than that, there’s not a company on earth that doesn’t, indirectly or directly, consume Snowflake or Databricks or any of the big analytics databases. Um, and I think we’

    1h 1m
  5. MAR 10

    NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

    Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week! Now that AIE Europe tix are ~sold out, our attention turns to Miami and World’s Fair! The definitive AI Accelerator chip company has more than 10xed this AI Summer: And is now a $4.4 trillion megacorp… that is somehow still moving like a startup. We are blessed to have a unique relationship with our first ever NVIDIA guests: Kyle Kranen who gave a great inference keynote at the first World’s Fair and is one of the leading architects of NVIDIA Dynamo (a Datacenter scale inference framework supporting SGLang, TRT-LLM, vLLM), and Nader Khalil, a friend of swyx from our days in Celo in The Arena, who has been drawing developers at GTC since before they were even a glimmer in the eye of NVIDIA: Nader discusses how NVIDIA Brev has drastically reduced the barriers to entry for developers to get a top of the line GPU up and running, and Kyle explains NVIDIA Dynamo as a data center scale inference engine that optimizes serving by scaling out, leveraging techniques like prefill/decode disaggregation, scheduling, and Kubernetes-based orchestration, framed around cost, latency, and quality tradeoffs. We also dive into Jensen’s “SOL” (Speed of Light) first-principles urgency concept, long-context limits and model/hardware co-design, internal model APIs (https://build.nvidia.com), and upcoming Dynamo and agent sessions at GTC. 
Full Video pod on YouTube
Timestamps
00:00 Agent Security Basics
00:39 Podcast Welcome and Guests
07:19 Acquisition and DevEx Shift
13:48 SOL Culture and Dynamo Setup
27:38 Why Scale Out Wins
29:02 Scale Up Limits Explained
30:24 From Laptop to Multi Node
33:07 Cost Quality Latency Tradeoffs
38:42 Disaggregation Prefill vs Decode
41:05 Kubernetes Scaling with Grove
43:20 Context Length and Co Design
57:34 Security Meets Agents
58:01 Agent Permissions Model
59:10 Build Nvidia Inference Gateway
01:01:52 Hackathons And Autonomy Dreams
01:10:26 Local GPUs And Scaling Inference
01:15:31 Long Running Agents And SF Reflections
Transcript
Agent Security Basics
Nader: Agents can do three things. They can access your files, they can access the internet, and now they can write custom code and execute it. You literally only let an agent do two of those three things. If it can access your files and it can write custom code, you don’t want internet access, because that’s a full vulnerability, right? If it has access to the internet and your file system, you should know the full scope of what that agent’s capable of doing. Otherwise, now it can get injected or something like that can happen. And so that’s a lot of what we’ve been thinking about: how do we enable this, because it’s clearly the future, but then also, what are these enforcement points that we can start to protect?
swyx: All right.
Podcast Welcome and Guests
swyx: Welcome to the Latent Space podcast in the Chroma studio. Welcome to all the guests here. We are back with our guest host Vibhu. Welcome. Good to have you back. And our friends Nader and Kyle from NVIDIA. Welcome.
Kyle: Yeah, thanks for having us.
swyx: Yeah, thank you. Actually, I don’t even know your titles. I know you’re like architect something of Dynamo.
Kyle: Yeah. I’m one of the engineering leaders [00:01:00] and architects of Dynamo.
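The two-out-of-three rule Nader describes under "Agent Security Basics" can be written down as a small policy check. This is only a sketch under assumptions: the names `Capability`, `ALL_THREE`, and `is_allowed` are hypothetical and do not come from any NVIDIA or Brev API; the only thing taken from the episode is the rule that an agent should never hold file access, internet access, and code execution all at once.

```python
from enum import Flag, auto


class Capability(Flag):
    """The three agent capabilities named in the episode (names are illustrative)."""
    NONE = 0
    FILES = auto()      # access to the user's files
    INTERNET = auto()   # access to the internet
    CODE_EXEC = auto()  # writing and executing custom code


# Holding all three at once is the combination the rule forbids.
ALL_THREE = Capability.FILES | Capability.INTERNET | Capability.CODE_EXEC


def is_allowed(requested: Capability) -> bool:
    """Return True unless the request combines all three capabilities."""
    return (requested & ALL_THREE) != ALL_THREE


if __name__ == "__main__":
    print(is_allowed(Capability.FILES | Capability.CODE_EXEC))  # two of three
    print(is_allowed(ALL_THREE))                                # all three
```

A check like this would sit at one of the "enforcement points" mentioned above, for example wherever an agent's tool permissions are granted; where exactly that is depends on the agent runtime and is not specified in the episode.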
swyx: And you’re director of something and developers, developer tech.
Nader: Yeah.
swyx: You’re the developers, developers, developers guy at NVIDIA.
Nader: Open source, agent marketing, Brev,
swyx: and like
Nader: DevRel tools and stuff.
swyx: Yeah.
Nader: Been the focus.
swyx: And we’re kind of recording this ahead of NVIDIA GTC, which is coming to town again, or taking over town, which we’ll all be at. And we’ll talk a little bit about your sessions and stuff. Yeah.
Nader: We’re super excited for it.
GTC Booth Stunt Stories
swyx: One of my favorite memories of Nader: you always do marketing stunts, and while you were at Brev, you had this surfboard that you went down to GTC with, and NVIDIA apparently liked it so much that they bought you. What was that like?
Nader: Yeah. Yeah, our logo was a shaka. We were always just kind of trying to keep true to who we were. I think, you know, at some startups you’re trying to pretend that you’re a bigger, more mature company than you are. And it was actually Evan Conrad from SF Compute who was just like, you guys are like
swyx: Previous guest. Yeah.
Nader: Amazing. Oh, really? Amazing. Yeah. He was just like, guys, you’re two dudes in the room. Why are you [00:02:00] pretending that you’re not? And so then we were like, okay, let’s make the logo a shaka. We brought surfboards to our booth at GTC and the energy was great. Yeah. Some palm trees too.
Kyle: They actually poked out over the walls so you could see the Brev booth. Oh, that’s so funny.
Nader: And no one else,
Kyle: just from very far away.
Nader: Oh, so you remember it back then?
Kyle: Yeah, I remember it pre-acquisition. I was like, oh, those guys look cool.
Nader: Dude. That makes sense. ’Cause we signed up really last minute, and so we had the last booth. It was all the way in the corner.
And so I was worried that no one was gonna come. So that’s why we had the palm trees. We really came in with the surfboards. We even had one of our investors bring her dog, and then she was just walking the dog around to try to bring energy towards our booth. Yeah.
swyx: Steph.
Kyle: Yeah. Yeah, she’s the best.
swyx: You know, as a conference organizer, I love that. Right? Everyone who sponsors a conference comes, does their booth. They’re like, “we are changing the future of AI” or something, some generic b******t, and like, no, actually try to stand out, make it fun, right? And people still remember it after three years.
Nader: Yeah. Yeah. You know what’s so funny? I’ll give you this clip if you wanna add it [00:03:00] in, but my wife, who was at the time my fiancée, was in medical school and she came to help us, ’cause it was a big moment for us. And so we bought this Cricut, it’s like a vinyl printer, ’cause how else are we gonna label the surfboard? So we got a surfboard, luckily was able to purchase that on the company card. We got a Cricut, and it was just “fine-tuning for enterprises” or something like that that we put on the surfboard, and it’s 1:00 AM the day before we go to GTC. She’s helping me put these vinyl stickers on. And she’s like, if you pull this off, you son of a b***h. And so, pretty much right after the acquisition, I stitched that with the mag music acquisition. I sent it to our family group chat.
swyx: Oh yeah. No, well, she made a good choice there. Was that like basically the origin story for Launchable? And maybe we should explain what Brev is.
Nader: Yeah. Yeah. I mean, Brev is just a developer tool that makes it really easy to get a GPU. So we connect a bunch of different GPU sources.
So the basics of it is: how quickly can we SSH you into a GPU? And whenever we would talk to users, they wanted a GPU. They wanted an A100. And if you go to any cloud [00:04:00] provisioning page, usually it’s like three pages of forms, and in the forms somewhere there’s a dropdown. And in the dropdown there’s some weird code that you have to know translates to an A100. And I remember just thinking, every time someone says they want an A100, the piece of text that they’re telling me they want is stuffed away in the corner. Yeah. And so we were like, what if the biggest piece of text was what the user’s asking for? And so when you go to Brev, it’s just big GPU chips with the type that you want, with
swyx: beautiful animations that you worked on. Like, now you can just prompt it. But back in the day. Yeah. Yeah. Those were handcrafted, artisanal code.
Nader: Yeah. I was actually really proud of that, because I made it in Figma. Yeah. And then I was really struggling to figure out how to turn it from Figma to React. So what it actually is, is just an SVG, and I have all the styles. And so when you change the chip, whether it’s active or not, it changes the SVG code, and that somehow looks like it’s animating, but we just had the transition slow. It’s just a JavaScript function to change the underlying SVG. Yeah. And that was how I ended up figuring out how to move it from Figma. But yeah, that’s artisanal. [00:05:00]
Kyle: Speaking of marketing stunts though, he actually used those SVGs, or kind of used those SVGs, to make these cards.
Nader: Oh yeah.
Kyle: Like a GPU gift card that he handed out everywhere. That was actually my first impression of that one.
Nader: Yeah.
swyx: Yeah, yeah.
Nader: Yeah.
swyx: I think I still have one of them.
Nader: They look great.
Kyle: Yeah.
Nader: I have a ton of them still, actually, in our garage; they just don’t have labels. We should honestly bring them back. But I found this old printing press here, actually just around the corner on Van Ness. And it’s a third-generation San Francisco shop. And so I come in as an excited startup founder, and they just have this crazy old machinery and I’m in awe, ’cause the whole building is so physical. You’re seeing these machines, they have pedals to move these saws and whatever. I don’t know what this machinery is, but I saw all three generations. Like there’s like the grandpa

    1h 24m
  6. MAR 6

    Cursor's Third Era: Cloud Agents

All speakers are announced at AIE EU, schedule coming soon. Join us there or in Miami with the renowned organizers of React Miami! Singapore CFP also open! We’ve called this out a few times over in AINews, but the overwhelming consensus in the Valley is that “the IDE is Dead”. In November it was just a gut feeling, but now we actually have data: even at the canonical “VSCode Fork” company, people are officially using more agents than tab autocomplete (the first wave of AI coding). Cursor has had cloud agents for a few months now, and this specific launch is around Computer Use, which has come a long way since we first talked with Anthropic about it in 2024, and which Jonas productized as Autotab. We also take the opportunity to do a live demo, talk about slash commands and subagents, and the future of continual learning and personalized coding models, something that Sam previously worked on at New Computer. (The fact that both of these folks are top tier CEOs of their own startups that have now joined the insane talent density gathering at Cursor should also not be overlooked). Full Episode on YouTube! please like and subscribe!
Timestamps
00:00 Agentic Code Experiments
00:53 Why Cloud Agents Matter
02:08 Testing First Pillar
03:36 Video Reviews Second Pillar
04:29 Remote Control Third Pillar
06:17 Meta Demos and Bug Repro
13:36 Slash Commands and MCPs
18:19 From Tab to Team Workflow
31:41 Minimal Web UI Philosophy
32:40 Why No File Editor
34:38 Full Stack Cursor Debate
36:34 Model Choice and Auto Routing
38:34 Parallel Agents and Best Of N
41:41 Subagents and Context Management
44:48 Grind Mode and Throughput Future
01:00:24 Cloud Agent Onboarding and Memory
Transcript
EP 77 - CURSOR - Audio version [00:00:00]
Agentic Code Experiments
Samantha: This is another experiment that we ran last year and didn’t decide to ship at that time, but may come back to: an LLM judge, but one that was also agentic and could write code.
So it wasn’t just picking but also taking the learnings from the two models, or N models, that it was looking at and writing a new diff. And what we found was that there were strengths to using models from different model providers as the base level of this process. Basically you could get almost a synergistic output that was better than having a very unified bottom model tier.
Jonas: We think that over the coming months, the big unlock is not going to be one person with a model getting more done, like the water flowing faster; it will be making the pipe much wider, and so parallelizing more. Whether that’s swarms of agents or parallel agents, both of those are things that contribute to getting much more done in the same amount of time.
Why Cloud Agents Matter
swyx: This week, one of the biggest launches that Cursor’s ever done is cloud agents. I think you had [00:01:00] cloud agents before, but this was like, you give Cursor a computer, right? Yeah. So it’s just basically they bought Autotab and then they repackaged it. Is that what’s going on, or,
Jonas: That’s a big part of it. Yeah. Cloud agents already ran in their own computers, but they were sort of sight-reading code. Yeah. And those computers were typically blank VMs that were not set up for the DevEx of whatever repo the agent’s working on. One of the things that we talk about is, if you put yourself in the model’s shoes and you were seeing tokens stream by, and all you could do was sight-read code and spit out tokens and hope that you had done the right thing,
swyx: no chance.
Jonas: I’d be so bad. Obviously you need to run the code. And so that, I think, is also probably not that contrarian of a take, but no one has done that yet.
And so giving the model the tools to onboard itself and then use full computer use end-to-end, pixels in, coordinates out, and have the cloud computer with different apps in it, is the big unlock that we’ve seen internally in terms of usage of this going from “oh, we use it for little copy changes” [00:02:00] to “no, we’re really driving new features with this new type of agentic workflow.” Alright, let’s see it. Cool.
Live Demo Tour
Jonas: So this is what it looks like in cursor.com/agents. This is one I kicked off a while ago. On the left-hand side is the chat. Very classic sort of agentic thing. The big new thing here is that the agent will test its changes. So you can see here it worked for half an hour. That is because it not only took time to write the tokens of code, it also took time to test them end to end. So it started dev servers and iterated when needed. And so that’s one part of it: the model works for longer and doesn’t come back with an “I tried some things” PR, but an “I tested it” PR that’s ready for your review. One of the other intuition pumps we use there is, if a human gave you a PR and asked you to review it and they hadn’t tested it, you’d also be annoyed, because you’d be like, only ask me for a review once it’s actually ready. So that’s what we’ve done with
Testing Defaults and Controls
swyx: Simple question I wanted to get out up front. Some PRs are way smaller, [00:03:00] like just a copy change. Does it always do the video, or is it sometimes?
Jonas: Sometimes.
swyx: Okay. So what’s the judgment?
Jonas: The model does it. So we do some default prompting with sort of what types of changes to test. There’s a slash command that people can do called slash no test, where if you do that, the model will not test,
swyx: but the default is test.
Jonas: The default is to be calibrated. So we tell it: don’t test very simple copy changes, but test more complex things.
And then users can also write their agents.md and specify things like, if you’re editing this subpart of my monorepo, never test it, ’cause that won’t work, or whatever.
Videos and Remote Control
Jonas: So pillar one is the model actually testing. Pillar two is the model coming back with a video of what it did. We have found that in this new world where agents can write much more code end-to-end, reviewing the code is one of the new bottlenecks that crop up. And so reviewing a video is not a substitute for reviewing code, but it is an entry point that is much, much easier to start with than glancing at [00:04:00] some giant diff. And so typically you kick one off, it’s done, you come back, and the first thing that you would do is watch this video. So this is a video of it. In this case I wanted a tooltip over this button. And so it went and showed me what that looks like in this video. I think here it actually used a gallery. So sometimes it will build storybook-type galleries where you can see that component in action. And so that’s pillar two: these demo videos of what it built. And then pillar number three is that I have full remote control access to this VM. So I can go in here. I can hover things, I can type, I have full control. And same thing for the terminal. I have full access. And so that is also really useful, because sometimes the video is all you need to see. And oftentimes, by the way, the video’s not perfect; the video will show you whether this is worth merging immediately or, oftentimes, whether this is worth iterating with to get it to that final stage where I am ready to merge it in. So I can go through some other examples where the first video [00:05:00] wasn’t perfect, but it gave me confidence that we were on the right track, and two or three follow-ups later, it was good to go. And then I also have full access here where some things you just wanna play around with.
You wanna get a feel for what this is, and there’s no substitute for a live preview. And the VNC kind of VM remote access gives you that.
swyx: Amazing. What, sorry? What is VNC?
Jonas: Just the remote desktop. Remote desktop. Yeah.
swyx: Sam, any other details that you always wanna call out?
Samantha: Yeah, for me the videos have been super helpful. I would say, especially in cases where a common problem for me with agents and cloud agents beforehand was almost like underspecification in my requests. Plan mode, and going really back and forth and getting a detailed implementation spec, is a way to reduce the risk of underspecification, but then, similar to how human communication breaks down over time, I feel like you have this risk where it’s: okay, when I go to the trouble of pulling down and running this branch locally, I’m gonna see that, like, I said this should be a toggle and you have a checkbox, and why didn’t you get that detail? And having the video up front just [00:06:00] makes that alignment, like you’re talking about a shared artifact with the agent, very clear, which has been just super helpful for me.
Jonas: I can quickly run through some other examples.
swyx: Yes.
Meta Agents and More Demos
Jonas: So this is a very front-end-heavy one.
swyx: So one question I was gonna say: is this only for front end?
Jonas: Exactly. One question you might have is: is this only for front end? So this is another example where the thing I wanted it to implement was a better error message for saving secrets. So the cloud agents support adding secrets; that’s part of what it needs to access certain systems. Part of onboarding is giving that access.
swyx: This is cloud agents working on cloud agents.
Jonas: Yes. So this is a fun thing.
Samantha: It can get super meta.
Jonas: It can get super meta. It can start its own cloud agents, it can talk to its own cloud agents. Sometimes it’s hard to wrap your mind around that.
We have disabled its cloud agents starting more cloud agents. So we currently disallow that. Someday you might. Someday we might. Someday we might. So this actually was mostly a backend change in terms of the error handling here, where if the [00:07:00] secret is far too large, it would, oh, this is actually really cool. Wow. That’s the dev tools. That

    1h 7m
  7. MAR 5

    Every Agent Needs a Box — Aaron Levie, Box

    The reception to our recent post on Code Reviews has been strong. Catch up! Amid a maelstrom of discussion on whether or not AI is killing SaaS, one of the top publicly listed SaaS companies in the world has just reported record revenues, clearing well over $1.1B in ARR for the first time with a 28% margin. As we comment on the pod, Aaron Levie is the rare public company CEO equally at home in both worlds of Silicon Valley and Wall Street/Main Street, by day helping 70% of the Fortune 500 with their Enterprise Advanced Suite, and yet by night is often found in the basements of early startups and tweeting viral insights about the future of agents. Now that both Cursor, Cloudflare, Perplexity, Anthropic and more have made Filesystems and Sandboxes and various forms of “Just Give the Agent a Box” cool (not just cool; it is now one of the single hottest areas in AI infrastructure growing 100% MoM), we find it a delightfully appropriate time to do the episode with the OG CEO who has been giving humans and computers Boxes since he was a college dropout pitching VCs at a Michael Arrington house party. Enjoy our special pod, with fan favorite returning guest/guest cohost Jeff Huber! Note: We didn’t directly discuss the AI vs SaaS debate - Aaron has done many, many, many other podcasts on that, and you should read his definitive essay on it. Most commentators do not understand SaaS businesses because they have never scaled one themselves, and deeply reflected on what the true value proposition of SaaS is. We also discuss Your Company is a Filesystem: We also shoutout CTO Ben Kus’ and the AI team, who talked about the technical architecture and will return for AIE WF 2026. 
Full Video Episode
Timestamps
* 00:00 Adapting Work for Agents
* 01:29 Why Every Agent Needs a Box
* 04:38 Agent Governance and Identity
* 11:28 Why Coding Agents Took Off First
* 21:42 Context Engineering and Search Limits
* 31:29 Inside Agent Evals
* 33:23 Industries and Datasets
* 35:22 Building the Agent Team
* 38:50 Read Write Agent Workflows
* 41:54 Docs Graphs and Founder Mode
* 55:38 Token FOMO Culture
* 56:31 Production Function Secrets
* 01:01:08 Film Roots to Box
* 01:03:38 AI Future of Movies
* 01:06:47 Media DevRel and Engineering
Transcript
Adapting Work for Agents
Aaron Levie: Like, you don’t write code; you talk to an agent and it goes and does it for you, and you maybe, at best, review it. That’s even probably largely not what you’re doing. What’s happening is we are changing our work to make the agents effective. In that model, the agent didn’t really adapt to how we work. We basically adapted to how the agent works. All of the economy has to go through that exact same evolution. Right now, it’s a huge asset and an advantage for the teams that do it early and that are kinda wired into doing this, ’cause you’ll see compounding returns. But that’s just gonna take a while for most companies to actually go and get this deployed.
swyx: Welcome to the Latent Space pod. We’re back in the Chroma studio with Chroma CEO Jeff Huber. Welcome, returning guest, now guest host.
Aaron Levie: It’s a pleasure. Wow. How’d you get upgraded to that?
swyx: Because he’s like the perfect guy to be guest host for you.
Aaron Levie: That makes sense, actually. We love context. We both really love context. We really do. We really do.
swyx: And we’re here with Aaron Levie. Welcome.
Aaron Levie: Thank you. Good to be [00:01:00] here.
swyx: Yeah. So we’ve all met offline and chatted a little bit, but it’s always nice to get these things in person and in conversation. Yeah.
You just started off with so much energy. You’re super excited about agents.
Aaron Levie: I love agents.
swyx: Yeah. OpenClaw just got bought by OpenAI. No, not bought, but you know what I mean.
Aaron Levie: Some, you know, acquihire.
swyx: Executive hire.
Aaron Levie: Executive hire. Okay. Executive hire.
swyx: Hey, that’s my term. Okay. What are you pounding the table on with agents? You have so many insightful tweets.
Why Every Agent Needs a Box
Aaron Levie: Well, the thing that we get super excited by, which I think should be relatively obvious, is that we’ve built a platform to help enterprises manage their files, their corporate files, and the permissions of who has access to those files, and the sharing and collaboration of those files. All of those files contain really, really important information for the enterprise. It might have your contracts, it might have your research materials, it might have marketing information, it might have your memos. All that data obviously has, you know, predominantly been used by humans. [00:02:00] But there’s been one really interesting problem, which is that humans only really work with their files during an active engagement with them, and then they kind of go away and you don’t really see them for a long time. And all of a sudden, with the power of AI and AI agents, all of that data becomes extremely relevant as this ongoing source of answers to new questions, of data that will transform into something else that produces value in your organization. It contains the answer for the new employee that’s onboarding, that needs to ramp up on a project. It contains the answer to the right thing to sell a customer when you’re having a conversation with them. It contains the roadmap information that’s gonna produce the next feature. So all that data.
That previously we’ve been just sort of storing and, you know, occasionally forgetting about, ’cause we’re only working on the new active stuff. All of that information becomes valuable to the enterprise, and it’s gonna become extremely valuable to end users, because now they can have agents go find what they’re looking for and produce new [00:03:00] value and new data on that information. And it’s gonna become incredibly valuable to agents, because agents can roam around and do a bunch of work and they’re gonna need access to that data as well. And, you know, sometimes that will be an agent that is sort of working on behalf of you, effectively as you, and it is kind of accessing all of the same information that you have access to and operating as you in the system. And then sometimes there’s gonna be agents that are just effectively autonomous and kind of run on their own, and you’re gonna collaborate and work with them kind of like you would another person. OpenClaw is the most recent, and maybe first real, version of what that could look like, sort of updating everybody’s views of this landscape, which is: okay, I have an agent. It’s on its own system, it’s on its own computer, it has access to its own tools. I probably don’t give it access to my entire life. I probably communicate with it like I would an assistant or a colleague, and then it sort of has this sandbox environment. So all of that has massive implications for a platform that manages that [00:04:00] enterprise data. We think it’s gonna just transform how we work with all of the enterprise content that we work with, and we just have to make sure we’re building the right platform to support that.
swyx: The sort of shorthand I put it is, as people build agents, everybody’s just realizing that every agent needs a box. Yes. And it’s nice to be called Box and just give everyone a box.
Aaron Levie: Hey, you know, if we can make that go viral, I think that terminology,
swyx: That’s the tagline. Every agent
Aaron Levie: needs a box. Every agent needs a box. If we can make that the headline of this, I’m fine with this. And that’s the billboard I want. Yeah, exactly. Every agent needs a box. I like it. Can we ship this?
swyx: Okay, let’s do it. Yeah.
Aaron Levie: My work here is done and I got the value I needed outta this podcast. Drinks.
swyx: Yeah.
Agent Governance and Identity
Aaron Levie: But, you know, the thing that we kind of think about is, whether you think the number is 10x or 100x or whatever the number is, we’re gonna have some order of magnitude more agents than people. That’s inevitable. It has to happen. So then the question is, what is the infrastructure that’s needed to make all those agents effective in the enterprise? Make sure that they are well governed. Make sure they’re only doing [00:05:00] safe things on your information. Make sure that they’re not getting exposed to data that they shouldn’t have access to. There’s gonna be just incredibly, spectacularly crazy security incidents that will happen with agents, because you’ll prompt-inject an agent and sort of find your way through the CRM system and pull out data that you shouldn’t have access to.
Jeff Huber: Oh, we have. God.
Aaron Levie: Right? I mean, that’s just gonna happen all over the place, right? So then the thing is, how do you make sure you have the right security, the permissions, the access controls, the data governance? We actually don’t yet exactly know in many cases how we’re gonna regulate some of these agents, right? If you think about an agent in financial services, does it have the exact same financial sort of requirements that a human did? Or is the risk fully on the human that was interacting with or created the agent?
All open questions, but no matter what, there’s gonna need to be a layer that manages the, the data they have access to, the workflows that they’re involved in, pulling up data from multiple systems. This is the new infrastructure opportunity in the era of agents. swyx: You have a piece on agent identities, [00:06:00] which I think was

    1h 17m
  8. FEB 27

    METR’s Joel Becker on exponential Time Horizon Evals, Threat Models, and the Limits of AI Productivity

This is a free preview of a paid episode. To hear more, visit www.latent.space. AIE Europe CFP and AIE World’s Fair paper submissions for CAIS peer review are due TODAY - do not delay! Last call ever. We’re excited to welcome METR for their first LS Pod, hopefully the first of many. METR are the keepers of what is currently the single most infamous chart in AI. But every Latent Space reader should be sophisticated enough to know that the details matter and that hype and hyperbole go hand in hand in AI social media, because the millions of impressions that it got from people who don’t understand or care about the nuances, disclaimers, and error bars far outreach the 69k views on the corrections by the people who actually made the chart. There’s a lot of nuance both in making benchmarks (as we discovered with OpenAI on our SWE-Bench Verified podcast) and in extrapolating results from them, especially where exponentials and sigmoids are concerned. METR’s Long Horizons work itself has known biases that the authors have responsibly disclosed, but that go far too underappreciated in the pursuit of doomer chart porn. If you’re interested in a short, sharable TED talk version of this pod, over at AIE CODE we were blessed to feature Joel twice, as a stage talk and in a longer-form small workshop with Q&A. We also make sure to cover some of METR’s lesser-known work on Threat Evaluation and on Developer Productivity, where 2x friend of the pod and now Zyphra founder Quentin Anthony was the ONLY productive participant!
Finally, if you’re the sort to read these show notes to the end, then you definitely deserve some pictures of Joel shredding the guitar at Love Band Karaoke which we mention at the end:
Full Video Pod
Timestamps
00:00 What METR Means
00:39 Podcast Intro With Joel
01:39 ME vs TR
03:33 Time Horizon Origin Story
04:56 Picking Tasks And Biases
09:13 Time Horizon Misconceptions
11:37 Opus 4.5 And Trendlines
14:27 Productivity Studies And Explosions
29:50 Compute Slows Progress
30:47 Algorithms Need Compute
32:45 Industry Spend and Data
34:57 Clusters and Shipping Timelines
36:44 Prediction Markets for Models
38:10 Manifold Alpha Story
43:04 Beyond Benchmarks Evals
51:39 METR Roadmap and Farewell
Transcript

    56 min
4.6 out of 5 (97 Ratings)

