The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend; get your tickets booked ASAP as they will sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!

One of the central tensions in the agents industry is that even while major decacorn agent labs like Sierra, Decagon, Notion and Cursor are being built up, it has also never been easier to DIY agents, with a plethora of agent frameworks like LangGraph and Pydantic and Flue, and managed agents from Anthropic and Gemini and Amazon. There has been a wave of companies building their own background agents, from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.

You’d think Cognition might feel a bit threatened, but they’re not - even after all this, they were way oversubscribed for the $1B Series D they just announced.

Walden Yan, coiner of context engineering and Chief Product Officer/Cofounder of Cognition, invited OpenInspect’s Cole Murray to talk about why the Devin is in the Details. The full conversation is live on the pod today.

In retrospect, async agents were the most AGI-pilled bet you could make in 2024: the models weren’t good enough yet to vibecode, people didn’t trust AI enough to let it rip, and nobody (including early Cognition) was sure about the form factors. Now it is obvious:

* The first wave of AI coding tools made the developer faster but kept them heavily in the loop. Copilot and Cursor’s tab autocomplete are prime examples. However, the work was still centered around and bottlenecked by the developer’s local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.
* The second wave was local agents: Claude Code, Windsurf, Cursor’s agents pane: first one and increasingly many terminals all running concurrently.
* The current Age of Async Agents points to a different future focused more on agent orchestration which drives end-to-end development.

According to previous guest Steve Yegge, there are 8 finer-grained levels of agent adoption, but we have collapsed them into three.

As Cursor’s Michael Truell put it in The third era of AI software development:

“Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.”

The agent should not sit solely inside the developer’s flow. It should be set up to work in the background so that you can give it a task, a repo, a machine, a shell, a browser, tests, memory, and review loops to go do the work somewhere else.

In less than a year, the sentiment has shifted from avoiding multi-agent systems to suggesting approaches that actually work.

From coining “context engineering” to building the infrastructure behind Devin’s 7x PR growth and jump from 16% to 80% of commits across Cognition repos, Walden Yan has had a front-row seat to the background-agent shift. In this episode, Cognition co-founder and CPO Walden Yan joins swyx alongside Cole Murray, creator of OpenInspect, to unpack why everyone is building their own Devin, what changed after the December 2025 model inflection, and why “spec to pull request” is now becoming a real production workflow.

We go deep on the architecture of background agents: harness-in-the-box vs out-of-the-box, why Devin separates the “brain” from the machine, why repo setup is still one of the hardest problems, why Docker is not always enough, and how full VMs, snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together.
Walden and Cole also dig into memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, PMs shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.

And as agents eat software… and software eats the world… you can draw the conclusion on what is next.

We discuss:

* Why the engineering world is waking up to background agents and cloud agents
* The December 2025 model inflection that made spec-to-PR workflows practical
* Devin’s 7x merged PR growth and rise from 16% to 80% of commits
* Why Cole built OpenInspect as an open-source background-agent system
* The economics of $20/seat agent products and why monetization is tricky
* What Cognition actually sells beyond Devin: infra, onboarding, integrations, and adoption
* Harness in the box vs out of the box, and why architecture matters
* Why Devin separates the brain from the machine for security and permissions
* Repo setup, scoped secrets, Docker Compose, and agent-ready dev environments
* Why full VMs matter when agents need to run real applications and test them
* Android, macOS, Windows, nested virtualization, and machine-specific agent work
* Why testing is much harder than “computer use”
* Screenshots, video verification, and the “I know it works” merge moment
* GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments
* Why MCP alone is not enough for first-class Slack and enterprise integrations
* Memory, Knowledge, skills, Claude.md, and why retrieval is still unsolved
* Devin’s auto-generated memories and the challenge of memory pruning
* Always-on agents as permanent PMs for issues, tickets, and product areas
* Sub-agents, meta-Devin management, and what multi-agent systems actually add
* Why pure auto-merge vibe coding breaks down after about two weeks
* AI code smells, lint rules, reward hacking, and Semgrep for agent-written code
* GitAI, inline context, and preserving the “why” behind code changes
* Local testing, mock servers, older codebases, and preparing companies for agents
* Windsurf 2.0 and the handoff between local foreground agents and cloud background agents
* SRE auto-triage, support workflows, and agents as first responders
* PMs, marketing, and non-engineers creating pull requests from Slack
* AI agent budgets, $1k-$5k per engineer spend, and hybrid frontier/sub-frontier systems
* The rise of autonomous coding factories and who Cognition is hiring

Walden Yan

* X: https://x.com/walden_yan
* LinkedIn: https://www.linkedin.com/in/waldenyan/

Cole Murray

* X: https://x.com/_colemurray
* LinkedIn: https://www.linkedin.com/in/colemurray/
* OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents

Timestamps

00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin’s 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro

Transcript

Introduction: Walden Yan, Cole Murray, and Context Engineering

Swyx [00:00:00]: All right, we’re in the studio with Walden Yan, co-founder of Cognition, CPO.

Walden [00:00:08]: Happy to be here.

Swyx [00:00:09]: Which is a cool title. And coiner of context engineering.

Walden [00:00:15]: Although I think there are many people who’d used the terms in various ways beforehand, but I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering or model wrapping into maybe a more thoughtful way to build agents.

Swyx [00:00:33]: For those who haven’t caught up on that, I have on screen the Don’t Build Multi-Agents post, which you should go read and we might refer to, and Cole Murray, who created OpenInspect.

Cole [00:00:43]: Great to be here.

Swyx [00:00:43]: So let’s talk about it. Everyone is building their own Devins. What’s going on?

The December Shift: From Handholding Models to Autonomous PRs

Cole [00:00:51]: So I think the engineering world is waking up to this idea of background agents, cloud agents, whatever you’d like to call it. And I think we saw a shift around the December timeframe of 2025, where the models Opus 4.5 and GPT 5.2, they reached a capability where we moved away from handholding the model and being able to actually more or less autonomously drive the model. And what I mean by that is that we could pretty much go from a specification to a completed pull request, assuming the spec was good enough, with very little friction. And that paradigm alone, I think, changed a lot of how we interact with agents, and opened this world where background agents became more practical.

Swyx [00:01:41]: I think for Cole, everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was this moment which was, I think, Sonnet 3.7, where you guys rewrote Devin in one night or something. So describe 2025 or how it felt from your side.

Walden [00:02:01]: In retrospect, we alw