We bring Paul Stack back to cover the parts we skipped last time. What changed when the models got better and we moved from one-shot Gen AI to agentic, human-in-the-loop work? How do plan mode and tight prompts stop AI from going rogue? Want to hear how six branches, git worktrees, and a TypeScript CLI came together?
We are always happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.
DevSecOps Talks podcast LinkedIn page
DevSecOps Talks podcast website
DevSecOps Talks podcast YouTube channel
Summary
In this episode, Mattias, Andre, and Paulina welcome back returning guest Paul from System Initiative to continue a conversation that started in the previous episode about their project Swamp. The discussion digs into how AI-assisted software development has changed over the past year, and why the real shift is not "AI writes code" but humans orchestrating multiple specialized agents with strong guardrails. Paul walks through the practical workflows, multi-layered testing, architecture-first thinking, cost discipline, and security practices his team has adopted — while the hosts push on how this applies across enterprise environments, mentoring newcomers, and the uncomfortable question of who is responsible when AI-built software fails.
Key Topics The industry crossroads: layoffs, fear, and a new reality
Before diving into technical specifics, Paul acknowledges that the industry is at "a real crazy crossroads." He references Block (formerly Square) cutting roughly 40% of their workforce, citing uncertainty about what AI means for their teams. He wants to be transparent that System Initiative also shrank — but clarifies the company did not cut people because of AI. The decision to reduce headcount came before they even knew what they were going to build next, let alone how they would build it. AI entered the picture only after they started prototyping the next version of their product.
Block's February 2026 layoffs, announced by CEO Jack Dorsey, eliminated over 4,000 positions. The move was framed as an AI-driven restructuring, making it one of the most visible examples of AI anxiety playing out in real corporate decisions.
From GenAI hype to agentic collaboration
Paul explains that AI coding quality shifted significantly around October–November of the previous year. Before that, results were inconsistent — sometimes impressive, often garbage. Then the models improved dramatically in both reasoning and code generation.
But the bigger breakthrough, in his view, was not the models themselves. It was the industry's shift from "Gen AI" — one-shot prompting where you hand over a spec and accept whatever comes back — to agentic AI, where the model acts more like a pair programmer. In that setup, the human stays in the loop, challenges the plan, adds constraints, and steers the result toward something that fits the codebase.
He gives a concrete early example: System Initiative had a CLI written in Deno (a TypeScript runtime). Because the models were well-trained on TypeScript libraries and the Deno ecosystem, they started producing decent code. Not beautiful, not perfectly architected — but functional. When Paul began feeding the agent patterns, conventions, and existing code to follow, the output became coherent with their codebase.
This led to a workflow where Paul would open six Claude Code sessions at once in separate Git worktrees — isolated copies of the repository on different branches — each building a small feature in parallel, feeding them bug reports and data, and continuously interacting with the results rather than one-shotting them.
Git worktrees let you check out multiple branches of the same repository simultaneously in separate directories. Each worktree is independent, so you can work on several features at once and merge them back via pull requests.
He later expanded this by running longer tasks on a Mac Mini accessible via Tailscale (a mesh VPN), while handling shorter tasks on his laptop — effectively distributing AI workloads across machines.
Why architecture matters more than ever
One of Paul's strongest themes is that AI shifts engineering attention away from syntax and back toward architecture. He argues that AI can generate plenty of code, but without design principles and boundaries it will produce spaghetti on top of existing spaghetti.
He introduces the idea of "the first thousand lines" — an anecdote he read recently claiming that the first thousand lines of code an agent helps write determine its path forward. If those lines are well-structured and follow clear design principles, the agent will build coherently on top of them. If they are messy and unprincipled, everything after will compound the mess.
Paul breaks software development into three layers:
- Architecture — design patterns like DDD (Domain-Driven Design), CQRS (Command Query Responsibility Segregation)
- Patterns — principles like DRY (Don't Repeat Yourself), YAGNI (You Aren't Gonna Need It), KISS (Keep It Simple)
- Taste — naming conventions, module layout, project structure, Terraform module organization
He argues the industry spent the last decade obsessing over "taste" while often mocking "ivory tower architects" — the people who designed systems but didn't write code. In an AI-driven world, those architectural concerns become critical again because the agent needs clear boundaries, domain structure, and intent to produce coherent output.
Paulina agrees and observes that this trend may also blur traditional specialization lines, pushing engineers toward becoming more general "software people" rather than narrowly front-end, back-end, or DevOps specialists.
Encoding design docs, rules, and constraints into the repo
Paul describes how his team makes architecture actionable for AI by encoding system knowledge directly into the repository. Their approach has several layers:
Design documents — Detailed docs covering the model layer (the actual objects, their purposes, how they relate), workflow construction (how models connect and pass data), and expression language behavior. These live in a /design folder in the open-source repo and describe the intent of every part of the system.
Architectural rules — The agent is explicitly told to follow Domain-Driven Design: proper separation between domains, infrastructure, repositories, and output layers. The DDD skill is loaded so the agent understands and maintains bounded contexts.
Code standards — TypeScript strict mode, no any types, named exports, passing lint and format checks. License compliance is also enforced: because the project is AGPL v3, the agent cannot pull in dependencies with incompatible licenses.
Skills — A newer mechanism for lazy-loading contextual information into the AI agent. Rather than stuffing everything into one enormous prompt, skills are loaded on demand when the agent encounters a specific type of task. This keeps context windows lean and focused.
AGPL v3 (GNU Affero General Public License) is a copyleft license that requires anyone who runs modified software over a network to make the source code available. This creates strict constraints on what dependencies can be used.
Multi-agent development: the full chain
A major part of the discussion centers on how Paul's team works with multiple specialized AI agents rather than a single all-knowing assistant. The chain looks like this:
-
Issue triage agent — When a user opens a GitHub issue, an agent evaluates whether it is a legitimate feature request or bug report. The agent's summary is posted back to the issue immediately, creating context for later stages.
-
Planning agent — If the issue is legitimate, the system enters plan mode. A specification is generated and posted for the user to review. Users can push back ("that's not how I think it should work"), and the plan is revised until everyone agrees.
-
Implementation agent — The code is written based on the approved plan, with all the design docs, architectural rules, and skills loaded as context.
-
Happy-path reviewer — A separate agent reviews the code against standards, checking that it loads correctly and appears to function.
-
Adversarial reviewer — Added just days before the recording, this agent is told: "You are a grumpy DevOps engineer and I want you to pull this code apart." It looks for security injection points, failure modes, and anything the happy-path reviewer might miss.
Both review agents write their findings as comments on the pull request, creating a visible audit trail. The PR only merges when both agents approve. If the adversarial agent flags a security vulnerability, the implementation goes back for changes.
Paul says this "Jekyll and Hyde" review setup caught a path traversal bug in their CLI during its first week. While the CLI runs locally and the risk was limited, it proved the value of adversarial review.
Path traversal is a vulnerability where an attacker can access files outside the intended directory by manipulating file paths (e.g., using ../ sequences). Even in CLI tools, this can expose sensitive files on a user's machine.
Mattias compares the overall process to a modernized CI/CD pipeline — the same stages exist (commit, test, review, promote, release), but AI replaces some of the manual implementation steps while humans stay focused on architecture, review, and acceptance.
Why external pull requests are disabled
Ficha técnica
- Programa
- FrecuenciaQuincenal
- Publicación11 de marzo de 2026 a las 23:18 UTC
- Duración53 min
- Episodio94
- ClasificaciónApto
