Check out the conversation on Apple, Spotify, and YouTube.

Brought to you by:

* Maven - Get a $675 discount off Gabor’s course with my code
* Amplitude - The market-leader in product analytics
* Testkube - The leading test orchestration platform
* Land PM Job - My 12-week AI PM + Job Search Course starts Monday!
* Product Faculty - Get $550 off their #1 AI PM Certification with code AAKASH550C7

Today’s episode

Here’s the problem with most Claude Code demos: they stop at the prototype. Nobody shows what happens next.

You try to add a second feature. The first one breaks. The styling reverts to default. The code is so tangled that you spend more time debugging than you saved by generating.

Gabor Mayer showed me what happens when you stop treating Claude Code like a magic prompt box and start treating it like a team. He is a PM at Google. He has not written production code in 15 years. But over the past several months, he has been building real mobile apps using 21 specialized Claude Code agents. Not prototypes that live in a demo. Apps that are on the App Store.

In today’s episode, he walked through the entire workflow live and shared all the resources for free.

If you want access to my AI tool stack - Dovetail, Arize, Linear, Descript, Reforge Build, DeepSky, Relay.app, Magic Patterns, Speechify, and Mobbin - grab the bundle.

Do you want to become an AI PM? I’ve created a course for you. It starts next week.

Newsletter deep dive

Thank you for having me in your inbox. Here is the complete guide to building a full AI development team in Claude Code:

* Why one-prompt vibe coding fails
* The 21-agent team architecture
* The spec-first workflow
* From design to code without touching either
* What changes when PMs actually build

Save this. The full 10-step playbook on one page. Everything below is the why and how behind each step.

1. Why one-prompt vibe coding fails

Every PM I know has built something with Bolt, Lovable, or Replit. The prototype looks great. It runs. It impresses people in a Slack message.

Then you try to ship it to real users. And you hit a wall.

Blocker 1 - Context compression silently destroys your spec

This is the failure mode that nobody talks about in tutorials. When you give one agent one massive prompt, the model compresses context. Details get dropped. Not randomly. Strategically. The model decides what is “important” and what is not.

In the episode, Gabor defined a complete color palette. Oranges, neutrals, specific accent tones. The agent received everything. The output used none of it. The layout was there. The structure was solid. But every color was a default.

The reason is straightforward. When the context window is full, visual styling details are lower priority than functional logic. So the model drops them. Silently. Without warning. Without an error message. You just get generic output and wonder what went wrong.

The fix is not better prompts. It is context engineering. Smaller, scoped tasks. Each agent gets only the context it needs for its specific job. The designer agent gets the brand guideline. The CTO agent gets the architecture spec. Neither gets the full 50-page document.
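To make the scoping concrete, here is a minimal sketch of what a scoped agent definition can look like, assuming Claude Code’s subagent convention (a markdown file with YAML frontmatter, typically under .claude/agents/). The file name, wording, and referenced path are illustrative, not Gabor’s actual setup:

```markdown
---
name: designer
description: Applies the brand guideline to screens and components. Use for any visual styling task.
---

You are the Designer Agent. Your only source of visual truth is docs/brand-guideline.md.

- Apply the exact color tokens, type scale, and spacing defined there.
- Never infer colors or fonts from the functional spec or from framework defaults.
- If a token you need is missing from the guideline, stop and ask. Do not improvise.
```

Because this agent’s context is just the brand guideline, there is nothing for the model to compress away. The palette that got silently dropped in the one-prompt version is now the only thing this agent sees.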
Blocker 2 - AI-generated code compiles but is not maintainable

A Reddit comment hit home for Gabor: “Vibe coding is just the rebranding of unmaintainable, low-quality source code.”

This is the real prototype-to-production gap. The code works today. You can demo it. You can push it to TestFlight. But the moment you touch it to add a feature, three other features break. No naming conventions. Circular references between modules. Zero comments explaining why anything was built the way it was.

The fix is a dedicated code quality agent. Gabor calls his the Spaghetti Agent. It runs after every sprint and checks naming conventions, circular references, comment coverage, and structural debt. When he ran it on his codebase for the first time, it caught issues he never would have found manually.

If you are building anything beyond a one-off demo, this agent is not optional. I covered similar quality patterns in my AI testing guide and my AI evals deep dive.

Blocker 3 - No dependency mapping means cascading failures

When you build without organizing work into sprints, agents try to build features that depend on code that does not exist yet. Front-end components reference API endpoints that have not been created. Database queries call tables that have not been defined.

The Atlassian MCP currently cannot create sprints directly in JIRA. That is a real limitation. Gabor uses tags as a workaround. He tags tickets as Sprint 1, Sprint 2, Sprint 3 and maps dependencies between them manually before starting the build. Without this step, the entire multi-agent workflow falls apart.

Every PM who has gone from prototype to production with AI agents has hit at least one of these blockers. The ones who shipped figured out the workarounds. The ones who quit assumed the tools were the problem.

Here is what the three blockers look like side by side, and what flips the moment you stop one-prompting and start running a team.

2. The 21-agent team architecture

You do not need 21 agents to start. Three will get you surprisingly far. But understanding the full architecture shows you where the complexity lives and which roles to add as your projects grow.

Here is the full roster: four clusters, 21 roles, and the markdown file pattern that makes them portable across every project you build next.

2a. The core agents every PM needs

The System Analyst is the linchpin. It breaks down product requirements into technical specifications. It asks clarifying questions one at a time. It documents decisions in Confluence. It creates tickets in JIRA. Without this agent, every other agent operates on incomplete context.

In the episode, the System Analyst asked 14 clarifying questions before a single line of documentation was written. Vector DB choice. Usage limit mechanics. Conversation history handling. Search fallback strategy. API provider. Minimum iOS version. Screen count. Naming conventions. Each question came one at a time, so the answers stayed deep.

The prompt pattern that makes this work: “Please act like a good system analyst. Ask clarifying questions until you have a complete and comprehensive understanding. Ask questions one at a time. Do not start writing documentation until all questions are answered.”

Two critical instructions. “One at a time” prevents the agent from dumping 25 questions at once. “Do not start writing” stops it from jumping ahead before the spec is complete. Different LLMs have different tendencies. Some love to start coding instantly. You need to explicitly constrain them. This is the same principle behind the prompt engineering techniques that work across any AI tool.

The Spaghetti Agent handles code maintainability. Naming conventions. Circular references. Comment quality. Structural debt. Born from that Reddit comment. When Gabor ran it on his codebase for the first time, it caught problems he never knew existed.

The UX Flow Architect creates clickable prototypes using Figma’s built-in prototyping arrows. This is a small but important detail. Early versions of this agent drew decorative arrows between screens instead of using Figma’s actual prototyping connections. The prototype looked like it had navigation. But when you clicked play, nothing happened. It took months of iteration to fix.

Each agent has a specific Claude Code agent markdown file that defines its role, its constraints, and its interaction patterns. The setup mirrors how you would build a Claude Code Team OS for a human team.
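For a sense of what one of those files can contain, here is a minimal sketch of a code quality agent in the same assumed subagent format as above. The checks mirror what this section describes, but the name and exact wording are illustrative, not Gabor’s actual Spaghetti Agent:

```markdown
---
name: spaghetti-agent
description: Code quality reviewer. Run after every sprint to catch structural debt before it compounds.
---

You are a code quality reviewer. After each sprint, review the changes and report on:

1. Naming conventions - are names consistent with the rest of the codebase?
2. Circular references - do any modules depend on each other, directly or indirectly?
3. Comment coverage - does every non-obvious decision have a comment explaining why?
4. Structural debt - flag duplicated logic, oversized files, and tangled dependencies.

Report findings as a ranked list with file references. Do not rewrite code yourself.
```

Because the role lives in a file rather than a chat history, every calibration you make carries over to the next project, which is the portability point behind the full roster.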
2b. The real blockers nobody warns you about

The Figma MCP color problem. When you connect Claude Code to Figma through the MCP and pass it your full specification, the screens look structurally correct but the colors are wrong. Not slightly wrong. Completely wrong. The model compressed the context and dropped your entire visual identity. The fix is to pass the brand guideline as a separate, focused input to the Designer Agent. Never bundle it with the functional spec.

The Atlassian MCP sprint limitation. The MCP currently cannot create sprints directly in JIRA. Gabor uses tags as a workaround. Sprint 1, Sprint 2, Sprint 3. It works. But it means dependency mapping is a manual step in the System Analyst prompt, not an automated feature.

The consumer app vs Claude Code gap. An agent role you set up in the Claude consumer app does not automatically transfer to Claude Code. You need to define agents separately in both environments. The System Analyst in your consumer app conversation is a different instance from the System Analyst in your Claude Code agent folder. Your AI PM stack needs to account for this separation.

The $200 Max plan economics. On the Max plan, a major build session uses roughly 10% of your monthly allocation. That means you get about 10 full build sessions per month. For a side project, that is plenty. For a production workflow with daily iterations, you need to be deliberate about when you run multi-agent sprints.

2c. Why reusable agents beat fresh setups

Every painful lesson, every edge case fix, every API workaround gets encoded into the agent markdown file. The next project starts from a position of strength. The Spaghetti Agent that took weeks to calibrate on project one is immediately useful on project two. The UX Flow Architect that took months to stop drawing fake arrows works correctly from day one on every subsequent project.

This is the compound interest of building with agents. The first project is slow. The second is faster. By the fifth, your agent team is genuinely effective.

Gabor’s Maven course walks through the full setup at maven.com/gabor/productbuilder.

The 21 agents are not the point. The point is that every role on a software team can be replicated by a scoped, reusable AI agent. Start with three. Add roles when you hit friction.

3. The spec-first workflow