Agents and Engineers: Agentic AI with Dan Gerlanc

Dan Gerlanc

5.0 (12)
Technology
Updated Weekly

The podcast about AI, agentic software engineering, and entrepreneurship. Each episode is a conversation with people building and building with AI and agentic systems. Join me as I follow the stories, the behind-the-scenes, and the real people behind the code.

6d ago

The AI Skill Flip

Sheamus McGovern founded ODSC roughly twelve years ago and now splits his time between the conference business and a role as venture partner and Head of AI at Cortical Ventures. His book, The AI Skill Flip, came out of a pattern he kept hitting: data scientists and software engineers coming to him asking whether AI was going to take their jobs and what they should do about it. He wanted to write something that sat between the doom narrative and the utopian one, both of which he thinks are wrong. The "flip" in the title is the observation that the balance of skills has shifted rather than disappeared. Five years ago a software engineer spent most of their time writing raw code. Now much of that time goes to judging and evaluating what the model produced and thinking further up the stack. The same flip applies in marketing, where the skill becomes knowing what good looks like and what persona you're targeting rather than producing the asset yourself. Asked what separates people who get real value from AI from people who don't, Sheamus lands on three things. First is passion, the plain will to get a good outcome, which he compares to what separates a strong startup founder from an average one. Second is creativity, which he argues AI increases rather than eliminates, because models are sycophantic and will happily build exactly what you asked for. His example is watching people reach for Replit, Base44, or Lovable and build a dashboard, when the real question is whether a dashboard is even the right artifact in a world of agentic workflows. Dan pushes the point further, noting that dashboards existed because software was expensive to build, so you built one thing and maintained it. Third is judgment. AI is excellent at producing output and terrible at judging its value, which Sheamus frames as another instance of the automation paradox. On whether judgment can be taught, Sheamus starts at the engineering level with evaluations. 
Traditional numeric metrics still apply, but open-ended evaluation is the hard part, and he watched engineers struggle with it while building his first RAG and QA systems. His QA team's honest response was that the system was generating text all the time and they had no idea how to test it. Above the engineering layer, judgment comes down to domain expertise and knowing what good looks like. He tells a story about generating a thirty-page contract with Claude Code, sending it to his lawyer for a quick review, and getting billed ten hours anyway, because the lawyer still had to read every word and apply their own judgment. The other half of teaching judgment is teaching people how AI actually works, so they neither trust it completely nor dismiss it. Once you see it as sophisticated pattern recognition rather than magic, the failure modes become predictable. It works well on established APIs and badly on new libraries. He'd asked Perplexity that morning for the top twenty personal AI frameworks and got Gemini and other Google products instead of OpenClaw and the other recent entrants. Dan asks whether prompt design still matters now that agentic loops can ask their own clarifying questions, and both agree the vocabulary has moved faster than the practice. People stopped talking about context engineering and started talking about the harness, but skills and memory are still context engineering wearing different clothes. Sheamus's view is that prompt engineering, context engineering, and skills are all the same underlying muscle, and the people who learned the first are quick at the third. Most teams are still doing a poor job of it, partly because the models are good enough to paper over sloppy input and hand back generic, unoptimized, expensive results. He describes users bouncing from the $20 plan to the $100 plan and still running out of tokens with no idea where they went. 
The deeper problem, and the one he spends a chapter on, is that knowledge work is both open-ended and unstructured. You get stuck debugging, a five-minute task becomes two hours, and workflows you assumed were deterministic turn out not to be. His own research pipeline pulling papers from Google Scholar and arXiv keeps hitting that wall, because author institutions appear below the names on one paper, on the left-hand side of another, and not at all on a third. His own use of AI changed substantially over the two and a half years he spent writing the book, which included interviews with about thirty people. The first version of Cortical's VC sourcing pipeline was hand-architected, with AI slotted into specific stages, ChatGPT or Claude to read reports and Perplexity to search. Now a single agent with the right skills and system prompts can do the whole thing end to end. He runs a personal assistant built from Claude Code and a bit of Codex that pulls attachments from his Google Calendar, cross-references his CRM and the Crunchbase API, and prepares the monthly list of two hundred startups he has to review. What used to be three hours of review is now thirty minutes of review plus a couple of hours improving the skills, which he readily admits is not obviously a time saving yet. He's also building a speaker CRM for ODSC, which has around eight thousand speakers in its database and about a thousand submissions per conference, using planning mode and Replit, sometimes adding features live during the meeting where they're requested. That leads to the conversation's real tension. Sheamus calls it the AI credit card, and warns that AI technical debt is piling up fast. He is emphatic that this means more engineers, not fewer, ranting about a post he'd seen that morning claiming software engineering wouldn't exist as a role by 2027. If you can produce code at 10x or 100x, someone has to evaluate, judge, and maintain all of it. 
Dan agrees on maintenance being the dominant cost but presses on a different point: not every piece of software is worth owning. Sheamus pushes back and they agree to disagree. His counterargument is that the gatekeepers on software development are gone, and the spreadsheet, still the most-used application in most companies, is going to be replaced by generated software and generative UI, interfaces that appear when you ask for them and get thrown away afterward. Dan draws the line at company size, where a startup should build and a large company already paying for tools has to weigh marginal value against maintenance cost. On junior engineers, Sheamus introduces the idea of cognitive debt alongside technical debt. Every task you offload, you also stop practicing, and he offers himself as evidence that his Python is worse than it was two years ago. His advice to juniors is uncompromising: learn the classical skills anyway, including how languages handle memory and which libraries matter, because that's what makes you good at reading and judging generated code. Then learn how AI works on top of that, which he thinks makes juniors potentially more employable than mid-level engineers because they can be AI-native from the start. He's also more hopeful than most about the junior hiring market, attributing much of the slowdown to pandemic over-hiring and, per a study he'd read, possibly to remote work leaving nobody in the office to train new staff. He closes on what he wants from companies, which is real training rather than buying a tool and declaring victory. His model is a matrix: universal skills like prompt engineering, governance, risk assessment, and evaluation across the top, then domain-specific practice underneath. He also makes a pitch for data literacy, having watched software engineers stare at a loss function or an R-squared with no idea what to do with it.
Jul 14

The New Sport of Programming

Matthew Rocklin is an open-source software developer best known for creating Dask, a Python library for scalable parallel and distributed computing. He has contributed to projects including Toolz, SymPy, and Theano, worked at Anaconda and NVIDIA on the RAPIDS ecosystem, and founded Coiled, a company focused on running Dask in the cloud. He has a PhD in computer science from the University of Chicago. Dan and Matthew discuss how Coiled changed after Matthew stepped away from its ambitious, VC-backed growth path. The company went from roughly fifty people at its peak to three part-time engineers, while making more money and operating more smoothly. Matthew now connects agents to Slack, email, QuickBooks, the bank, the calendar, the codebase, and company context, which lets them spot problems that cross disciplines. Matthew describes a lightweight operating system for the company. The structure is simple: a Git repository of Markdown files records customers, employees, systems, and his own context. Agents answer ad hoc questions, produce a daily brief, and run a separate monthly cadence for close and overages. They remain read-only for now. Matthew is willing to have them issue invoices when asked, but not to cut checks or reboot virtual machines on their own. The conversation turns to the personal cost of this new leverage. Matthew does not miss writing every line of code, but he has had to reshape his attention around seven parallel agent sessions and long-running turns. He compares agentic programming to a new sport. The work demands more inspiration and creates a more addictive waiting state, so walks, breaks, and deliberate distance from the screen matter as much as technical skill. Agents have forever changed the culture of OSS. Matthew expects conservative open source projects to protect stability while more experimental projects split off and evolve quickly.
His own Frisky project rebuilds parts of Dask in Rust, runs about a hundred times faster in his early testing, and exposes telemetry so agents can understand distributed state. It is promising but still breaks often, which makes it a useful example of the stability and speed trade-off. Matthew argues that software has no intrinsic value. It matters because it cures a disease, helps someone find a home, or automates a useful outcome. When agents make implementation cheap, programmers must let go of much of the craft they built and focus on problem selection, system design, and judgment. He sees the same opening in front-end work, where agents helped him explore TypeScript, visual design, and user workflows that he had previously avoided. Dan and Matthew discuss strategies for verifying agentic output. Matthew gives agents tests, benchmarks, telemetry, line-count signals, and independent reviews, then asks them to demonstrate that his specific concerns are handled. He says the bottleneck is now his own ability to make decisions across many threads, not agent intelligence. Good context and feedback systems matter more to him than a more capable model, and he ends by encouraging programmers to play, take bigger swings, and build their own things.

Chapters
(00:00) - Coiled after the VC-backed growth phase
(04:01) - A repository of context and a daily company brief
(07:25) - Opportunity cost in a high-churn era
(10:29) - Rewiring attention for agentic programming
(15:43) - Seven sessions, long turns, and more walks
(18:34) - What happens to work relationships
(21:49) - Open source stability versus AI-driven change
(23:56) - Frisky and the Rust rewrite of Dask
(25:40) - What software is worth when it is cheap to build
(29:31) - Why Matthew started building front ends
(32:58) - Languages matter less than the user experience
(36:08) - Higher-level programming and formal verification
(38:12) - Ambition, inhibition, and larger agent tasks
(40:41) - Feedback systems for checking agent work
(45:04) - Context and feedback beat raw model intelligence
(49:17) - AGENTS.md, documentation, and learning with local models
(57:48) - Build your own things

Links from the show
--------------------
Dask Coiled Frisky Qwen 3 Vite py-spy formal verification

Guests
-------
Matthew Rocklin, Founder & CEO, Coiled Computing

Follow the podcast
-------------------
LinkedIn Threads Instagram TikTok

Follow Dan Gerlanc
-------------------
X LinkedIn Threads Bluesky
Jul 7

Skills, Context, and Trust: The New Agentic Coding Stack

Dan and Jonathan Bown open with the talk Jonathan gave at ODSC, "Practical Agent Ops: From POC to Prod with MLflow 3.0." MLflow 3.0 arrived last summer as the first stable release built for generative AI rather than traditional machine learning, and Jonathan's team used it to build an agent for pre-enrollment students. The centerpiece of that work was evaluation-driven development. Instead of jumping straight into a working prototype, they aligned the business up front on what quality actually looks like before signing off on a model with inherently non-deterministic output. The initial key to success was an Excel file. In it, the data science team had already assembled 150 ground truth examples, but left them untested and set aside while engineers focused on code. Jonathan's team paused the coding work and ran a simple foundation model against those examples first, landing at what amounted to a coin flip of useful versus hallucinated answers. From there they refined the examples with the business, loaded them into MLflow's evaluation datasets built from live traces, and iterated by versioning prompts and agent configurations. Tooling came up repeatedly. MLflow's open source repo now ships a skill file that plugs into coding tools like Claude Code, which Jonathan called a game changer for keeping up with an API that changes at roughly a release a month. The Databricks AI Dev Kit, released around March, bundles skills for the Databricks SDK, CLI, data engineering, and analytics work, usable either inside Databricks' Genie Code pane or in outside tools such as Claude Code, AWS Kiro, or Google Antigravity. Jonathan said installing it produced a dramatic jump in output accuracy compared to coding assistants working from stale or incomplete context about Databricks and MLflow APIs.
Dan raised the idea that LLMs and agentic tools are becoming users of software in their own right, alongside humans, and Jonathan tied that to broader changes at WGU: more of the business, not just engineers, now writes system prompts and builds their own copilot-style agents. His own day to day has moved from core development toward AI enablement, meaning security review, best practices, and helping non-technical staff adopt evaluation-driven habits for the prompts and agents they build themselves. Jonathan's path to WGU ran through Pentara, a biostatistics consultancy, and Zions Bancorporation, where he did quant finance work before a stint simulating financial products for WGU students. He became a founding member of WGU's MLOps team in 2023, when the university's machine learning was still traditional work like random forests and ensembles for predicting student outcomes, well before Databricks had built out MLOps tooling. Dan connected this to Hamel Husain's essay "The Revenge of the Data Scientist", and Jonathan agreed that evaluation-driven development brings the work full circle: checking evals and correctness is the generative AI analogue of checking a confusion matrix. The pre-enrollment agent's rollout became the clearest illustration of the method. The first release, a bare foundation model with no WGU context, drew heavy negative feedback from the employees testing it, some of whom wanted to cancel the initiative. Jonathan's team treated that feedback as fuel, folding the failed questions into an evaluation dataset and iterating until they reached roughly 82 percent correctness and near-total relevance, at which point the same employees became enthusiastic supporters. He credited MLflow's architecture for building subject matter experts directly into the agent ops workflow rather than treating evaluation as a purely technical exercise. Jonathan was candid about where his trust runs out. 
He does not trust a tool's first output even after a full planning session, citing a Kiro planning cycle from the day before that failed on the first try despite extensive back and forth. He is cautious about MLflow's fast release cadence outpacing its own skill files, and notably guarded about tools like OpenClaw and Claude Cowork that can reach into email or personal documents. Given how much effort WGU puts into protecting student data, he extends the same caution to his own personal information and limits what such agents can access. On his team, Jonathan resists banning AI-generated code or stigmatizing it in review, and instead pushes everyone toward reviewing code outside their usual specialty, using AI review tools like Amazon Q or GitHub Copilot as a starting point rather than a final answer. He pushed back on the idea that tool usage equals productivity, warning about AI slop and noting that some of the heaviest users he knows are not the most productive. The thread ties back to evaluation-driven development's real thesis: start from value, not from the tool, a point he illustrated with WGU's Academic Virtual Assistant pilot, where a surprising result showed that students chatting with the assistant were more likely, not less, to still reach out to a human mentor afterward.

Chapters
(00:00) - Introducing Jonathan Bown
(00:56) - ODSC talk: escaping POC prison with MLflow 3.0
(05:11) - The forgotten Excel file: rebuilding around evals
(09:36) - MLflow skills and the Databricks AI Dev Kit
(15:00) - When AI becomes the user of your software
(18:16) - How the day-to-day has changed in six months
(22:16) - Centralizing prompts, evals, and best practices
(25:38) - From quant finance to founding WGU's MLOps team
(31:02) - The Revenge of the Data Scientist
(35:01) - Has the job gotten easier or harder?
(37:33) - Thinking ten steps ahead with agentic coding tools
(42:15) - Leveling up junior engineers instead of gatekeeping review
(51:17) - Where trust breaks down: OpenClaw and personal data
(56:48) - The mental toll of managing agents versus writing code
(59:27) - How much detail agentic tools actually need in a prompt
(01:07:14) - Value over software: the Academic Virtual Assistant's surprise result

Links from the show
--------------------
MLflow 3.0 Databricks AI Dev Kit Genie Code ODSC (Open Data Science Conference) OpenClaw AWS Kiro Google Antigravity The Revenge of the Data Scientist Amazon Q

Guests
-------
Jonathan Bown, Principal ML Engineer, WGU
Jul 2

When Software Gets Cheap, Focus Gets Expensive

Dan and Greg open with how agentic development has changed since the early days of Copilot. At the time, Greg was at GitHub, and he saw AI mostly help with boilerplate and editor completions. Cursor-style agents were the next widely used advancement, bringing session history and integrated team-wide practices. By June 2026, capable models and harnesses are common inside engineering teams, so the gap between teams increasingly comes from context engineering, repository structure, and whether old team shapes still align with the new ways of building software. For small teams and startups, the leverage of AI is a double-edged sword. Greg describes how SpecStory's original extensions required real sweat equity to reverse engineer chat-log formats across Cursor, Copilot, Claude Code, Amp, and other tools. Now, much of that surface can be maintained with a fraction of one person's time. The danger is that easy MVPs can trick founders into believing they have validated a market. When the marginal cost of software falls, founders have to spend more of their scarce attention on demand, willingness to pay, distribution, and the routes to customers. The conversation turns to Greg's book, 25 Patterns in Agentic Engineering. He explains how he mined roughly 1,300 preserved SpecStory sessions and nearly 5,000 commits to extract durable patterns from his own agentic practice. Two patterns stand out. First, when code becomes free, verification becomes the bottleneck. Second, between agent turns, docs are the persistent API of the system. For Greg, as-built architecture documents are practical maps that let both humans and agents recover the shape of a subsystem without re-reading the entire codebase every time. Greg's development practice has changed accordingly. He favors trunk-based development and says his team uses almost no pull requests for everyday development, partly because agent-generated diffs arrive at a volume he does not want to review line by line.
He prefers local agents over cloud agents that containerize the repo and open PRs later, because steering an agent while it runs keeps his mental model intact. Long unattended runs still make sense to him, but only when they start from a clear goal and a more detailed rider, with phased commits and verification points he can inspect after a walk or a night away. Dan and Greg also dig into coordination at larger scale. Greg is skeptical that issue trackers were ever clean or current enough to describe day-to-day engineering, but he sees issues becoming useful as specs with provenance and evidence that can be handed to agents. Personally, he runs several projects at once, usually three to five, with local agents in permissive modes, and rotates attention while long runs execute. That power is not free. He describes the dopamine loop of watching ideas come to life, the temptation to keep agents busy overnight, and the scarcity mindset created by subsidized access to frontier models. The episode closes with where Greg still does not trust the tools. Copywriting and visual design still require heavy human intervention because the models can blur rather than sharpen the message. He frames taste less as a mystical trait and more as a selection among trade-offs and the ability to connect ideas in understandable ways. Coding has benefited from benchmarks and verifiable answers; much of the rest of the world is less tractable because there is no single ground truth for what "good" means.

Chapters
(00:00) - Introduction and guest background
(00:55) - What agentic teams are running into
(06:56) - Startup leverage, MVP traps, and maintaining SpecStory
(09:26) - When software gets cheaper, distribution matters more
(12:31) - Hand-written code, craft, and code as liability
(16:21) - Mining 1,300 sessions into 25 patterns
(19:01) - Verification and as-built architecture docs
(23:55) - Co-writing docs with LLMs
(25:15) - Keeping docs fresh through skills, Git, and verbose commits
(27:50) - Trunk-based development for agentic teams
(30:26) - Local steering versus cloud-agent pull requests
(32:14) - Goal and rider plans, long runs, and Gas Town
(35:52) - Replacing issue trackers with weekly docs
(38:19) - Larger teams and issues as agent-ready specs
(42:45) - Parallel projects and concentration limits
(44:47) - Local agents, permissions, and risk judgment
(46:57) - The cognitive pull of managing agents
(51:58) - Scarcity, token costs, and model choice
(58:57) - Copy, design, naming, and taste
(01:05:04) - Why creative output resists verification
(01:07:12) - Closing

Links from the show
--------------------
Hardcore Agentic Engineering for builders who ship SpecStory Stoa 25 Patterns in Agentic Engineering AI Essentials for Tech Executives Meditations on Tech Beyond Code-Centric Goal Engineering WebRTC CRDT Trunk-based development Steve Yegge's Gas Town Dead Reckon Devin DORA Bear DeepSeek Qwen Yann LeCun

Guests
-------
Greg Ceccarelli, Co-Founder & CPO, SpecStory
Jun 18

From Supervising AI to Building Systems for It

Dan and Eleanor open by discussing how fast software engineering has changed. In the last six months, Eleanor's practice flipped from treating AI as a messy assistant that needs close supervision to building systems that put the agents on the path to success. She now writes essentially no code herself, arguing that the models have become good enough that her involvement mostly makes the results worse. The journey runs from babysitting agents locally to delegating to async, cloud-based agents like GitHub Copilot, Cursor, Devin, OpenHands, or Factory. Eleanor warns that the home-grown terminal "loops" everyone is building right now are great for learning but too brittle to scale. Next up, what does an agent engineering system actually need? Eleanor recommends starting with a sandboxed execution environment (usually containers), careful configuration of how the agent reaches the outside world (MCP servers and selective network access), a way to see across multiple repositories, and layered rules via AGENTS.md and skills. Eleanor makes the case that async delegation is a forcing function for better specifications. Deterministic feedback like static analysis and test suites is the single biggest factor in work quality because "you can't control AI with AI." She has moved to fully test-driven development and notes that current-generation models no longer find unintended workarounds to tests (e.g., deleting them) the way Claude 4 and early GPT-5 once did. Dan and Eleanor turn to adoption and skills, including how to get better at using AI with deliberate practice. Eleanor explains why she moved from Python, the language she knew best from her career, to statically typed languages like TypeScript and Go for agent work, why supply chain risk at her healthcare company has her questioning every dependency, and why she dislikes the term "junior developer." Curiosity and systems thinking, not tenure, are what matter now.
The episode closes on verification and scale. Eleanor distrusts any output she can't verify, doesn't miss hand-writing code, and argues that inventing new ways to verify, including more formal methods, is the real bottleneck now that models are cheap and strong. On team size, she pushes back on the "small teams" consensus, pointing to the success of large open-source communities. Eleanor remarks that software development has become a sub-branch of systems engineering, and anyone not practicing this now will be shocked in a matter of months.

Chapters
(00:00) - Introduction
(00:58) - The flip: from supervising AI to getting out of its way
(03:14) - Cloud-based agents vs. rolling your own
(06:08) - The primitives every agent system needs
(07:43) - Why async delegation beats local babysitting
(11:02) - Writing specs: Codex, Repo Prompt, and markdown
(12:21) - Guardrails: AGENTS.md, skills, and deterministic checks
(14:19) - Going fully test-driven
(17:19) - How engineers really adopt (and hide) AI
(19:38) - Getting better through deliberate practice
(21:12) - From experiment to reusable skill to library
(24:26) - Choosing a language: Python, TypeScript, Go
(26:56) - Supply chain risk and distributing specs
(29:06) - Beyond 'junior': curiosity over tenure
(31:03) - Systems thinking as the durable skill
(38:02) - Where Eleanor still doesn't trust AI
(39:40) - Not missing the keyboard
(42:43) - Keeping up with a fast-moving field
(44:50) - What teaching reveals
(48:27) - Verification as the real bottleneck
(50:41) - Team size and open source at scale
(55:48) - Closing: take agents seriously

Links from the show
--------------------
GitHub Copilot coding agent Devin OpenHands Factory Codex Repo Prompt AGENTS.md Model Context Protocol (MCP) Anthropic 'when AI builds itself' Lovable Vercel Formal verification UML Jimini Health

Guests
-------
Eleanor Berger, Member of the Technical Staff, Jimini Health
Jun 4

Are We All Managers Now?

Dan, Angie Jones, and Demetrios Brinkman open with a discussion of the Agentic AI Foundation ("AAIF"), founded by Anthropic, OpenAI, and Block in December 2025 and now home to roughly 180 member companies. AAIF recently launched an ambassador program (apply [here](https://aaif.io/ambassadors/)) and has upcoming events across the globe, from [AGNTCon](https://events.linuxfoundation.org/agntcon-mcpcon-north-america/) in San Jose to gatherings in Amsterdam, India, Tokyo, and Seoul. A recurring theme is that the whole industry is learning agentic engineering together. So get out of your "lab" and compare notes! You don't have to do all of this R&D on your own (well, maybe some of it, but it doesn't hurt to collaborate). Everything is changing. And quickly. Angie marks the release of Claude Opus 4.5 as the point when agentic engineering became viable. Where engineers once obsessed over context engineering and priming a repo so an agent had a chance, the latest frontier models often just need to be pointed at a codebase and told the problem. Drawing on her time leading agentic AI at Block, Angie describes the agent they built that can hold a world model across 25,000 codebases. They paired this agent with cloud workstations where an agent picks up a Jira ticket, clones the repo, and opens a PR without anyone babysitting a terminal. With this kind of firepower come new problems that look less like coding and more like management. Demetrios argues that the unglamorous work of governance (keeping teams aligned, codifying security practices, deciding what belongs in "the harness") is the new challenge companies are grappling with. Sandboxes and cloud workers have gone mainstream. The group pushes back on the wave of AI-justified layoffs, worrying that companies are cutting the very mentorship and middle-layer "glue" needed to steer agents.
They also dig into tokenomics: budgets blown by mid-year, tools that can cost more than the engineer using them, and Angie's hard-won lesson at Block that getting 95% of engineers onto coding agents produced no gain in velocity until she funded a small group of "AI champions" to learn the tools properly. Tokens, everyone agrees, are not the same as value. As to what the group has found effective for agentic engineering, Angie makes the case for RPI (Research, Plan, Implement) from HumanLayer and for adversarial review. A 32-file refactor that earned a clean pass from Codex made her a believer. Alongside review skills, the [Council of Mine MCP server](https://github.com/block/mcp-council-of-mine), and Jesse Vincent's [Superpowers](https://github.com/obra/superpowers) skill pack, Dan adds Wes McKinney's [RoboRev](https://github.com/wesm/roborev) for continuous background review. The episode closes on the human side: whether "we're all managers now," the identity crisis facing engineers who loved the craft, how Angie found the same flow state building agents that she once found writing code, and how all of this democratizes building for non-engineers. Along the way there are quick stops to discuss the token-saving Caveman skill, naming your agents, and a duck-themed calendar app. There's still no free lunch, Dan notes, but the price has come down. At least until the next model drops.

Chapters
(00:00) - Welcome and introductions
(01:49) - Inside the Agentic AI Foundation and the ambassador program
(03:38) - A global slate of events and meetups
(07:13) - What engineers are doing differently than six months ago
(10:16) - Agentic engineering at enterprise scale and cloud workers
(12:32) - Governance, the harness, and sandboxes
(15:10) - Do we still need managers and the human 'glue'?
(21:14) - The bill comes due: AI tool budgets
(23:39) - Tokens aren't velocity and the 'AI champions' experiment
(28:25) - Front-loading design versus vibe coding
(30:15) - RPI and Codex as co-reviewer
(34:15) - Adversarial review, Council of Mine, and Superpowers
(39:34) - Robo Rev and the QA-agent pattern
(42:58) - Agents, data analysis, and specifying the problem
(46:33) - Are we all managers now?
(48:00) - The Caveman skill and the limits of saving tokens
(51:49) - Naming agents, Codex pets, and Quakpit
(56:16) - Managing agents versus the joy of writing code
(01:02:07) - Democratizing building and the falling price of software

Links from the show
--------------------
Agentic AI Foundation RPI (Research, Plan, Implement) Superpowers roborev Council of Mine Caveman cmux context rot LLM Council (Andrej Karpathy) MLOps Community Davis Treybig Quakpit Dask Flying Toasters (After Dark) Broomy

Guests
-------
Angie Jones, VP, Agentic AI Foundation
Demetrios Brinkman, Founder, MLOps Community
May 21

Claude-maxxing: Burning $10K in tokens for only $50 with a custom software factory

Dan and Ian Stokes-Rees, founder and CEO of PNI AI Studio, open by discussing the thesis of Ian's company: an opinionated stack of open-source tools wrapped in agentic AI so business analysts, managers, and finance teams can get the capabilities of a senior data scientist without learning Python, SQL, or R. Ian's primary target is financial services, where an estimated 200+ million weekly Excel users still run human-driven, tacit-knowledge processes. He frames a second opportunity, capturing the "AI exhaust" coming out of those workflows, as the seed for a follow-on product.

The conversation turns to how Ian actually builds his product. Ian walks through a three-phase evolution: Cursor as a coding assistant, prompt-based Claude Code generation, and finally a full agentic team modeled on Steve Yegge's "Gas Town" post. Today he runs six to ten Claude Code agents in named roles. Xavier and Yasmin are Agile Process Managers, and Anne is the Principal Architect. The rest of the team is made up of software engineers, QA engineers, a test engineer, and a release manager. The agents' operating manual is a roughly 5,000-line AGENTS.md tree spread across about 45 markdown files and served via MkDocs. The Kanban lives in GitHub Projects, milestones serve as sprints, story points and labels drive the workflow, and a "kaizen accumulator" task captures learnings each sprint that get translated into process changes at the start of the next one.

Next comes a dive into token-maxxing. Ian explains why he keeps hitting Claude Max 20x weekly limits on day three of a sprint (five software engineering agents plus two QA agents burning tokens in parallel) and the management tricks he's adopted: Caveman to enforce terse prompts, templated processes, a catalog of deterministic scripts behind self-documenting skills, pre-commit hooks, and roughly a dozen CI gates that run Claude and Codex reviews against PR templates. Still, not everything is perfect in agent-land.
Ian describes his agents as "solid second quartile" engineers. They're fast, pleasant, and (currently) inexpensive, but wrong in meaningful ways on one PR in five. Vibe coding works for prototypes and small reports, but serious systems still need human-driven design thinking, separation of concerns, and testing discipline. Perhaps the current moment is an "interregnum" between 25 years of established software practice and an agent-native future. Could this one day be a software factory with human "forepersons" running follow-the-sun shifts over agents that never sleep? The episode closes with a warning about "AI brain fry" that comes from work products arriving ten times faster than humans produce them.

Chapters
(00:00) - Welcome and Ian's background
(01:09) - PNI AI Studio: enterprise AI analytics for Excel users
(04:10) - Financial services, Excel jockeys, and AI exhaust
(08:26) - Three phases from Cursor to an agent team
(10:28) - Building from Steve Yegge's Gas Town
(12:21) - A six-to-ten agent Scrum team
(15:38) - Paperclip and migrating to a server
(17:35) - Hitting Claude Max 20x weekly limits
(19:47) - Token management: Caveman, skills, scripts
(21:13) - Mining AI session exhaust and Diane.ai
(28:37) - Agentic development as engineering craft
(31:12) - Sprint planning in the interregnum
(36:50) - GitHub Projects as Kanban, milestones as sprints
(42:20) - The agents.md file tree and skills
(50:23) - CI gates and code review
(53:06) - What you still don't trust from agents
(57:51) - Why vibe coding doesn't scale yet
(59:41) - AI brain fry: managing agents vs. humans
(01:02:06) - A software factory with human foremen
(01:05:43) - Wrap up and how to reach Ian

Links from the show
--------------------
PNI AI Studio
Anaconda
Steve Yegge's Gas Town
Paperclip
GitHub Projects
MkDocs
Caveman
When Using AI Leads to "Brain Fry" (HBR)
Jensen Huang GTC 2026 keynote
Kaizen
Extreme Programming (XP)
May 9

Vibe Coding in the Physical World: Robotics, Circuits, and Dangerous Permissions

Dan and Greg discuss Revise Robotics, where Greg serves as founding engineer building robotic systems that refurbish discarded corporate laptops for donation. The episode opens with a description of how AI vision models allow robots to navigate unfamiliar BIOS screens and unpredictable laptop states dynamically, a capability that wasn't feasible a few years ago. Greg reflects on how LLM-powered vision surprised even him as a "second gift," enabling a kind of general adaptability that previously would have required exhaustively pre-coded state machines.

The conversation digs into Greg's hands-on experience using Claude for hardware projects, most vividly illustrated by an Arduino RPC library he built on a Raspberry Pi in under five minutes, a task he estimates would have taken a full day by hand. Greg draws a sharp distinction between projects where AI delivers near-100x speedup (well-defined problems with existing patterns and a testable harness) and cases where it gets confidently stuck in loops. His Minivac 601 circuit simulator project becomes the central cautionary example: months of fruitless AI-assisted attempts to simulate relay circuits ended once he realized he needed a real physics engine rather than asking the AI to re-derive Kirchhoff's laws from scratch.

A recurring theme is the tension between speed and trust. Greg describes his journey from clicking "yes" to every Claude permission prompt, to briefly trying sandboxing tools like Nono, to ultimately running Claude with --dangerously-skip-permissions locally, partly out of pragmatism and partly because he concluded the permission theater wasn't actually catching anything. He shares his "committee of elders" technique, routing important decisions through Claude, Gemini, and ChatGPT simultaneously and only proceeding when all three agree. Dan shares his MMI hook tool, which intercepts Claude's bash calls to enforce conventions like always using uv instead of raw Python.
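For readers who want to try the "committee of elders" idea, here is a minimal sketch of the unanimous-agreement gate, not Greg's actual setup. Everything in it is an assumption made for illustration: the `Elder` type, the `committee_agrees` function, and the idea that each model's answer is reduced to a short verdict string. How each model is actually called is left to the reader.

```python
from typing import Callable, Mapping

# Hypothetical type: an "elder" takes a question and returns a short verdict
# string such as "yes" or "no". Wiring it to a real model is out of scope here.
Elder = Callable[[str], str]


def committee_agrees(question: str, elders: Mapping[str, Elder]) -> tuple[bool, dict[str, str]]:
    """Ask every elder the same question; report whether all verdicts match."""
    # Normalize whitespace and case so "Yes " and "yes" count as the same verdict.
    verdicts = {name: ask(question).strip().lower() for name, ask in elders.items()}
    # Unanimous means exactly one distinct verdict. An empty committee never agrees.
    unanimous = len(set(verdicts.values())) == 1
    return unanimous, verdicts
```

A caller would proceed only when the first return value is `True`, and would otherwise inspect the per-model verdicts to see where the disagreement lies.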
The episode closes with a candid discussion of the emotional and societal costs of this pace. Greg describes a new kind of frustration (distinct from normal debugging) when an AI tool fails after drawing you deep into a rabbit hole. He and Dan also address broader concerns: the acceleration of security vulnerabilities, the environmental cost of GPU compute, and AI-driven job displacement. Both acknowledge they can't stop using these tools even as they see the harms compounding, and end on a cautiously hopeful note about open-source and local models eventually offering more control.

Chapters:
(00:00) - Introduction and guest background
(01:20) - Revise Robotics: refurbishing laptops with robots
(04:16) - AI vision models navigating unpredictable hardware
(07:36) - LLMs as a force multiplier for small teams
(11:42) - Who gets the most out of working with LLMs?
(15:34) - Claude hooks and the MMI permission tool
(17:51) - Going dangerously: skipping Claude permissions
(22:19) - Hardware with Claude: the Arduino library story
(27:23) - Estimating the 100x speedup
(30:23) - Vibe-coding the office network with MikroTik
(35:57) - The committee of elders: multi-model verification
(44:07) - Where AI fails: the Minivac 601 circuit simulator
(54:27) - 3D and CAD as another AI blind spot
(55:54) - Closed loops, tests, and why they make AI coding work
(59:47) - Mental fatigue from AI-assisted development
(01:05:29) - Security risks and societal costs of AI acceleration
(01:10:36) - Open-source and local models as a path forward
May 7

Welcome to Agents and Engineers

Welcome to Agents and Engineers: The podcast about Agentic AI and Software Engineering. I'm your host, Dan Gerlanc. Each episode is a conversation with people whose daily lives most intersect with AI and agentic systems. These are the people building the agents we interact with every day, like Claude and Codex. They are the engineers building software in a radically new way, by written spec and probabilistic iteration, repeated until the program works. Beyond the technical deep dives, we explore what AI and Agents mean for the future of software engineering. We discuss the psychological and philosophical implications of our interactions with this technology. Join me as I follow the stories, the behind-the-scenes, and the real people behind the code, the agents and the engineers.

Trailer


5.0 out of 5 (12 Ratings)

Stay in the loop

5d ago

Aaaadaaaam

Love hearing about the trending topics and staying up to date on tech. Great insight from guests.

Data Engineering Enthusiast

May 24

thompsonjjet23

Super interesting conversation about how AI is transforming the tech workflows across data science and engineering! Excited to keep listening

A needed addition

May 14

Sheamus Mcg

Great first interview. Looking forward to hearing more

Creator: Dan Gerlanc
Years Active: 2026
Episodes: 9
Rating: Clean


