How I AI

Claire Vo

How I AI, hosted by Claire Vo, is for anyone wondering how to actually use these magical new tools to improve the quality and efficiency of their work. In each episode, guests will share a specific, practical, and impactful way they’ve learned to use AI in their work or life. Expect 30-minute episodes, live screen sharing, and tips/tricks/workflows you can copy immediately. If you want to demystify AI and learn the skills you need to thrive in this new world, this podcast is for you.

  1. Sonnet 5 review: I ran 64 generations to find out if it's worth it

    21 hr ago

    I’ve been testing every major frontier model release since the start of the year, and when Anthropic dropped Sonnet 5, I wanted more than a vibe check. I got tired of one-off tests I couldn’t repeat or compare over time, so I built something better: the How I AI Bench, a repeatable eval harness I constructed live using Claude Code while recording this episode. I ran Sonnet 5 blind against four other frontier models (Sonnet 4.6, Opus 4.8, GPT-5.5, and Gemini 3 Pro) across PRD quality, prototype generation, agentic task completion, and agent personality. The results were not what I expected.

    What you’ll learn:
    • What Anthropic claims Sonnet 5 improves over Sonnet 4.6, and where the benchmark data actually backs that up
    • How I built the How I AI Bench in under 45 minutes using Claude Code, starting from my own stored session history
    • Why I combined human vibe scoring (70%) with LLM-as-judge scoring (30%) instead of trusting either alone
    • How to set up a local HTML scoring page so you can rate AI outputs on gut feel and export those scores as JSON
    • Which model I recommend for PRDs, which for complex prototypes, and which for chatting with an agent daily

    Brought to you by:
    • Runway: The creative AI platform for images, video and more
    • Hyperagent: Deploy fleets of agents that handle real work

    In this episode, we cover:
    • (00:00) Sonnet 5 is out
    • (01:55) What Anthropic claims
    • (04:02) Why I’m done with one-off vibe checks
    • (05:05) Building the How I AI Bench live with Claude Code
    • (07:42) The scoring system
    • (10:43) Agent voice eval
    • (11:57) Quick recap
    • (13:58) Results: The How I AI index leaderboard
    • (21:21) What I’m improving for the next run
    • (22:16) Generating a Claire-weighted index
    • (23:53) Model-by-task recommendations

    Tools referenced:
    • Claude Sonnet 5: https://www.anthropic.com/news/claude-sonnet-5
    • Claude Opus 4.8: https://www.anthropic.com/news/claude-opus-4-8
    • GPT-5.5 (OpenAI): https://openai.com/index/introducing-gpt-5-5/
    • Gemini 3 Pro (Google DeepMind): https://deepmind.google/models/gemini/pro/
    • Cursor: https://www.cursor.com/

    Other references:
    • SWE-bench Pro (agentic coding benchmark referenced): https://www.swebench.com/

    Where to find Claire Vo:
    • ChatPRD: https://www.chatprd.ai/
    • Website: https://clairevo.com/
    • LinkedIn: https://www.linkedin.com/in/clairevo/
    • X: https://x.com/clairevo

    Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.
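A minimal sketch of how the 70% human / 30% LLM-as-judge blend mentioned in the notes could be computed and exported as JSON. This is not the How I AI Bench code: the `Rating` type, the 0-10 scale, the function names, and the placeholder model names are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

HUMAN_WEIGHT = 0.7  # share given to human vibe scores (from the episode notes)
JUDGE_WEIGHT = 0.3  # share given to LLM-as-judge scores (from the episode notes)


@dataclass
class Rating:
    """One model's scores on one task. The 0-10 scale is an assumption."""
    model: str
    human: float
    judge: float


def blended(rating: Rating) -> float:
    """Weighted average of the human score and the judge score."""
    return HUMAN_WEIGHT * rating.human + JUDGE_WEIGHT * rating.judge


def leaderboard(ratings: list) -> list:
    """Ratings sorted from highest to lowest blended score."""
    return sorted(ratings, key=blended, reverse=True)


def export_json(ratings: list) -> str:
    """Serialize the ranked results, one object per model."""
    rows = [
        {"model": r.model, "human": r.human, "judge": r.judge,
         "blended": round(blended(r), 2)}
        for r in leaderboard(ratings)
    ]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    # Placeholder names and made-up scores, not results from the episode.
    sample = [Rating("model-a", 8.0, 6.0), Rating("model-b", 6.0, 9.0)]
    print(export_json(sample))
```

Weighting the human score more heavily means the judge can break ties and flag outliers, but it cannot override a strong gut reaction by itself.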

    26 min
  2. No Figma. No Jira. No docs. How Gusto built a new product line with Claude Code | Eddie Kim (CTO)

    2 days ago

    Eddie Kim is the co-founder and CTO of the payroll and HR platform Gusto, which just crossed $1 billion in revenue and serves more than 500,000 small businesses. Recently he did something most CTOs don’t: he went back to writing code. With three other engineers and one designer, Eddie built Gusto Cofounder, a net-new AI product, from zero code to a tier-one launch in 10 weeks. He walks through how that team actually worked, why they threw out nearly every process, and how anyone can copy the approach.

    What you’ll learn:
    • The trash-can method: how to write, review, and delete a full PR as a product decision instead of a planning doc
    • The two-tool agent stack behind Gusto Cofounder
    • The exact “perma-Zoom” setup that replaced standups, retros, and Slack threads for 10 weeks
    • How a designer with no engineering background hit the 94th percentile for shipping code
    • The eval-first workflow Eddie uses to fix real customer bugs with Claude Code
    • How a non-technical leader can prototype an idea to win buy-in, then carry it all the way to production-quality code

    Brought to you by:
    • Magic Patterns: Prototypes that look like your product
    • Jira Product Discovery: Prioritize with insights, build with confidence

    In this episode, we cover:
    • (00:00) Intro: five people, 10 weeks
    • (02:38) The origins of Cofounder
    • (08:32) Inside the 10-week build process
    • (12:50) Building with no PMs
    • (14:38) The “trash can” method
    • (17:15) The stack architecture
    • (19:10) Shipping to production from day one
    • (22:03) How a designer became a top engineer
    • (29:05) Demo: Cofounder over text and Slack
    • (31:45) Demo: running a real payroll
    • (36:26) Live coding with evals in Claude Code
    • (39:39) Recap: prototype, small team, permission
    • (43:17) Lightning round
    • (48:44) Where to find Eddie and Cofounder

    Tools referenced:
    • Gusto Cofounder (early access/waitlist): https://gusto.com/cofounder
    • Claude Code (Anthropic): https://claude.ai/code
    • Cloudflare Workers: https://workers.cloudflare.com/
    • Vercel AI SDK: https://sdk.vercel.ai/
    • DX (engineering analytics): https://getdx.com/
    • Wispr Flow (voice-to-text): https://wisprflow.ai
    • OpenClaw: https://openclaw.ai/

    Other references:
    • Gusto (the main product, “Gusto Classic”): https://gusto.com
    • Mindbody (referenced as customer data source): https://www.mindbodyonline.com/

    Where to find Eddie Kim:
    • LinkedIn: https://www.linkedin.com/in/edawerd/

    52 min
  3. GLM 5.2: why I’m replacing Opus in Claude Code with this new model

    24 Jun

    I put GLM 5.2, the open-weight coding model from Z.AI, through four real tasks: a codebase architecture audit, an HTML architecture and roadmap page, a UI redesign, and a 45-minute autonomous bug-hunting session pulling from Sentry and Vercel logs. Total cost: $3.36 for roughly 6 million tokens. For that, I got a prioritized bug-fix dashboard I’m actually shipping from and a landing page redesign that matched ChatPRD’s design system on the first try.

    What you’ll learn:
    • What “open-weight” actually means and why it matters for cost and vendor independence
    • How to connect GLM 5.2 to Cursor and Claude Code
    • How it performs on codebase exploration and autonomous architecture summarization in a real production Next.js app
    • Whether GLM 5.2 can match an existing design system
    • How the model handles a 45-minute long-running autonomous task
    • Where GLM 5.2 stumbled
    • The actual cost breakdown

    Brought to you by:
    • Mercury: Radically different banking loved by over 300K entrepreneurs

    In this episode, we cover:
    • (00:00) What open-weight models are and why GLM 5.2 is worth testing
    • (01:38) GLM 5.2 model overview
    • (04:02) Capabilities and benchmark results
    • (06:02) How to set up GLM 5.2 in Cursor
    • (08:37) How to set up GLM 5.2 in Claude Code
    • (11:04) Live test 1: codebase exploration and architecture audit on ChatPRD
    • (12:43) Live test 2: generating an HTML architecture and roadmap page
    • (16:37) Live test 3: redesigning the How I AI landing page in Cursor
    • (20:57) Live test 4: 45-minute autonomous task, pulling Sentry errors and Vercel logs
    • (22:35) Where it struggled
    • (23:49) My verdict on the output
    • (25:23) Cost breakdown

    Tools referenced:
    • z.ai: https://z.ai
    • GLM 5.2: https://z.ai/blog/glm-5.2
    • OpenRouter: https://openrouter.ai
    • Cursor: https://cursor.com
    • Claude Code: https://docs.anthropic.com/en/docs/claude-code
    • Sentry: https://sentry.io
    • Vercel: https://vercel.com

    Other references:
    • SWE-Bench Pro leaderboard (coding benchmark scores referenced in episode): https://www.swebench.com
    • Frontier Suite and Post-Train Bench (additional benchmarks cited): https://scale.com/leaderboard
    • Use Claude Code with OpenRouter: https://openrouter.ai/docs/cookbook/coding-agents/claude-code-integration
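To put the quoted total in perspective, the arithmetic below works out the blended rate implied by "$3.36 for roughly 6 million tokens". It is only a rough sketch: the token count is approximate, the notes do not split input from output tokens, and the function name is made up for illustration.

```python
def cost_per_million_tokens(total_cost_usd: float, total_tokens: int) -> float:
    """Blended cost in USD per one million tokens (input and output combined)."""
    return total_cost_usd / (total_tokens / 1_000_000)


if __name__ == "__main__":
    # Figures from the episode notes; the token count is "roughly 6 million".
    rate = cost_per_million_tokens(3.36, 6_000_000)
    print(f"about ${rate:.2f} per million tokens")
```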

    27 min
  4. How Claude Mythos found a 15-year-old bug in Mozilla Firefox | Brian Grinstead

    22 Jun

    Brian Grinstead is a distinguished engineer at Mozilla, where he’s worked on Firefox and the web platform since 2013 (he joined to help launch Firefox DevTools). Recently he and his team pointed an agentic bug-finding pipeline at Firefox (a codebase with tens of thousands of files and tens of millions of lines of code) and shipped a record month of security fixes. The viral chart everyone saw gave the credit to Anthropic’s new Mythos model. Brian’s take is that the harness and pipeline did just as much of the work, and he walks through exactly how it runs and how anyone can build a starter version.

    What you’ll learn:
    • How to build a basic bug-finding harness by running Claude Code or Codex with one prompt and the -p flag, no SDK required
    • Why pointing an agent at a whole codebase fails, and how an LLM judge can score and rank files before you spend any compute
    • How a verifier subagent kills false positives by catching the agent when it cheats
    • The goal-loop pattern: give an agent a tightly scoped problem, a clear pass/fail signal, and let it retry far past the point a human would quit
    • Why teams that already invested in fuzzing, CI, and dev tooling are so far ahead
    • How to weigh model versus harness, and why Brian splits the credit close to 50-50
    • How a non-engineer can reuse the same score, verify, and fix loop for design quality, conversion rate, or tech debt
    • Why AI-generated patches still can’t ship on their own, and where humans stay in the loop

    Brought to you by:
    • WorkOS: Make your app enterprise-ready today
    • Metaview: The agentic recruiting platform for winning teams

    In this episode, we cover:
    • (00:00) Introduction to Brian Grinstead
    • (02:43) The viral chart: Firefox Security Bug Fixes by Month
    • (05:32) How the custom harness works
    • (10:22) Goal loops and guardrails
    • (14:45) How they built it
    • (16:55) Real bugs, including a 15-year-old one
    • (23:00) Open-sourcing it
    • (26:26) Why humans still review every fix
    • (32:30) Live demo and prioritizing files
    • (40:18) Mobilizing the team and recap
    • (42:33) Lightning round

    Tools referenced:
    • Claude Code: https://claude.ai/code
    • Claude Agent SDK: https://code.claude.com/docs/en/agent-sdk/overview
    • Codex: https://openai.com/index/openai-codex/
    • OpenAI Agent SDK: https://developers.openai.com/api/docs/guides/agents
    • VS Code: https://code.visualstudio.com/
    • Docker: https://www.docker.com/
    • Firefox: https://www.mozilla.org/firefox/
    • Address Sanitizer: https://github.com/google/sanitizers
    • RLBox: https://rlbox.dev/

    Other references:
    • Mozilla Bug Bounty Program: https://www.mozilla.org/security/bug-bounty/
    • Mozilla GitHub: https://github.com/mozilla

    Where to find Brian Grinstead:
    • LinkedIn: https://www.linkedin.com/in/bgrins/
    • GitHub: https://github.com/bgrins
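The score, verify, and retry ideas listed above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not Mozilla's pipeline: every function name here is hypothetical, the scoring and verification steps are passed in as plain callables, and `build_command` only assembles the `claude -p` invocation the notes mention (it assumes the `claude` CLI is installed and does not run it).

```python
from typing import Callable, Iterable, List, Optional


def build_command(prompt: str) -> List[str]:
    """Argument list for a one-shot, non-interactive Claude Code run.

    Hypothetical helper: pass the result to subprocess.run if you want
    to execute it. Nothing is executed here.
    """
    return ["claude", "-p", prompt]


def rank_files(paths: Iterable[str],
               score: Callable[[str], float],
               top_n: int) -> List[str]:
    """Keep only the top_n highest-scoring files, best first.

    `score` stands in for an LLM judge that rates how promising a file is.
    """
    return sorted(paths, key=score, reverse=True)[:top_n]


def goal_loop(attempt: Callable[[int], str],
              verify: Callable[[str], bool],
              max_attempts: int) -> Optional[str]:
    """Retry until the verifier passes or the attempt budget runs out.

    `attempt` receives the 1-based attempt number and returns a candidate;
    `verify` is the pass/fail signal. Returns None if nothing passed.
    """
    for n in range(1, max_attempts + 1):
        candidate = attempt(n)
        if verify(candidate):
            return candidate
    return None
```

Keeping the verifier separate from the attempt function mirrors the point in the notes: the agent that proposes a finding should not be the one that decides whether it counts.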

    48 min
  5. How to design AI agent loops: schedules, goals, and subagents in Claude Code and Codex

    17 Jun

    I break down every loop type from scratch: what a heartbeat, cron, hook, and goal loop actually are, when each one fits, and the five things any effective loop needs before it touches production. Then I build two live loops: a daily aging-PR reviewer in Claude Code that schedules itself at 10:15 a.m. and spins off its own subagents, and a weekly skills-identification loop in Codex that spawns goal-based subagents to validate its own output in real time.

    What you’ll learn:
    • The plain-English definition of a loop, and why it’s just an automated prompt, not a scary new paradigm
    • The four loop types (heartbeat, cron, hook, and goal) and when each one actually fits your workflow
    • How to think about loop design using the “onboarding an employee” mental model
    • The five things every effective loop needs: work trees, skills, plugins/connectors, subagents, and state tracking
    • How to build a scheduled PR-review routine in Claude Code that babysits aging PRs and alerts your team
    • How to set up a weekly skills-identification automation in Codex that spawns its own validating subagents
    • Why goal-based loops are the hardest to write well, and where most people burn tokens for nothing
    • The two warning signs that your loop is going to get expensive before it gets useful

    Brought to you by:
    • WorkOS: Make your app enterprise-ready today
    • Runway: The creative AI platform for images, video, and more

    In this episode, we cover:
    • (00:00) Prompts are out and loops are in
    • (02:30) Defining a loop
    • (03:03) The four ways to automate a prompt: heartbeat, cron, hooks, and goals
    • (06:03) Five things every effective loop needs
    • (09:26) The “onboarding an employee” framework for designing loops
    • (11:58) Live build #1: Daily aging PR loop in Claude Code
    • (17:08) Subagents inside loops
    • (19:00) Live build #2: Weekly skills identification loop in Codex
    • (22:57) Watching subagents spin up in real time
    • (25:28) Warning signals around loops
    • (27:31) What listeners are doing with loops

    Tools referenced:
    • Claude Code: https://claude.ai/code
    • Codex: https://chatgpt.com/codex
    • OpenClaw: https://openclaw.ai/

    Other references:
    • Claire’s article “Why OpenClaw Feels Alive Even Though It’s Not”: https://x.com/clairevo/article/2017741569521271175
    • Addy Osmani’s article on loop engineering: https://addyosmani.com/blog/loop-engineering/
    • Using Goals in Codex: https://developers.openai.com/cookbook/examples/codex/using_goals_in_codex
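For readers new to scheduled loops, here is a small sketch of what "runs daily at 10:15 a.m." means in practice. The cron string uses the standard five-field crontab format; whether Claude Code or Codex accepts that exact syntax is not stated in the notes, so treat it as an assumption. The `next_daily_run` helper is hypothetical and simply computes when such a loop would fire next.

```python
from datetime import datetime, timedelta

# Standard five-field crontab notation: minute hour day-of-month month weekday.
# Shown for reference only; the scheduling syntax each tool accepts may differ.
CRON_DAILY_1015 = "15 10 * * *"


def next_daily_run(now: datetime, hour: int = 10, minute: int = 15) -> datetime:
    """Next time a once-a-day loop should fire, strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now()))
```

A cron loop like this fires on the clock whether or not there is work to do, which is one reason the episode distinguishes it from hook and goal loops.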

    29 min
  6. Claude Fable 5 review: what the new Mythos model gets right (and very wrong)

    9 Jun

    Claude Fable 5 is the first Mythos-class intelligence model to be generally available, and I got early access to test it before launch. In this episode, I walk through what Anthropic is promising, what actually stood out when I used it on real work, and where I think it fits in your AI stack.

    In this episode, we cover:
    • (00:00) Introduction: Fable 5 is finally here
    • (00:31) What Anthropic says about the model
    • (05:14) Token-intensive by design
    • (06:28) Safety classifiers and the new fallback concept
    • (07:46) Is this or is this not Mythos?
    • (08:30) New product launches: Managed Agents and more
    • (09:20) Crushing benchmarks
    • (09:55) What it’s actually like to use (the good and the bad)
    • (11:40) Test 1: product graph spec
    • (12:56) Test 2: designing a skills registry
    • (14:04) Conservative on execution
    • (14:43) Test 3: multi-agent orchestration
    • (15:39) My takeaways

    Tools referenced:
    • Claude Fable 5: https://www.anthropic.com/news/claude-fable-5-mythos-5
    • Claude Managed Agents: https://platform.claude.com/docs/en/managed-agents/overview

    Other references:
    • SWE-bench Pro benchmark: https://www.swebench.com/

    17 min
  7. Shopping with Claude: How to find quality brands, automate returns, and buy things that last 100 years | Nicole Ruiz

    8 Jun

    Nicole Ruiz is a writer and parent who has built a comprehensive AI-powered shopping system to help her family buy high-quality, long-lasting items while avoiding the noise of drop-shipping brands, paid ads, and poorly made products. She writes an interview series on Substack about how technology is changing the household.

    What you’ll learn:
    • How to build a Claude Project with custom instructions for vetting brands based on heritage, craftsmanship, and return policies
    • The shopping criteria that help surface century-old manufacturers over trendy direct-to-consumer brands
    • How to use Claude to search through trusted vendor websites that have terrible UX
    • Why AI actually helps small artisans and heritage brands compete against Amazon’s infrastructure
    • How to use Claude Cowork to automate returns by finding receipts in your email and drafting refund requests
    • The technique for getting Claude to analyze whether a brand is legitimate or just a drop-shipping operation
    • How to shop within a specific budget or with gift cards using AI assistance

    Brought to you by:
    • Orkes: The enterprise platform for reliable applications and agentic workflows
    • Metaview: The agentic recruiting platform for winning teams

    In this episode, we cover:
    • (00:00) Introduction to Nicole and AI-powered shopping
    • (02:29) The problem
    • (04:55) Building a Claude Project for household purchasing
    • (07:44) The “anti-to-do list” concept for reducing mental overhead
    • (10:30) Shopping for a can opener: the system in action
    • (15:53) How AI helps century-old brands with terrible websites
    • (18:45) Processing returns with Claude Cowork
    • (25:06) Using gift cards strategically
    • (26:33) Vetting brands
    • (29:40) Recap, lightning round, and final thoughts

    Tools referenced:
    • Claude: https://claude.ai/
    • Claude Cowork: https://www.anthropic.com/product/claude-cowork

    Other references:
    • Boston General Store: https://bostongeneralstore.com/
    • L.L.Bean: https://www.llbean.com/
    • Manufactum: https://www.manufactum.com/
    • 5 OpenClaw agents run my home, finances, and code | Jesse Genet: https://www.lennysnewsletter.com/p/5-openclaw-agents-run-my-home-finances
    • From a $6.90 newsletter to $3M API: How a non-coder built Memelord | Jason Levin: https://www.lennysnewsletter.com/p/from-a-690-newsletter-to-3m-api-how

    Where to find Nicole Ruiz:
    • X: https://x.com/nwilliams030
    • Substack (The Third Oikos): https://www.thirdoikos.com/

    37 min
  8. Gemini Omni: Clone yourself with AI in under 15 minutes

    3 Jun

    In this experimental episode, I document my real-time attempt to create an AI avatar of myself using Google Flow and the new Gemini Omni video generation model. I walk through the entire process, from scanning my face with my phone to generating a complete one-minute hype video for the podcast, all in about 15 minutes.

    What you’ll learn:
    • How to create an AI avatar using Google Flow in under five minutes
    • Why video AI tools unlock creative possibilities for people with zero video production skills
    • The step-by-step process of generating a full storyboard using AI as your creative producer
    • How to use character consistency features to generate multiple video scenes with the same avatar
    • The uncanny-valley moments you’ll encounter when your AI clone doesn’t quite nail emotions or physics
    • How to stitch together AI-generated scenes into a complete video using built-in editing tools

    Brought to you by:
    • Merge: Connective infrastructure for production AI
    • Jira Product Discovery: Prioritize with insights, build with confidence

    In this episode, we cover:
    • (00:00) Getting started with Google Flow and Gemini Omni
    • (01:38) The avatar creation process: scanning and photo capture
    • (02:55) Using Flow to brainstorm a hype video storyboard
    • (06:59) Generating the first video scene with the avatar
    • (08:41) Troubleshooting: accidentally generating images instead of videos
    • (09:32) Generating all seven scenes for the complete video
    • (11:37) Reviewing the avatar videos
    • (13:13) Stitching the videos together in the browser-based editor
    • (14:32) The complete How I AI hype video
    • (15:32) What worked and what didn’t
    • (19:04) Final thoughts

    Tools referenced:
    • Google Flow: https://labs.google/fx/tools/flow
    • Gemini Omni: https://gemini.google/overview/video-generation/
    • Veo 3: https://deepmind.google/technologies/veo/

    21 min
