Platform Engineering Playbook Podcast

vibesre

The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution. Read the playbook at https://platformengineeringplaybook.com

  1. 1 day ago

    Why AI Code Is Killing Your Monitoring Budget

    **Is your monitoring bill about to explode? AI-generated code is creating 10x more observability data than human-written code.** In this deep dive episode of Platform Engineering Playbook, we unpack the hidden observability crisis that's quietly hitting DevOps teams everywhere. While AI accelerates development, it's also flooding your monitoring systems with unprecedented amounts of telemetry data.

    **What You'll Learn:**
    ✅ Why AI-generated code produces exponentially more observability data
    ✅ How to manage exploding monitoring costs without losing visibility
    ✅ Practical strategies for optimizing telemetry in AI-heavy environments
    ✅ Real-world approaches to selective instrumentation and data sampling

    **Episode Breakdown:**
    0:00 - Cold Open: The 10x observability data problem
    2:15 - Industry news roundup
    8:30 - Deep Dive Act 1: Understanding the AI observability explosion
    18:45 - Deep Dive Act 2: Technical analysis and root causes

    **Today's News Coverage:**
    • CNCF's new etcd debugging improvements for Kubernetes
    • Uber's MySQL consensus architecture breakthrough
    • Cloudflare's Account Abuse Protection launch
    • GitLab Container Virtual Registry updates

    Perfect for platform engineers, DevOps leads, and SREs dealing with modern observability challenges in AI-driven development environments.

    **Sources & References:**
    - https://devops.com/ai-is-forcing-devops-teams-to-rethink-observability-data-management/
    - https://www.cncf.io/blog/2026/03/12/making-etcd-incidents-easier-to-debug-in-production-kubernetes/
    - https://www.infoq.com/news/2026/03/uber-mysql-uptime-consensus/
    - https://blog.cloudflare.com/account-abuse-protection/
    - https://about.gitlab.com/blog/using-gitlab-container-virtual-registry-with-docker-hardened-images/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    22 min
  2. 2 days ago

    How Karpenter Fixes Kubernetes Autoscaling

    **Are you throwing money away on Kubernetes compute costs?** 87% of clusters waste up to half their resources on idle nodes - but there's a solution that's changing everything. In today's Platform Engineering Playbook, we dive deep into **Karpenter**, the game-changing autoscaler that's revolutionizing how teams think about Kubernetes resource management. You'll discover why traditional cluster autoscaling falls short and how Karpenter's architecture solves real-world scaling challenges.

    **What You'll Learn:**
    ✅ Why 87% of K8s clusters are bleeding money on unused compute
    ✅ Karpenter's under-the-hood architecture and decision-making process
    ✅ Practical evaluation framework for adopting Karpenter in your platform
    ✅ Latest platform engineering news from Microsoft Azure AI agents, KubeCon India 2026, and more

    **Timestamps:**
    0:00 - Cold Open: The Kubernetes Cost Crisis
    2:15 - Today's Platform Engineering News
    8:30 - Deep Dive: Karpenter vs Traditional Autoscaling

    Perfect for platform engineers, DevOps teams, and cloud architects looking to optimize their Kubernetes infrastructure costs and performance.

    **Sources & References:**
    - Understanding Karpenter architecture: https://www.datadoghq.com/blog/karpenter-architecture/
    - Microsoft Azure Skills Plugin: https://devops.com/microsoft-azure-skills-plugin-gives-ai-coding-agents-a-playbook-for-cloud-deployment/
    - KubeCon India 2026 Schedule: https://www.cncf.io/announcements/2026/03/10/cncf-unveils-kubecon-cloudnativecon-india-2026-schedule/
    - Cloudflare Security Insights: https://blog.cloudflare.com/attack-surface-intelligence/
    - Monitor Karpenter with Datadog: https://www.datadoghq.com/blog/monitor-karpenter-datadog/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min
  3. 3 days ago

    AI Is Not the Problem — Your Infrastructure Is

    **Why do 70% of AI projects crash and burn before they ever see production?** Spoiler alert: it's not the AI that's broken. In today's Platform Engineering Playbook, we're diving deep into the AI infrastructure crisis that's keeping CTOs awake at night. While everyone's racing to deploy the latest AI models, most organizations are discovering their legacy systems simply can't handle the load.

    **What You'll Learn:**
    • The real reason AI projects fail (hint: it's your infrastructure)
    • How to build a unified data fabric that actually works
    • Which legacy systems are sabotaging your AI ambitions
    • Practical strategies for modernizing without breaking everything

    **Episode Breakdown:**
    00:00 - Cold Open: The 70% AI failure rate
    02:15 - Platform Engineering News Roundup
    08:30 - Deep Dive: The AI Infrastructure Disconnect
    15:45 - Building Unified Data Fabrics

    **Today's News:** Cloudflare & Mastercard's new security partnership, Amazon's R8g instance expansion, Pulumi's Google Sign-In support, Amazon vs. Perplexity AI legal battle, and Together AI's GPU cluster improvements.

    Perfect for platform engineers, DevOps teams, and technical leaders navigating the AI transformation.

    **Sources & References:**
    - AI Infrastructure Crisis Roadmap: https://thenewstack.io/ai-infrastructure-crisis-roadmap/
    - Cloudflare & Mastercard Security Partnership: https://blog.cloudflare.com/attack-surface-intelligence/
    - Amazon EC2 R8g Regional Expansion: https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-ec2-r8g-instances-additional-regions/
    - Pulumi Google Sign-In: https://www.pulumi.com/blog/pulumi-cloud-now-supports-google-sign-in/
    - Amazon vs. Perplexity Legal Update: https://www.businessoffashion.com/news/technology/amazon-wins-court-order-blocking-perplexity-ai-shopping-bots/
    - Together AI GPU Clusters: https://www.together.ai/blog/new-in-together-gpu-clusters-autoscaling-observability-self-healing

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  4. 3 days ago

    Why Kubernetes Doesn’t Scale Without an IDP

    **Why do 97% of companies using Kubernetes never scale beyond their original expert team?** It's not a skills problem - it's an architecture problem that Internal Developer Platforms (IDPs) are uniquely positioned to solve. In today's episode of Platform Engineering Playbook, we dive deep into the Kubernetes scaling crisis and explore how IDPs can democratize container orchestration across your entire engineering organization. Plus, we cover the latest platform engineering news that's shaping the industry.

    **What You'll Learn:**
    • The real reason most Kubernetes deployments stay trapped in expert-only silos
    • How IDPs solve the complexity problem without dumbing down capabilities
    • Tactical frameworks for deciding if your organization actually needs an IDP
    • Breaking news: Pulumi's expanded VCS support, Netflix's massive PostgreSQL migration, and Apono's game-changing Grafana integration

    **Timestamps:**
    0:00 - Cold Open: The 97% Problem
    2:15 - Industry News Roundup
    8:30 - Deep Dive: The Kubernetes Scaling Crisis
    15:45 - How IDPs Bridge the Expert Gap

    Whether you're a platform engineer, DevOps lead, or engineering manager struggling with Kubernetes adoption, this episode gives you concrete strategies to scale your platform beyond the experts who built it.

    **Sources & References:**
    • Why IDPs are the Only Way to Scale Kubernetes Beyond Experts: https://cloudnativenow.com/social-facebook/why-idps-are-the-only-way-to-scale-kubernetes-beyond-experts/
    • Expanded Version Control Support in Pulumi Cloud: https://www.pulumi.com/blog/expanded-version-control-support/
    • Apono integration for Grafana: https://grafana.com/blog/apono-integration-for-grafana-enabling-just-in-time-access-for-data-sources/
    • Netflix Automates RDS PostgreSQL to Aurora PostgreSQL Migration: https://www.infoq.com/news/2026/03/netflix-automates-rds-aurora/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    17 min
  5. 4 days ago

    The AWS Cost That Doesn’t Show Up in Cost Explorer

    **What if your AWS bill has a hidden line item costing you thousands that doesn't even show up in Cost Explorer?** Today on Platform Engineering Playbook, we expose the sneaky cloud costs that are bleeding your budget dry and dive deep into the AWS Well-Architected Framework's six pillars to help you architect cost-efficient, secure platforms.

    **What You'll Learn:**
    ✅ How to identify and eliminate hidden AWS costs using the Well-Architected Framework
    ✅ Practical steps platform engineers can take TODAY to optimize cloud spending
    ✅ Real-world analysis of cost optimization strategies that actually work

    **Episode Breakdown:**
    🎯 Cold Open: The hidden AWS cost crisis
    📊 Deep Dive Act 1: Setting up the hidden cost problem
    🔍 Deep Dive Act 2: AWS Well-Architected Framework analysis with expert insights
    ⚡ Deep Dive Act 3: Actionable takeaways for platform engineers
    📰 Industry News: NanoClaw's containerized AI agents, OpenLens alternatives, incident management at Port, DevOps job opportunities, and GitHub Codespaces incidents

    Whether you're managing multi-million dollar cloud infrastructures or optimizing costs for growing startups, this episode delivers the framework and tactics you need to stop hidden costs from destroying your budget.

    **Sources & References:**
    - AWS Well-Architected Framework Hidden Costs: https://aws.amazon.com/blogs/architecture/the-hidden-price-tag-uncovering-hidden-costs-in-cloud-architectures-with-the-aws-well-architected-framework/
    - NanoClaw Containerized AI Agents: https://thenewstack.io/nanoclaw-containerized-ai-agents/
    - OpenLens Alternatives Guide: https://feeds.dzone.com/link/23568/17293987/best-openlens-alternatives-for-kubernetes-visibility
    - Port Incident Management: https://www.port.io/blog/how-ai-would-have-handled-a-real-incident-at-port
    - DevOps Job Opportunities: https://devops.com/five-great-devops-job-opportunities-179/
    - GitHub Codespaces Incident: https://www.githubstatus.com/incidents/tp8m3544w2g8

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  6. Mar 6

    87% of Ansible Playbooks Are Broken (AI Just Proved It)

    **87% of production Ansible playbooks have critical flaws - but AI just revealed how to fix them.** Today's Platform Engineering Playbook dives deep into how AI is revolutionizing infrastructure automation and Ansible development. We'll explore groundbreaking research showing most production playbooks lack proper error handling, and how collaborative AI approaches are changing the game for platform engineers.

    **What You'll Learn:**
    • Why most Ansible deployments are more fragile than you think
    • How to leverage AI to identify and fix critical infrastructure code issues
    • Real-world case studies of AI-assisted Ansible improvement
    • Latest developments in route optimization algorithms (RADAR)
    • Pulumi's massive 20x performance improvements now in GA
    • AWS Lambda's new Kiro power for durable functions

    **Timestamps:**
    0:00 Cold Open - The Ansible Crisis
    2:15 Today's Platform Engineering News
    8:30 Deep Dive: AI + Ansible Collaboration

    Whether you're managing infrastructure at scale or just starting your platform engineering journey, this episode delivers actionable insights you can implement immediately. Learn how top engineering teams are using AI not to replace their expertise, but to amplify it.

    **Sources & References:**
    • How to collaborate with AI to improve your Ansible skills: https://developers.redhat.com/articles/2026/03/04/how-collaborate-ai-improve-your-ansible-skills
    • RADAR: Learning to Route with Asymmetry-aware DistAnce Representations: https://arxiv.org/abs/2603.03388
    • Now GA: Up to 20x Faster Pulumi Operations for Everyone: https://www.pulumi.com/blog/journaling-ga/
    • Accelerate Lambda durable functions development with new Kiro power: https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-durable-kiro-power/
    • How we would have managed a recent incident at Port with an incident agent: https://www.port.io/blog/how-we-would-have-managed-a-recent-incident-at-port-with-an-incident-agent
    • Scaling AI opportunity across the globe: Learnings from GitHub and Andela: https://github.blog/developer-skills/career-growth/scaling-ai-opportunity-across-the-globe-learnings-from-github-and-andela/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    15 min
  7. Mar 5

    GrafanaCON 2026: The Agenda That Signals the Future of Observability

    **GrafanaCON 2026 just dropped their agenda, and every attendee will build an AI agent from scratch on day one. What does this tell us about the future of platform engineering?** In today's Platform Engineering Playbook, we dissect the GrafanaCON 2026 agenda to uncover what it reveals about emerging trends in observability and platform tooling. We analyze why hands-on AI workshops are becoming conference staples and what this means for platform teams in 2026.

    **What You'll Learn:**
    • How GrafanaCON's AI-first approach signals industry shifts
    • Strategic insights for platform teams from the conference agenda
    • Hidden cloud costs exposed by AWS's Well-Architected Framework
    • Release platform migration strategies that actually work
    • Why traditional ITOps fails with AI incident management

    **Timestamps:**
    00:00 Cold Open - GrafanaCON's AI Agent Challenge
    02:15 Today's Platform Engineering News
    08:30 Deep Dive: GrafanaCON 2026 Agenda Analysis

    Whether you're planning conference attendance or building your 2026 platform strategy, this episode breaks down the signals that matter for platform engineering leaders.

    **Sources & References:**
    • GrafanaCON 2026 agenda: https://grafana.com/blog/grafanacon-2026-agenda/
    • AWS Hidden Cloud Costs: https://aws.amazon.com/blogs/architecture/the-hidden-price-tag-uncovering-hidden-costs-in-cloud-architectures-with-the-aws-well-architected-framework/
    • Release Platform Migration Strategy: https://launchdarkly.com/blog/release-platform-migration/
    • Datadog Synthetic Monitoring: https://www.datadoghq.com/blog/simplifying-troubleshooting-with-synthetic-monitoring/
    • AI Incident Management Evolution: https://thenewstack.io/ai-incident-management-evolution/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min
  8. Mar 4

    Can AI Run Your Production Systems?

    What if your observability stack could debug and fix production issues while you sleep? That future might be closer than you think. In today's Platform Engineering Playbook, we explore the cutting edge of agentic AI in observability systems and break down the biggest platform engineering news shaping March 2026.

    **🎯 WHAT YOU'LL LEARN:**
    • How self-healing observability stacks are revolutionizing platform operations
    • Whether AI agents can truly handle your system's edge cases
    • Practical evaluation criteria for agentic observability tools
    • Critical security updates from Datadog's OCI protection expansion
    • Confluent's game-changing Kafka platform updates with A2A support

    **⏰ TIMESTAMPS:**
    0:00 Cold Open - The Future of Self-Debugging Systems
    1:30 Today's Platform Engineering Headlines
    8:45 Deep Dive: Agentic Observability - The Setup
    15:20 Can AI Handle Your Edge Cases? - The Analysis

    **💡 WHY LISTEN:** Get actionable insights on emerging platform technologies, real-world implementation strategies, and stay ahead of industry trends that will impact your infrastructure decisions. Perfect for platform engineers, SREs, and DevOps professionals navigating the evolving landscape of autonomous systems.

    **Sources & References:**
    • https://grafana.com/blog/the-rise-of-agentic-ai-in-production-can-observability-systems-run-themselves/
    • https://www.datadoghq.com/blog/cloud-security-oci/
    • https://thenewstack.io/confluent-kafka-a2a-agents/
    • https://npmx.dev/blog/alpha-release
    • https://blog.cloudflare.com/bootstrap-mtc/
    • https://www.bbc.com/news/articles/cgk28nj0lrjo

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min

