Platform Engineering Playbook Podcast

vibesre

The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution. Read the playbook at https://platformengineeringplaybook.com

  1. 6H AGO

    The Data Canary Pattern: How Netflix Prevents Bad Metadata Deploys

    **What happens when 2 billion daily metadata events could crash Netflix's entire platform with one bad transformation?** Today's Platform Engineering Playbook dives deep into Netflix's Data Canary system - a masterclass in building trust and validation into your data pipelines at scale. Plus, we cover the latest platform engineering news that's reshaping how we deploy and monitor distributed systems.

    **What You'll Learn:**
    • How Netflix validates massive data transformations without risking production
    • Container readiness strategies for Spring Boot in Kubernetes environments
    • LinkedIn's redesigned SAST pipeline using GitHub Actions and CodeQL
    • Why GitOps is becoming essential for platform engineering teams
    • Datadog's new LLM observability tools with Google's Agent Development Kit

    **Episode Chapters:**
    0:00 - Cold Open: Netflix's 2 billion event challenge
    2:15 - Platform engineering news roundup
    8:30 - Deep Dive: Netflix Data Canary system breakdown
    15:45 - Trust frameworks for platform validation

    Whether you're scaling data pipelines, improving deployment reliability, or building platform trust frameworks, this episode delivers actionable insights from real-world implementations at companies like Netflix and LinkedIn.

    **Sources & References:**
    • Netflix Data Canary: https://netflixtechblog.medium.com/the-data-canary-how-netflix-validates-catalog-metadata-18b699d58e36?source=rss-c3aeaf49d8a4------2
    • Spring Boot Container Readiness: https://medium.com/@AlexanderObregon/container-readiness-checks-for-spring-boot-deployments-535ab60ca32a
    • LinkedIn SAST Pipeline: https://www.infoq.com/news/2026/02/linkedin-redesigns-sast-pipeline/
    • GitOps Course: https://platformengineering.org/blog/announcing-new-course-gitops-for-platform-engineering
    • AWS Revenue Growth: https://techcrunch.com/2026/02/05/aws-revenue-continues-to-soar-as-cloud-demand-remains-high/
    • Datadog LLM Observability: https://www.infoq.com/news/2026/02/datadog-google-llm-observability/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    15 min
  2. 1D AGO

    Claude Opus 4.6: The First AI That Feels Like a Teammate

    **Claude Opus 4.6 just demolished GPT-4 on every coding benchmark - and it's about to reshape how we think about platform engineering automation.** In today's episode, we break down Anthropic's game-changing AI release and what it means for platform teams worldwide. We dive deep into the autonomous capabilities that could revolutionize how we handle infrastructure operations, but also explore the new risks this creates for production environments.

    **What You'll Learn:**
    • How Claude Opus 4.6's coding performance impacts platform tooling decisions
    • Why autonomous AI operations require new safety frameworks
    • Practical strategies for identifying AI automation opportunities in your platform
    • Analysis of Resolve AI's $125M funding and the AI SRE market explosion
    • Key Kubernetes updates affecting platform teams

    **Timestamps:**
    0:00 Cold Open - Claude's benchmark dominance
    2:15 Today's platform engineering news roundup
    8:30 Deep Dive: Claude Opus 4.6 for platform teams
    15:45 Risk analysis: Autonomous AI in production

    Whether you're evaluating AI tools for your platform team or wondering how to safely implement autonomous operations, this episode gives you the framework to make informed decisions without getting caught up in the hype.

    **Sources & References:**
    • Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
    • Anthropic AI upgrade - Reuters: https://www.reuters.com/business/retail-consumer/anthropic-releases-ai-upgrade-market-punishes-software-stocks-2026-02-05/
    • Resolve AI $125M funding: https://techcrunch.com/2026/02/04/ai-sre-resolve-ai-confirms-125m-raise-unicorn-valuation/
    • Kubernetes OpenAPI updates: https://github.com/kubernetes/kubernetes/pull/136582
    • Prow contributors: https://docs.prow.k8s.io/docs/getting-started-develop
    • Video mesh recovery research: https://arxiv.org/abs/2602.04257

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    16 min
  3. 2D AGO

    Autonomous AI in DevOps Is Here — And Most Teams Are Doing It Wrong

    **Will 87% of DevOps teams really be obsolete by 2026?** As AI agents take control of production infrastructure, we're witnessing the biggest transformation in platform engineering history. In today's episode, we dive deep into **autonomous AI agents in DevOps workflows** and explore how they're reshaping everything from monitoring to incident response. You'll discover real-world examples of AI agents managing production systems, plus critical insights on when and how to safely implement these powerful tools in your own infrastructure.

    **What You'll Learn:**
    • How AI agents are revolutionizing observability and SRE practices
    • Practical implementation strategies for autonomous monitoring systems
    • Why you should wait at least 4 months before deploying AI agents in production
    • The latest trends in GenAI and OpenTelemetry integration
    • Kubernetes IPv6 adoption and what it means for your platform

    **Episode Chapters:**
    0:00 - Cold Open: The AI DevOps Revolution
    2:15 - Today's Platform Engineering News
    8:30 - Deep Dive: AI Agents in Production (Setup)
    15:45 - Real-World Implementation Examples

    Whether you're a platform engineer, SRE, or DevOps leader, this episode provides actionable insights for navigating the AI-driven future of infrastructure management.

    **Sources & References:**
    • MCP-Powered Agentic AI in DevOps: https://devops.com/mcp-powered-agentic-ai-in-devops-building-secure-scalable-multi-agent-pipelines-for-autonomous-sre-and-observability/
    • Observability trends for 2026: https://www.elastic.co/blog/2026-observability-trends-generative-ai-opentelemetry
    • CNCF LFX Mentorship 2025: https://www.cncf.io/blog/2026/02/04/cncf-celebrates-successful-mentees-from-lfx-mentorship-2025-term-3/
    • CNCF Fluid with Amazon EKS: https://aws.amazon.com/blogs/containers/build-deep-learning-model-training-apps-using-cncf-fluid-with-amazon-eks/
    • Agent-Assisted Intelligent Observability: https://www.infoq.com/articles/agent-assisted-intelligent-observability/
    • Kubernetes and IPv6: https://cloudnativenow.com/features/kubernetes-and-ipv6-together-at-last/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  4. 3D AGO

    Kubernetes Is Retiring Ingress NGINX (And 50% of Clusters Aren’t Ready)

    90% of Kubernetes clusters are running Ingress NGINX, abandoned in 16 months with zero maintainers left! What does this mean for your production systems? In this episode, we dive deep into the urgent need for migration and the alternatives available as the clock ticks down. With the retirement of Ingress NGINX set for March 2026, it's critical to understand how this affects millions of deployments worldwide. If you're among the half still relying on Ingress NGINX, you can't afford to miss this episode.

    🔑 What you'll learn:
    - The migration timeline and key deadlines you need to know.
    - How to audit your current Ingress resources and document your custom annotations.
    - The best practices for transitioning to Gateway API alternatives.
    - Actionable steps to ensure your infrastructure remains secure and efficient.

    This isn't just a technical shift; it's a fundamental change that could impact your entire architecture. Stay informed and prepared as we break down everything you need to know to navigate this transition successfully.

    ⏱️ TIMESTAMPS:
    00:00:00 - Cold Open
    00:00:30 - Intro
    00:01:05 - Deep Dive - Act 1: The Setup
    00:07:05 - Deep Dive - Act 2: The Analysis
    00:13:05 - Deep Dive - Act 3: Takeaways
    00:19:05 - News
    00:23:55 - Outro

    🎙️ Platform Engineering Playbook
    🔗 https://platformengineeringplaybook.com
    📧 Subscribe for weekly platform engineering insights

    #PlatformEngineering #DevOps #Kubernetes #CloudNative #SRE #Podcast

    19 min
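
The Ingress NGINX episode notes above mention auditing Ingress resources and documenting custom annotations. As a rough illustration only (not taken from the episode), that audit could be sketched in Python as below. It assumes you have already captured the JSON printed by `kubectl get ingress --all-namespaces -o json`, and that controller-specific settings use the `nginx.ingress.kubernetes.io/` annotation prefix. The function names `audit_ingresses`, `audit_from_json`, and `format_report` are hypothetical helpers invented for this sketch.

```python
import json

# Annotation prefix used by the community ingress-nginx controller.
# Assumption: your cluster has not overridden the default prefix.
NGINX_PREFIX = "nginx.ingress.kubernetes.io/"


def audit_ingresses(ingress_list):
    """Return {'namespace/name': [annotation keys]} for Ingresses
    that carry at least one NGINX-specific annotation."""
    report = {}
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        nginx_keys = sorted(k for k in annotations if k.startswith(NGINX_PREFIX))
        if nginx_keys:
            namespace = meta.get("namespace", "default")
            name = meta.get("name", "unknown")
            report[f"{namespace}/{name}"] = nginx_keys
    return report


def audit_from_json(text):
    """Parse the JSON text emitted by kubectl and audit it."""
    return audit_ingresses(json.loads(text))


def format_report(report):
    """Render the audit as plain text, one Ingress per block."""
    lines = []
    for ident in sorted(report):
        lines.append(ident)
        lines.extend(f"  {key}" for key in report[ident])
    return "\n".join(lines)
```

One way to use it: save the `kubectl` output to a file, pass the file's text to `audit_from_json`, and print `format_report` of the result. Each annotation key listed is a setting whose behavior would need an equivalent, or a deliberate decision to drop it, in whatever replaces the controller.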
  5. 4D AGO

    OpenAI’s New macOS App: Is Agentic Coding Finally Here?

    **OpenAI just made 73% of coding assistants obsolete overnight - but what does this mean for platform engineers?** Today's episode breaks down OpenAI's game-changing macOS app for "agentic coding" and its massive implications for platform engineering workflows. We'll analyze why this isn't just another coding assistant, but a fundamental shift in how we approach infrastructure automation and developer tooling.

    **What You'll Learn:**
    ✅ Deep dive into OpenAI's new agentic coding capabilities and competitive advantages
    ✅ Critical risks platform teams need to consider (hallucinations, security, dependency management)
    ✅ How enterprise desktop computing is shifting toward immutable Linux systems
    ✅ Latest updates on Kubernetes pod checkpoint/restore functionality
    ✅ Azure Red Hat OpenShift's new managed identity features

    **Episode Timestamps:**
    00:00 Cold Open - OpenAI's market disruption
    02:15 Introduction & News Overview
    05:30 Deep Dive Act 1 - OpenAI's New macOS App
    12:45 Deep Dive Act 2 - Competitive Analysis & Scale Advantages

    Perfect for platform engineers, DevOps professionals, and engineering leaders who need to stay ahead of rapidly evolving AI tooling and infrastructure trends.

    **Sources & References:**
    - OpenAI macOS app launch: https://techcrunch.com/2026/02/02/openai-launches-new-macos-app-for-agentic-coding/
    - Azure Red Hat OpenShift managed identity GA: https://www.redhat.com/en/blog/general-availability-managed-identity-and-workload-identity-microsoft-azure-red-hat-openshift
    - Octelium v0.24 release: https://github.com/octelium/octelium
    - AWS GovCloud cost allocation: https://aws.amazon.com/about-aws/whats-new/2026/01/aws-flexible-cost-allocation-govcloud/
    - Enterprise immutable Linux adoption: https://thenewstack.io/why-enterprise-businesses-should-adopt-immutable-linux-for-the-desktop/
    - Kubernetes checkpoint/restore: https://cloudnativenow.com/features/kubernetes-begins-work-on-pod-checkpoint-restore/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    14 min
  6. 5D AGO

    98% of Container CVEs Are Hiding Where You’re Not Scanning

    **Are your container security scans missing 98% of critical vulnerabilities?** New research from Chainguard reveals a shocking blind spot that could be exposing your infrastructure to massive security risks. In today's Platform Engineering Playbook, we unpack this bombshell finding and explore why traditional container scanning is failing at scale. You'll discover where these hidden vulnerabilities are lurking, why your current tools aren't catching them, and most importantly - what you can do about it.

    **What You'll Learn:**
    • Why 98% of container CVEs hide outside the top 20 images
    • The computational costs of comprehensive vulnerability scanning
    • How to implement "image genealogy tracking" for better security
    • AWS's massive $581M Air Force cloud deal and what it means for enterprise adoption
    • Liya Linux: A new Arch-based distro proving GUI doesn't mean performance sacrifice

    **Timestamps:**
    0:00 Cold Open - The 98% Problem
    1:30 Today's Platform Engineering News
    3:45 Deep Dive: Container Security's Hidden Crisis
    12:20 Analysis: Why Current Tools Fall Short

    Perfect for platform engineers, DevOps teams, and security professionals who need to stay ahead of emerging threats while building scalable infrastructure.

    **Sources & References:**
    • Chainguard Container CVE Research: https://www.infoq.com/news/2026/01/chainguard-opensource-vulns/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global
    • Liya Linux Coverage: https://thenewstack.io/liya-linux-proves-high-performance-doesnt-require-a-command-line/
    • AWS Air Force Deal: https://www.washingtontechnology.com/contracts/2026/01/aws-lands-581m-sole-source-deal-under-air-force-cloud-one-program/411104/?oref=wt-homepage-river

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    13 min
  7. JAN 31

    Why Forward-Deployed Engineers Are Making $300K+ (And Why Companies Are Desperate for Them)

    Why are forward-deployed engineers making 40% more than traditional backend developers, and why can't companies hire enough of them? In today's Platform Engineering Playbook, we dive deep into tech's hottest new role and explore three critical platform engineering developments reshaping the industry.

    **What You'll Learn:**
    • The explosive rise of forward-deployed engineers and why they're commanding premium salaries
    • Real-world case studies from Snowflake and financial services implementations
    • Three essential skill areas every successful FDE needs to master
    • How Artera is revolutionizing prostate cancer diagnostics with AWS architecture
    • Cloudflare's innovative approach to vertical microfrontends
    • Advanced PostgreSQL debugging techniques with Datadog's EXPLAIN ANALYZE

    **Episode Chapters:**
    0:00 Cold Open - The 40% salary premium mystery
    1:30 Introduction & Today's Focus
    3:00 Deep Dive Act 1 - Forward-Deployed Engineers Explained
    8:45 Deep Dive Act 2 - Real-World Analysis & Case Studies

    Whether you're a platform engineer looking to advance your career or an engineering leader trying to understand this emerging role, this episode provides actionable insights backed by real industry data and case studies.

    **Sources & References:**
    - Why the forward-deployed engineer is tech's hottest job: https://thenewstack.io/why-the-forward-deployed-engineer-is-techs-hottest-job/
    - How Artera enhances prostate cancer diagnostics using AWS: https://aws.amazon.com/blogs/architecture/how-artera-enhances-prostate-cancer-diagnostics-using-aws/
    - Building vertical microfrontends on Cloudflare's platform: https://blog.cloudflare.com/vertical-microfrontends/
    - Debug PostgreSQL query latency faster with EXPLAIN ANALYZE in Datadog Database Monitoring: https://www.datadoghq.com/blog/database-monitoring-explain-analyze/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    12 min
