Platform Engineering Playbook Podcast

vibesre

The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution. Read the playbook at https://platformengineeringplaybook.com

  1. 3 HR AGO

    Databricks Lakebase vs Postgres: The AI Database Shift

    **Is PostgreSQL really obsolete for AI workloads?** Databricks just dropped Lakebase and it's shaking up everything we thought we knew about database architecture for machine learning pipelines. In today's Platform Engineering Playbook, we're diving deep into Databricks' game-changing announcement and what it means for your data infrastructure strategy. Plus, we're covering the week's biggest platform engineering news that's reshaping how we build scalable systems.

    **What You'll Learn:**
    • Why Databricks believes traditional PostgreSQL falls short for AI workloads
    • Technical breakdown of Lakebase architecture and its key innovations
    • Practical decision framework: when to adopt Lakebase vs. stick with existing solutions
    • AWS expands Elemental Media Services to Malaysia
    • Elastic Cloud Serverless doubles Azure region availability
    • Hybrid Kubernetes strategies for enterprise-scale deployments
    • OpenTelemetry's 2025 achievements and 2026 roadmap

    **Timestamps:**
    0:00 Cold Open - PostgreSQL vs AI Reality Check
    2:15 Databricks Lakebase Deep Dive
    15:30 Platform Engineering News Roundup

    Whether you're architecting data platforms, evaluating database solutions for ML workloads, or staying current with cloud-native trends, this episode delivers actionable insights you can implement immediately.

    **Sources & References:**
    • https://www.infoq.com/news/2026/02/databricks-lakebase-postgresql/
    • https://aws.amazon.com/about-aws/whats-new/2026/02/elemental-Malaysia/
    • https://www.elastic.co/blog/elastic-cloud-now-available-azure-virginia-singapore-spain-frankfurt
    • https://aws.amazon.com/blogs/containers/running-containerized-hybrid-nodes-with-amazon-elastic-kubernetes-service/
    • https://cloudnativenow.com/contributed-content/hybrid-cloud-at-enterprise-scale-private-kubernetes-for-portability-and-control/
    • https://opentelemetry.io/blog/2026/2025-year-in-review/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  2. 1 DAY AGO

    How to Secure AI Agents with MCP, OPA & Ephemeral Runners

    **Your AI agents have root access to your infrastructure right now - and you don't even know it.** What happens when we give AI agents the keys to our entire platform? In today's Platform Engineering Playbook, we dive deep into the hidden security risks of AI infrastructure automation and explore practical solutions for implementing least-privilege access controls.

    **What You'll Learn:**
    • How to secure AI agents with least-privilege gateway patterns using MCP and OPA
    • Databricks' new Lakebase PostgreSQL database designed specifically for AI workloads
    • Uber's Uforwarder: A scalable Kafka consumer proxy revolutionizing event-driven microservices
    • Why Kubernetes 1.35 signals the future of AI orchestration
    • Latest AWS updates including Claude Sonnet 4.6 in Bedrock and new agent plugins

    **Timestamps:**
    0:00 - Cold Open: The AI Security Wake-Up Call
    2:15 - Platform Engineering News Roundup
    8:30 - Deep Dive: Securing AI Infrastructure Access
    15:45 - Real-World Implementation Strategies

    Perfect for platform engineers, DevOps professionals, and infrastructure teams navigating the intersection of AI and cloud-native technologies. Get actionable insights to secure your AI-driven infrastructure before it's too late.

    **Sources & References:**
    - Building a Least-Privilege AI Agent Gateway: https://www.infoq.com/articles/building-ai-agent-gateway-mcp/
    - Databricks Lakebase PostgreSQL: https://www.infoq.com/news/2026/02/databricks-lakebase-postgresql/
    - KubeCon SecurityCon Deep Dive: https://www.cncf.io/blog/2026/02/23/kubecon-cloudnativecon-europe-2026-co-located-event-deep-dive-open-source-securitycon/
    - Uber's Uforwarder: https://www.infoq.com/news/2026/02/uber-uforwarder-kafka-push-proxy/
    - AWS Weekly Roundup: https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-sonnet-4-6-in-amazon-bedrock-kiro-in-govcloud-regions-new-agent-plugins-and-more-february-23-2026/
    - Kubernetes 1.35 AI Signals: https://www.cncf.io/blog/2026/02/23/kubernetes-as-ais-operating-system-1-35-release-signals/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    20 min
  3. 2 DAYS AGO

    Cloudflare Takes Down the Internet Again — With a Config Change

    **What happens when a single configuration change takes down 20% of the internet for six hours?** In this episode of Platform Engineering Playbook, we dissect the massive Cloudflare outage from February 20th, 2026 - a catastrophic failure that started with a routine BYOIP pipeline update and ended with Cloudflare accidentally deleting their own customers' networks.

    **What You'll Learn:**
    • The technical breakdown of how Cloudflare's configuration change cascaded into a global outage
    • Critical lessons for platform engineers about configuration management and deployment pipelines
    • Real-world AI use cases that are actually working in production environments
    • Infrastructure gaps that are secretly sabotaging AI productivity initiatives
    • HTTP/3 implementation strategies using nginx and FreeBSD

    **Episode Timestamps:**
    0:00 - Cold Open: The 30-minute warning
    2:30 - Today's Platform Engineering News
    8:15 - Deep Dive Act 1: What Really Happened at Cloudflare

    Whether you're building resilient infrastructure or implementing AI tooling, this episode delivers actionable insights to help you avoid similar disasters and build more robust platform engineering practices.

    **Sources & References:**
    - Cloudflare outage on February 20, 2026: https://blog.cloudflare.com/cloudflare-outage-february-20-2026/
    - What's your best use case for AI in your company so far?: https://www.reddit.com/r/sysadmin/comments/1rasadb/whats_your_best_use_case_for_ai_in_your_company/
    - This simple infrastructure gap is holding back AI productivity: https://thenewstack.io/this-simple-infrastructure-gap-is-holding-back-ai-productivity/
    - HTTP/3 on FreeBSD: Getting QUIC Working with nginx in a Bastille Jail: https://blog.hofstede.it/http3-on-freebsd-getting-quic-working-with-nginx-in-a-bastille-jail/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    17 min
  4. 5 DAYS AGO

    The Next Platform Engineer: AI + Observability + FinOps

    **Is AI about to revolutionize how we build infrastructure? The CNCF CTO says we're not prepared for what's coming.** In this episode of Platform Engineering Playbook, we dive deep into the future of cloud native infrastructure and why 2026 might be the year everything changes. Based on Chris Aniszczyk's latest insights, we explore how AI agents are moving beyond just consuming our platforms to actively designing and managing them.

    **What You'll Learn:**
    • How AI is reshaping platform engineering workflows and decision-making
    • Why current Kubernetes evolution patterns may not be sustainable
    • Practical strategies for platform engineers to prepare for AI-driven infrastructure
    • Key takeaways from the CNCF's 2026 observability trends

    **Episode Chapters:**
    0:00 Cold Open - AI's Infrastructure Revolution
    2:15 Today's Platform Engineering News
    8:30 Deep Dive: CNCF CTO's 2026 Predictions
    15:45 Technical Analysis: Kubernetes at Scale

    Whether you're building internal developer platforms or managing cloud native infrastructure at scale, this episode provides actionable insights for navigating the intersection of AI and platform engineering.

    **Sources & References:**
    - State of cloud native 2026: CNCF CTO's insights: https://www.cncf.io/blog/2026/02/19/state-of-cloud-native-2026-cncf-ctos-insights-and-predictions/
    - CNCF 2026 Observability Summit Schedule: https://www.cncf.io/announcements/2026/02/18/cncf-releases-2026-observability-summit-north-america-schedule-as-cloud-native-observability-adoption-expands/
    - DevOps Modernization with AI Agents: https://www.infoq.com/presentations/devops-modernization-ai-agents/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global
    - Amazon Connect Cases AWS Service Quotas: https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-connect-cases-aws-service-quotas
    - Cloudflare HTTP 5xx Errors Incident: https://www.cloudflarestatus.com/incidents/xhmtd6x13cw1

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min
  5. 6 DAYS AGO

    Ray + Kubernetes: The Production AI Stack Explained

    **Why do 92% of ML models never reach production?** It's not a code problem; it's a platform engineering problem. In today's episode of Platform Engineering Playbook, we tackle the massive infrastructure gap that's keeping AI initiatives stuck in notebooks while your data science teams wonder why their brilliant models never see the light of day.

    **What You'll Learn:**
    ✅ The real reasons ML models fail to reach production (hint: it's your infrastructure)
    ✅ How to architect production-ready AI infrastructure using Ray on Kubernetes
    ✅ Practical strategies for platform engineers supporting data science teams
    ✅ Enterprise GitOps scaling from single clusters to fleet management

    **Episode Breakdown:**
    0:00 Cold Open - The 92% problem
    2:15 Industry News Roundup
    8:30 Deep Dive: From Notebooks to Production
    15:45 Architecture Analysis: Ray on Kubernetes

    **Today's Platform Engineering News:**
    • Datadog's new audit-ready compliance reporting
    • Amazon Bedrock transforming HR talent acquisition
    • The hidden cost of burning out your on-call engineers
    • Enterprise GitOps fleet management strategies

    Whether you're struggling with ML infrastructure or just want to stay ahead of platform engineering trends, this episode gives you actionable insights you can implement today.

    **Sources & References:**
    - From notebooks to nodes: Architecting production-ready AI infrastructure: https://thenewstack.io/production-ai-infrastructure-guide/
    - Generate audit-ready vulnerability and compliance reports with Datadog Sheets: https://www.datadoghq.com/blog/audit-reports-datadog-sheets/
    - AI meets HR: Transforming talent acquisition with Amazon Bedrock: https://aws.amazon.com/blogs/machine-learning/ai-meets-hr-transforming-talent-acquisition-with-amazon-bedrock/
    - Is your on-call rotation quietly burning out top talent?: https://thenewstack.io/sustainable-on-call-strategies/
    - How to scale GitOps in the enterprise: From single cluster to fleet management: https://platformengineering.org/blog/how-to-scale-gitops-in-the-enterprise

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min
  6. 18 FEB

    Replace 5 Databases with 1? SurrealDB for AI Agents Explained

    Your AI agents are using five different databases right now - and you don't even know it. This database sprawl is silently killing your platform's performance and your team's sanity. In today's Platform Engineering Playbook, we dive deep into SurrealDB's multi-model approach and how it's revolutionizing AI infrastructure. Plus, breaking news on vulnerability management patterns that every platform engineer needs to understand.

    **What You'll Learn:**
    • Why database proliferation is the hidden killer of AI agent performance
    • SurrealDB's architecture deep dive and real-world deployment strategies
    • When (and when NOT) to consolidate your AI infrastructure databases
    • The contextual SBOM pattern transforming vulnerability management
    • India's massive $200B AI infrastructure play and what it means for the industry

    **Timestamps:**
    0:00 Cold Open - The Database Sprawl Crisis
    2:15 SurrealDB Deep Dive - Architecture & Implementation
    15:30 Practical Takeaways - When to Use Multi-Model Databases

    **Why Listen?** Get actionable insights from real platform engineering challenges, not theoretical fluff. We break down complex infrastructure decisions into practical guidance you can implement today. Perfect for platform engineers, DevOps teams, and infrastructure architects building scalable AI systems.

    **Sources & References:**
    • SurrealDB Docker Extension: https://www.docker.com/blog/deploy-surrealdb-docker-desktop-extension/
    • Spectral Collapse in Diffusion Inversion: https://arxiv.org/abs/2602.13303
    • India AI Infrastructure Investment: https://techcrunch.com/2026/02/17/india-bids-to-attract-over-200b-in-ai-infrastructure-investment-by-2028/
    • Contextual SBOM Pattern: https://developers.redhat.com/articles/2026/02/17/how-contextual-sbom-pattern-improves-vulnerability-management

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  7. 17 FEB

    Agoda’s API Agent Turns Any API into MCP — No Code, No Deployments

    **What if API integration nightmares could disappear without writing a single line of code?** Agoda just dropped a game-changing solution that transforms any API into MCP (Model Context Protocol) with zero deployments - and it's about to reshape how platform teams approach AI integrations. In today's Platform Engineering Playbook, we break down this revolutionary no-code approach and explore what it means for enterprise platform strategies. Plus, we dive into Docker's latest sandbox capabilities with NanoClaw, performance testing breakthroughs for Identity Management systems using encrypted DNS in OpenShift, and the emerging patterns for running AI coding agents on Kubernetes.

    **What You'll Learn:**
    ✅ How Agoda's API Agent eliminates integration complexity
    ✅ The three-layer architecture powering zero-code API transformations
    ✅ Real-world implications for platform engineering teams
    ✅ Docker's new sandboxing capabilities for secure code execution
    ✅ Advanced load testing strategies for IdM systems with eDNS and CoreDNS

    **Timestamps:**
    00:00 Cold Open - The API Integration Revolution
    02:15 Deep Dive Act 1 - Agoda's Game-Changing Approach
    08:30 Deep Dive Act 2 - Architecture Deep Dive
    15:45 Deep Dive Act 3 - Platform Team Takeaways

    Perfect for platform engineers, DevOps teams, and technical leaders navigating the AI-platform integration landscape.

    **Sources & References:**
    - Agoda's API Agent: https://www.infoq.com/news/2026/02/agoda-api-agent/
    - Docker NanoClaw Sandboxes: https://www.docker.com/blog/run-nanoclaw-in-docker-shell-sandboxes/
    - IdM Load Testing with eDNS: https://developers.redhat.com/articles/2026/02/16/load-testing-idm-edns-coredns-openshift
    - Kubernetes for AI Agents: https://cloudnativenow.com/features/gas-town-what-kubernetes-for-ai-coding-agents-actually-looks-like/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  8. 16 FEB

    LocalStack Kills Community Edition: What Breaks in March

    **LocalStack just killed their open-source edition - but what does this really mean for your platform engineering stack?** In today's episode of Platform Engineering Playbook, we break down LocalStack's shocking decision to discontinue their Community Edition and what it means for teams relying on AWS local development. Plus, we dive into the ripple effects across the developer ecosystem and provide a practical decision framework for your next moves.

    **What You'll Learn:**
    • Why LocalStack's pricing shift from free to $39/month matters for platform teams
    • Decision frameworks for evaluating local development alternatives
    • How AI is revolutionizing code deployment at Spotify
    • The surprising exodus from computer science programs and where students are heading
    • Insider Claude coding tips from the engineer who built it
    • Why Hollywood is concerned about Seedance 2.0's video generation capabilities

    **Episode Chapters:**
    0:00 Cold Open - LocalStack's Open Source Bombshell
    2:15 Deep Dive Act 1 - The Setup
    8:30 Deep Dive Act 2 - Pricing Analysis & Impact

    Whether you're managing platform infrastructure or building developer tooling, this episode gives you the insights and frameworks to navigate these industry shifts strategically.

    **Sources & References:**
    • LocalStack Community Edition Concerns: https://www.infoq.com/news/2026/02/localstack-aws-community/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global
    • Seedance 2.0 Hollywood Controversy: https://techcrunch.com/2026/02/15/hollywood-isnt-happy-about-the-new-seedance-2-0-video-generator/
    • Spotify AI Code Deployment: https://nextunicorn.ventures/ai-revolutionizes-code-deployment-at-spotify/
    • CS Student Exodus Analysis: https://techcrunch.com/2026/02/15/the-great-computer-science-exodus-and-where-students-are-going-instead/
    • Claude Coding Tips: https://www.anup.io/35-claude-code-tips-from-the-guy-who-built-it/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    16 min

