Platform Engineering Playbook Podcast

vibesre

The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution. Read the playbook at https://platformengineeringplaybook.com

  1. 6 hours ago

    The End of ingress-nginx: Kubernetes Migration Guide Before 2026

    **When ingress-nginx support officially ends in March 2026, the clusters that depend on it will keep running, but with no new releases, bug fixes, or security patches. Are you ready?** Today's Platform Engineering Playbook dives deep into the ingress-nginx migration that's about to affect millions of Kubernetes workloads. We break down your migration options, the timeline, and the practical steps to avoid chaos.

    **What You'll Learn:**
    ✅ Why ingress-nginx is ending support and what it means for your clusters
    ✅ Complete migration strategies from early-adopter teams
    ✅ A step-by-step playbook for platform engineering teams
    ✅ Alternative ingress controllers and their trade-offs

    **Episode Chapters:**
    0:00 Cold Open - The ingress-nginx crisis
    2:30 Welcome & Today's Platform Engineering News
    5:15 Deep Dive: The ingress-nginx EOL situation
    12:45 Migration analysis and real-world experiences

    **Plus:** Pulumi's distributed work scheduling system architecture, observability platform migration strategies with Prometheus and OpenTelemetry, Kubernetes AI inference updates, and SRE frameworks for troubleshooting database connectivity. Perfect for platform engineers, DevOps teams, and anyone managing Kubernetes infrastructure at scale.

    **Sources & References:**
    - The End of kubernetes/ingress-nginx: Your March 2026 Migration Playbook: https://medium.com/@housemd/kubernetes-ingress-nginx-eol-march-2026-the-complete-migration-guide-to-replace-ingress-nginx-e8f6e118fb5f
    - How We Built a Distributed Work Scheduling System for Pulumi Cloud: https://www.pulumi.com/blog/how-we-built-a-distributed-work-scheduling-system-for-pulumi-cloud/
    - Observability platform migration guide: Prometheus, OpenTelemetry, and Fluent Bit: https://thenewstack.io/observability-platform-migration-guide/
    - Kubernetes WG Serving concludes following successful advancement of AI inference support: https://www.cncf.io/blog/2026/02/26/kubernetes-wg-serving-concludes-following-successful-advancement-of-ai-inference-support/
    - A Unified Framework for SRE to Troubleshoot Database Connectivity in Kubernetes Cloud Applications: https://feeds.dzone.com/link/23568/17283905/sre-database-connectivity-troubleshooting-kubernetes

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    20 min
  2. Yesterday

    Claude Code Remote Control Changes Developer Workflows

    **What if 87% of developer productivity loss just became a thing of the past?** Anthropic's Claude Code Remote Control feature is reshaping how platform engineers think about developer workflows, and today we break down exactly what it means for your platform strategy.

    **In this episode:**
    • **Deep dive into Claude Code Remote Control** - How remote control capabilities are eliminating context switching between development environments
    • **Technical analysis** - Session management, security implications, and integration patterns for platform teams
    • **Practical evaluation framework** - Should your platform team adopt Claude Code? We'll give you the decision matrix
    • **Platform engineering news roundup** - Self-service observability with OpenTelemetry, the hidden costs of "automated" infrastructure, and real-world IT scaling challenges

    **Timestamps:**
    0:00 - Cold Open: The Context Switching Crisis
    2:15 - Today's Platform Engineering Headlines
    8:30 - Deep Dive: Claude Code Remote Control Breakdown

    Whether you're architecting developer platforms or evaluating AI tooling for your engineering org, this episode delivers actionable insights you can implement immediately.

    **Sources & References:**
    • Claude Code Remote Control: https://code.claude.com/docs/en/remote-control
    • Self-service observability guide: https://platformengineering.org/blog/self-service-observability
    • Infrastructure hidden costs analysis: https://thenewstack.io/automated-infrastructure-hidden-costs/
    • IT scaling discussion: https://www.reddit.com/r/sysadmin/comments/1redz97/2man_it_team_solo_admin_for_300_users_no_raise/
    • Data sovereignty policy update: https://techcrunch.com/2026/02/25/us-tells-diplomats-to-lobby-against-foreign-data-sovereignty-laws/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  3. 2 days ago

    Databricks Lakebase vs Postgres: The AI Database Shift

    **Is PostgreSQL really obsolete for AI workloads?** Databricks just launched Lakebase, its PostgreSQL-based operational database, and it's shaking up everything we thought we knew about database architecture for machine learning pipelines. In today's Platform Engineering Playbook, we dive deep into Databricks' announcement and what it means for your data infrastructure strategy. Plus, we cover the week's biggest platform engineering news and how it's reshaping the way we build scalable systems.

    **What You'll Learn:**
    • Why Databricks believes traditional PostgreSQL deployments fall short for AI workloads
    • A technical breakdown of the Lakebase architecture and its key innovations
    • A practical decision framework: when to adopt Lakebase vs. stick with existing solutions
    • AWS expands Elemental Media Services to Malaysia
    • Elastic Cloud Serverless doubles Azure region availability
    • Hybrid Kubernetes strategies for enterprise-scale deployments
    • OpenTelemetry's 2025 achievements and 2026 roadmap

    **Timestamps:**
    0:00 Cold Open - PostgreSQL vs AI Reality Check
    2:15 Databricks Lakebase Deep Dive
    15:30 Platform Engineering News Roundup

    Whether you're architecting data platforms, evaluating database solutions for ML workloads, or staying current with cloud-native trends, this episode delivers actionable insights you can implement immediately.

    **Sources & References:**
    • https://www.infoq.com/news/2026/02/databricks-lakebase-postgresql/
    • https://aws.amazon.com/about-aws/whats-new/2026/02/elemental-Malaysia/
    • https://www.elastic.co/blog/elastic-cloud-now-available-azure-virginia-singapore-spain-frankfurt
    • https://aws.amazon.com/blogs/containers/running-containerized-hybrid-nodes-with-amazon-elastic-kubernetes-service/
    • https://cloudnativenow.com/contributed-content/hybrid-cloud-at-enterprise-scale-private-kubernetes-for-portability-and-control/
    • https://opentelemetry.io/blog/2026/2025-year-in-review/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
  4. 3 days ago

    How to Secure AI Agents with MCP, OPA & Ephemeral Runners

    **Your AI agents have root access to your infrastructure right now, and you don't even know it.** What happens when we give AI agents the keys to our entire platform? In today's Platform Engineering Playbook, we dive deep into the hidden security risks of AI infrastructure automation and explore practical solutions for implementing least-privilege access controls.

    **What You'll Learn:**
    • How to secure AI agents with least-privilege gateway patterns using MCP and OPA
    • Databricks' new Lakebase PostgreSQL database, designed specifically for AI workloads
    • Uber's Uforwarder: a scalable Kafka consumer proxy for event-driven microservices
    • Why Kubernetes 1.35 signals the future of AI orchestration
    • The latest AWS updates, including Claude Sonnet 4.6 in Bedrock and new agent plugins

    **Timestamps:**
    0:00 - Cold Open: The AI Security Wake-Up Call
    2:15 - Platform Engineering News Roundup
    8:30 - Deep Dive: Securing AI Infrastructure Access
    15:45 - Real-World Implementation Strategies

    Perfect for platform engineers, DevOps professionals, and infrastructure teams navigating the intersection of AI and cloud-native technologies. Get actionable insights to secure your AI-driven infrastructure before it's too late.

    **Sources & References:**
    - Building a Least-Privilege AI Agent Gateway: https://www.infoq.com/articles/building-ai-agent-gateway-mcp/
    - Databricks Lakebase PostgreSQL: https://www.infoq.com/news/2026/02/databricks-lakebase-postgresql/
    - KubeCon SecurityCon Deep Dive: https://www.cncf.io/blog/2026/02/23/kubecon-cloudnativecon-europe-2026-co-located-event-deep-dive-open-source-securitycon/
    - Uber's Uforwarder: https://www.infoq.com/news/2026/02/uber-uforwarder-kafka-push-proxy/
    - AWS Weekly Roundup: https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-sonnet-4-6-in-amazon-bedrock-kiro-in-govcloud-regions-new-agent-plugins-and-more-february-23-2026/
    - Kubernetes 1.35 AI Signals: https://www.cncf.io/blog/2026/02/23/kubernetes-as-ais-operating-system-1-35-release-signals/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    20 min
  5. 4 days ago

    Cloudflare Takes Down the Internet Again — With a Config Change

    **What happens when a single configuration change takes down 20% of the internet for six hours?** In this episode of Platform Engineering Playbook, we dissect the massive Cloudflare outage of February 20, 2026: a catastrophic failure that started with a routine BYOIP pipeline update and ended with Cloudflare accidentally deleting its own customers' networks.

    **What You'll Learn:**
    • The technical breakdown of how Cloudflare's configuration change cascaded into a global outage
    • Critical lessons for platform engineers about configuration management and deployment pipelines
    • Real-world AI use cases that are actually working in production environments
    • Infrastructure gaps that are quietly sabotaging AI productivity initiatives
    • HTTP/3 implementation strategies using nginx and FreeBSD

    **Episode Timestamps:**
    0:00 - Cold Open: The 30-minute warning
    2:30 - Today's Platform Engineering News
    8:15 - Deep Dive Act 1: What Really Happened at Cloudflare

    Whether you're building resilient infrastructure or implementing AI tooling, this episode delivers actionable insights to help you avoid similar disasters and build more robust platform engineering practices.

    **Sources & References:**
    - Cloudflare outage on February 20, 2026: https://blog.cloudflare.com/cloudflare-outage-february-20-2026/
    - What's your best use case for AI in your company so far?: https://www.reddit.com/r/sysadmin/comments/1rasadb/whats_your_best_use_case_for_ai_in_your_company/
    - This simple infrastructure gap is holding back AI productivity: https://thenewstack.io/this-simple-infrastructure-gap-is-holding-back-ai-productivity/
    - HTTP/3 on FreeBSD: Getting QUIC Working with nginx in a Bastille Jail: https://blog.hofstede.it/http3-on-freebsd-getting-quic-working-with-nginx-in-a-bastille-jail/

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    17 min
  6. Feb 20

    The Next Platform Engineer: AI + Observability + FinOps

    **Is AI about to revolutionize how we build infrastructure? The CNCF CTO says we're not prepared for what's coming.** In this episode of Platform Engineering Playbook, we dive deep into the future of cloud native infrastructure and why 2026 might be the year everything changes. Drawing on Chris Aniszczyk's latest insights, we explore how AI agents are moving beyond just consuming our platforms to actively designing and managing them.

    **What You'll Learn:**
    • How AI is reshaping platform engineering workflows and decision-making
    • Why current Kubernetes evolution patterns may not be sustainable
    • Practical strategies for platform engineers to prepare for AI-driven infrastructure
    • Key takeaways from the CNCF's 2026 observability trends

    **Episode Chapters:**
    0:00 Cold Open - AI's Infrastructure Revolution
    2:15 Today's Platform Engineering News
    8:30 Deep Dive: CNCF CTO's 2026 Predictions
    15:45 Technical Analysis: Kubernetes at Scale

    Whether you're building internal developer platforms or managing cloud native infrastructure at scale, this episode provides actionable insights for navigating the intersection of AI and platform engineering.

    **Sources & References:**
    - State of cloud native 2026: CNCF CTO's insights: https://www.cncf.io/blog/2026/02/19/state-of-cloud-native-2026-cncf-ctos-insights-and-predictions/
    - CNCF 2026 Observability Summit Schedule: https://www.cncf.io/announcements/2026/02/18/cncf-releases-2026-observability-summit-north-america-schedule-as-cloud-native-observability-adoption-expands/
    - DevOps Modernization with AI Agents: https://www.infoq.com/presentations/devops-modernization-ai-agents/
    - Amazon Connect Cases AWS Service Quotas: https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-connect-cases-aws-service-quotas
    - Cloudflare HTTP 5xx Errors Incident: https://www.cloudflarestatus.com/incidents/xhmtd6x13cw1

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min
  7. Feb 19

    Ray + Kubernetes: The Production AI Stack Explained

    **Why do 92% of ML models never reach production?** It's not a code problem; it's a platform engineering problem. In today's episode of Platform Engineering Playbook, we tackle the massive infrastructure gap that keeps AI initiatives stuck in notebooks while your data science teams wonder why their brilliant models never see the light of day.

    **What You'll Learn:**
    ✅ The real reasons ML models fail to reach production (hint: it's your infrastructure)
    ✅ How to architect production-ready AI infrastructure using Ray on Kubernetes
    ✅ Practical strategies for platform engineers supporting data science teams
    ✅ Enterprise GitOps scaling from single clusters to fleet management

    **Episode Breakdown:**
    0:00 Cold Open - The 92% problem
    2:15 Industry News Roundup
    8:30 Deep Dive: From Notebooks to Production
    15:45 Architecture Analysis: Ray on Kubernetes

    **Today's Platform Engineering News:**
    • Datadog's new audit-ready compliance reporting
    • Amazon Bedrock transforming HR talent acquisition
    • The hidden cost of burning out your on-call engineers
    • Enterprise GitOps fleet management strategies

    Whether you're struggling with ML infrastructure or just want to stay ahead of platform engineering trends, this episode gives you actionable insights you can implement today.

    **Sources & References:**
    - From notebooks to nodes: Architecting production-ready AI infrastructure: https://thenewstack.io/production-ai-infrastructure-guide/
    - Generate audit-ready vulnerability and compliance reports with Datadog Sheets: https://www.datadoghq.com/blog/audit-reports-datadog-sheets/
    - AI meets HR: Transforming talent acquisition with Amazon Bedrock: https://aws.amazon.com/blogs/machine-learning/ai-meets-hr-transforming-talent-acquisition-with-amazon-bedrock/
    - Is your on-call rotation quietly burning out top talent?: https://thenewstack.io/sustainable-on-call-strategies/
    - How to scale GitOps in the enterprise: From single cluster to fleet management: https://platformengineering.org/blog/how-to-scale-gitops-in-the-enterprise

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    18 min
  8. Feb 18

    Replace 5 Databases with 1? SurrealDB for AI Agents Explained

    Your AI agents are using five different databases right now, and you don't even know it. This database sprawl is silently killing your platform's performance and your team's sanity. In today's Platform Engineering Playbook, we dive deep into SurrealDB's multi-model approach and how it's changing AI infrastructure. Plus, breaking news on vulnerability management patterns that every platform engineer needs to understand.

    **What You'll Learn:**
    • Why database proliferation is the hidden killer of AI agent performance
    • A deep dive into SurrealDB's architecture and real-world deployment strategies
    • When (and when NOT) to consolidate your AI infrastructure databases
    • The contextual SBOM pattern transforming vulnerability management
    • India's massive $200B AI infrastructure play and what it means for the industry

    **Timestamps:**
    0:00 Cold Open - The Database Sprawl Crisis
    2:15 SurrealDB Deep Dive - Architecture & Implementation
    15:30 Practical Takeaways - When to Use Multi-Model Databases

    **Why Listen?** Get actionable insights from real platform engineering challenges, not theoretical fluff. We break down complex infrastructure decisions into practical guidance you can implement today. Perfect for platform engineers, DevOps teams, and infrastructure architects building scalable AI systems.

    **Sources & References:**
    • SurrealDB Docker Extension: https://www.docker.com/blog/deploy-surrealdb-docker-desktop-extension/
    • Spectral Collapse in Diffusion Inversion: https://arxiv.org/abs/2602.13303
    • India AI Infrastructure Investment: https://techcrunch.com/2026/02/17/india-bids-to-attract-over-200b-in-ai-infrastructure-investment-by-2028/
    • Contextual SBOM Pattern: https://developers.redhat.com/articles/2026/02/17/how-contextual-sbom-pattern-improves-vulnerability-management

    #PlatformEngineering #DevOps #CloudNative #Kubernetes

    19 min
