The New Stack Podcast

The New Stack

The New Stack Podcast is all about the developers, software engineers and operations people who build at-scale architectures that change the way we develop and deploy software. For more content from The New Stack, subscribe on YouTube at: https://www.youtube.com/c/TheNewStack

  1. 19H AGO

    Why You Can't Build AI Without Progressive Delivery

    Former GitHub CEO Thomas Dohmke’s claim that AI-based development requires progressive delivery frames a conversation between analyst James Governor and The New Stack’s Alex Williams about why modern release practices matter more than ever. Governor argues that AI systems behave unpredictably in production: models can hallucinate, outputs vary between versions, and changes are often non-deterministic. Because of this uncertainty, teams must rely on progressive delivery techniques such as feature flags, canary releases, observability, measurement and rollback (see the sketch below). These practices, originally developed to improve traditional software releases, now form the foundation for deploying AI safely. Concepts like evaluations, model versioning and controlled rollouts are direct extensions of established delivery disciplines.

    Beyond AI, Governor’s book “Progressive Delivery” challenges DevOps thinking itself. He notes that DevOps focuses on development and operations but often neglects the user feedback loop. Using a framework of four A’s — abundance, autonomy, alignment and automation — he argues that progressive delivery reconnects teams with real user outcomes. Ultimately, success isn’t just reliability metrics, but whether users are actually satisfied.

    Learn more from The New Stack about progressive delivery:
    Mastering Progressive Hydration for Enhanced Web Performance
    Continuous Delivery: Gold Standard for Software Development
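    To make the mechanics concrete, here is a minimal sketch of a percentage-based feature flag gating a canary rollout of a new model version. The flag store, flag names, and call_model helper are hypothetical illustrations, not drawn from any product discussed in the episode.

    ```python
    import hashlib

    # Hypothetical in-memory flag store; in practice this lives in a
    # feature-flag service or config database.
    FLAGS = {
        # Expose the new model version to 5% of users; rollback is a
        # one-line change back to 0.
        "use-model-v2": {"enabled": True, "rollout_percent": 5},
    }

    def bucket(user_id: str) -> int:
        """Deterministically map a user to a 0-99 bucket so each user
        sees a stable variant for the duration of the canary."""
        return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100

    def flag_on(name: str, user_id: str) -> bool:
        flag = FLAGS.get(name)
        return bool(flag and flag["enabled"]
                    and bucket(user_id) < flag["rollout_percent"])

    def call_model(version: str, prompt: str) -> str:
        # Stand-in for a real inference call.
        return f"[{version}] response to: {prompt}"

    def answer(prompt: str, user_id: str) -> str:
        # Route a small canary slice to the new model; everyone else stays
        # on the known-good version. Metrics on both paths drive rollback.
        version = "model-v2" if flag_on("use-model-v2", user_id) else "model-v1"
        return call_model(version, prompt)

    print(answer("Summarize this release.", "user-1234"))
    ```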

    28 min
  2. 5D AGO

    How Nutanix Is Taming Operational Complexity

    Most enterprises today run workloads across multiple IT infrastructures rather than a single platform, creating significant operational challenges. According to Nutanix CTO Deepak Goel, organizations face three major hurdles: managing operational complexity amid a shortage of cloud-native skills, migrating legacy virtual machine (VM) workloads to microservices-based cloud-native platforms, and running VM-based workloads alongside containerized applications. Many engineers have deep infrastructure experience but lack Kubernetes expertise, making the transition especially difficult and steepening the learning curve for IT administrators.

    To address these issues, organizations are turning to platform engineering and internal developer platforms that abstract infrastructure complexity and provide standardized “golden paths” for deployment. Integrated development environments (IDEs) further reduce friction by embedding capabilities like observability and security.

    Nutanix contributes through its hyperconverged platform, which unifies compute and storage while supporting both VMs and containers. At KubeCon North America, Nutanix announced version 2.0 of Nutanix Data Services for Kubernetes (NDK), adding advanced data protection, fault-tolerant replication, and enhanced security through a partnership with Canonical to deliver a hardened operating system for Kubernetes environments.

    Learn more from The New Stack about operational complexity in cloud native environments:
    Q&A: Nutanix CEO Rajiv Ramaswami on the Cloud Native Enterprise
    Kubernetes Complexity Realigns Platform Engineering Strategy
    Platform Engineering on the Brink: Breakthrough or Bust?

    15 min
  3. 5D AGO

    Do All Your AI Workloads Actually Require Expensive GPUs?

    GPUs dominate today’s AI landscape, but Google argues they are not necessary for every workload. As AI adoption has grown, customers have increasingly demanded compute options that deliver high performance with lower cost and power consumption. Drawing on its long history of custom silicon, Google introduced Axion CPUs in 2024 to meet needs for massive scale, flexibility, and general-purpose computing alongside AI workloads. The Axion-based C4A instance is generally available, while the newer N4A virtual machines promise up to 2x better price performance.

    In this episode, recorded at KubeCon + CloudNativeCon North America in Atlanta, Andrei Gueletii, a technical solutions consultant for Google Cloud, joined Gari Singh, a product manager for Google Kubernetes Engine (GKE), and Pranay Bakre, a principal solutions engineer at Arm. Built on Arm Neoverse V2 cores, Axion processors emphasize energy efficiency and customization, including flexible machine shapes that let users tailor memory and CPU resources. These features are particularly valuable for platform engineering teams, which must optimize centralized infrastructure for cost, FinOps goals, and price performance as they scale.

    Importantly, many AI tasks—such as inference for smaller models or batch-oriented jobs—do not require GPUs. CPUs can be more efficient when GPU memory would be underutilized or latency demands are low. By decoupling workloads and choosing the right compute for each task, organizations can significantly reduce AI compute costs (see the sketch below).

    Learn more from The New Stack about the Axion-based C4A:
    Beyond Speed: Why Your Next App Must Be Multi-Architecture
    Arm: See a Demo About Migrating a x86-Based App to ARM64
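    As a rough illustration of that decoupling, here is a sketch of a routing rule that sends a workload to CPU or GPU based on model size, latency budget, and batch tolerance. The thresholds and the Workload fields are hypothetical; real decisions depend on benchmarks for the specific models and instance types in play.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Workload:
        model_params_b: float  # model size, billions of parameters
        latency_slo_ms: float  # required p99 latency
        batch_job: bool        # offline/batch vs. interactive

    def pick_compute(w: Workload) -> str:
        """Illustrative routing rule only; real thresholds depend on the
        model, quantization, and the CPU/GPU SKUs actually available."""
        if w.batch_job:
            # Batch jobs tolerate latency, so cheaper CPU instances often win.
            return "cpu"
        if w.model_params_b <= 7 and w.latency_slo_ms >= 500:
            # Small models with relaxed SLOs frequently fit CPU inference.
            return "cpu"
        # Large models or tight latency budgets still call for GPUs.
        return "gpu"

    print(pick_compute(Workload(model_params_b=3, latency_slo_ms=1000, batch_job=False)))   # cpu
    print(pick_compute(Workload(model_params_b=70, latency_slo_ms=200, batch_job=False)))   # gpu
    ```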

    30 min
  4. 6D AGO

    Breaking Data Team Silos Is the Key to Getting AI to Production

    Enterprises are racing to deploy AI services, but the teams responsible for running them in production are seeing familiar problems reemerge—most notably, silos between data scientists and operations teams, reminiscent of the old DevOps divide. In a discussion recorded at AWS re:Invent 2025, IBM’s Thanos Matzanas and Martin Fuentes argue that the challenge isn’t new technology but repeating organizational patterns. As data teams move from internal projects to revenue-critical, customer-facing applications, they face new pressures around reliability, observability, and accountability.

    The speakers stress that many existing observability and governance practices still apply. Standard metrics, KPIs, SLOs, access controls, and audit logs remain essential foundations, even as AI introduces non-determinism and a heavier reliance on human feedback to assess quality. Tools like OpenTelemetry provide common ground (see the sketch below), but culture matters more than tooling. Both emphasize starting with business value and breaking down silos early by involving data teams in production discussions. Rather than replacing observability professionals, AI should augment human expertise, especially in critical systems where trust, safety, and compliance are paramount.

    Learn more from The New Stack about AI and data silos:
    Are Your AI Co-Pilots Trapping Data in Isolated Silos?
    Break the AI Gridlock at the Intersection of Velocity and Trust
    Taming AI Observability: Control Is the Key to Success
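    One way to see how existing practice carries over: a minimal sketch that traces an inference call with the OpenTelemetry Python API. The span and attribute names below are illustrative choices, not official semantic conventions.

    ```python
    # pip install opentelemetry-api (calls are no-ops until an SDK is wired up)
    from opentelemetry import trace

    tracer = trace.get_tracer("inference-service")

    def generate(prompt: str) -> str:
        # Wrap the model call in a span exactly as you would any RPC.
        with tracer.start_as_current_span("llm.generate") as span:
            span.set_attribute("llm.model", "example-model-v1")
            span.set_attribute("llm.prompt_chars", len(prompt))
            response = "..."  # stand-in for the actual inference call
            span.set_attribute("llm.response_chars", len(response))
            return response

    generate("What changed in this release?")
    ```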

    31 min
  5. DEC 16

    Why AI Parallelization Will Be One of the Biggest Challenges of 2026

    Rob Whiteley, CEO of Coder, argues that the biggest winners in today’s AI boom resemble the “picks and shovels” sellers of the California Gold Rush: companies that provide tools enabling others to build with AI. Speaking on The New Stack Makers at AWS re:Invent, Whiteley described the current AI moment as the fastest-moving shift he’s seen in 25 years of tech. Developers are rapidly adopting AI tools, while platform teams face pressure to approve them, as saying “no” is no longer viable.

    Whiteley warns of a widening gap between organizations that extract real value from AI and those that don’t, driven by skills shortages and insufficient investment in training. He sees parallels with the cloud-native transition and predicts the rise of “AI-native” companies. As agentic AI grows, developers increasingly act as managers overseeing many parallel AI agents (see the sketch below), creating new challenges around governance, security, and state management. To address this, Coder introduced Mux, an open source coding agent multiplexer designed to help developers manage and evaluate large volumes of AI-generated code efficiently.

    Learn more from The New Stack about AI parallelization:
    The Production Generative AI Stack: Architecture and Components
    Enable Parallel Frontend/Backend Development to Unlock Velocity
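    The episode summary doesn’t cover Mux’s actual interface, so here is a generic sketch of the fan-out-and-review pattern it targets, in standard-library Python; run_agent is a hypothetical stand-in for invoking a coding agent.

    ```python
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def run_agent(task: str) -> dict:
        # Hypothetical stand-in for dispatching a task to a coding agent
        # and collecting its proposed change.
        return {"task": task, "diff": f"patch for {task}", "tests_pass": True}

    tasks = ["fix flaky test", "add retry logic", "update API docs"]

    # Fan the tasks out to agents in parallel; the developer's job shifts
    # to reviewing and gating results as they come back.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {pool.submit(run_agent, t): t for t in tasks}
        for fut in as_completed(futures):
            result = fut.result()
            status = "ready for review" if result["tests_pass"] else "rejected"
            print(f"{result['task']}: {status}")
    ```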

    24 min
  6. DEC 11

    Kubernetes GPU Management Just Got a Major Upgrade

    Nvidia Distinguished Engineer Kevin Klues noted that low-level systems work is invisible when done well and highly visible when it fails — a dynamic that frames current Kubernetes innovations for AI. At KubeCon + CloudNativeCon North America 2025, Klues and AWS product manager Jesse Butler discussed two emerging capabilities: dynamic resource allocation (DRA) and a new workload abstraction designed for sophisticated AI scheduling.

    DRA, now generally available in Kubernetes 1.34, fixes long-standing limitations in GPU requests. Instead of simply asking for a number of GPUs, users can specify types and configurations (see the sketch below). Modeled after persistent volumes, DRA allows any specialized hardware to be exposed through standardized interfaces, enabling vendors to deliver custom device drivers cleanly. Butler called it one of the most elegant designs in Kubernetes.

    Yet complex AI workloads require more coordination. A forthcoming workload abstraction, debuting in Kubernetes 1.35, will let users define pod groups with strict scheduling and topology rules — ensuring multi-node jobs start fully or not at all. Klues emphasized that this abstraction will shape Kubernetes’ AI trajectory for the next decade and encouraged community involvement.

    Learn more from The New Stack about dynamic resource allocation:
    Kubernetes Primer: Dynamic Resource Allocation (DRA) for GPU Workloads
    Kubernetes v1.34 Introduces Benefits but Also New Blind Spots
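    For a sense of what such a request looks like, here is a sketch that creates a DRA ResourceClaim through the Kubernetes Python client’s generic CustomObjectsApi. The claim body follows the pre-GA v1beta1 DRA schema, and the device class name and CEL attribute are illustrative (each vendor’s DRA driver publishes its own); consult the Kubernetes 1.34 docs for the exact GA schema.

    ```python
    # pip install kubernetes; assumes a cluster with a DRA driver installed.
    from kubernetes import client, config

    config.load_kube_config()

    # Illustrative ResourceClaim using the v1beta1 DRA schema; field names
    # and values (device class, attribute names) vary by driver and may
    # differ in the GA v1 API that shipped in Kubernetes 1.34.
    claim = {
        "apiVersion": "resource.k8s.io/v1beta1",
        "kind": "ResourceClaim",
        "metadata": {"name": "one-a100"},
        "spec": {
            "devices": {
                "requests": [{
                    "name": "gpu",
                    "deviceClassName": "gpu.nvidia.com",  # published by the vendor's DRA driver
                    # A CEL selector asks for a specific device type instead of
                    # just "N GPUs"; the attribute name here is illustrative.
                    "selectors": [{"cel": {
                        "expression": "device.attributes['gpu.nvidia.com'].productName == 'NVIDIA A100'"
                    }}],
                }],
            },
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="resource.k8s.io", version="v1beta1",
        namespace="default", plural="resourceclaims", body=claim,
    )
    ```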

    35 min
  7. DEC 10

    The Rise of the Cognitive Architect

    At KubeCon North America 2025, GitLab’s Emilio Salvador outlined how developers are shifting from individual coders to leaders of hybrid human–AI teams. He envisions developers evolving into “cognitive architects,” responsible for breaking down large, complex problems and distributing work across both AI agents and humans. Complementing this is the emerging role of the “AI guardian,” reflecting growing skepticism around AI-generated code. Even as AI produces more code, humans remain accountable for reviewing quality, security, and compliance.

    Salvador also described GitLab’s “AI paradox”: developers may code faster with AI, but overall productivity stalls because testing, security, and compliance processes haven’t kept pace. To fix this, he argues organizations must apply AI across the entire development lifecycle, not just in coding. GitLab’s Duo Agent Platform aims to support that end-to-end transformation. Looking ahead, Salvador predicts the rise of a proactive “meta agent” that functions like a full team member. Still, he warns that enterprise adoption remains slow and advises organizations to start small, build skills, and scale gradually.

    Learn more from The New Stack about the evolving role of “cognitive architects”:
    The Engineer in the AI Age: The Orchestrator and Architect
    The New Role of Enterprise Architecture in the AI Era
    The Architect’s Guide to Understanding Agentic AI

    23 min
  8. DEC 9

    Why the CNCF's New Executive Director is Obsessed With Inference

    Jonathan Bryce, the new CNCF executive director, argues that inference—not model training—will define the next decade of computing. Speaking at KubeCon North America 2025, he emphasized that while the industry obsesses over massive LLM training runs, the real opportunity lies in efficiently serving these models at scale. Cloud-native infrastructure, he says, is uniquely suited to this shift because inference requires real-time deployment, security, scaling, and observability—strengths of the CNCF ecosystem.

    Bryce believes Kubernetes is already central to modern inference stacks, with projects like Ray, KServe, and emerging GPU-oriented tooling enabling teams to deploy and operationalize models. To bring consistency to this fast-moving space, the CNCF launched a Kubernetes AI Conformance Program, ensuring environments support GPU workloads and Dynamic Resource Allocation. With AI agents poised to multiply inference demand by executing parallel, multi-step tasks, efficiency becomes essential. Bryce predicts that smaller, task-specific models and cloud-native routing optimizations will drive major performance gains. Ultimately, he sees CNCF technologies forming the foundation for what he calls “the biggest workload mankind will ever have.”

    Learn more from The New Stack about inference:
    Confronting AI’s Next Big Challenge: Inference Compute
    Deep Infra Is Building an AI Inference Cloud for Developers

    25 min
4.3 out of 5 (31 ratings)

