Managing 100s of Kubernetes Clusters using Cluster API, with Zain Malik

KubeFM

Discover how to manage Kubernetes at scale with declarative infrastructure and automation principles.

Zain Malik shares his experience managing multi-tenant Kubernetes clusters with up to 30,000 pods across clusters capped at 950 nodes. He explains how his team transitioned from Terraform to Cluster API for declarative cluster lifecycle management, contributing upstream to improve AKS support while implementing GitOps workflows.

You will learn:

  • How to address challenges in large-scale Kubernetes operations, including node pool management inconsistencies and lengthy provisioning times

  • Why Cluster API provides a powerful foundation for multi-cloud cluster management, and how to extend it with custom operators for production-specific needs

  • How implementing GitOps principles eliminates manual intervention in critical operations like cluster upgrades

  • Strategies for handling production incidents and bugs when adopting emerging technologies like Cluster API

Sponsor

This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

  • Find all the links and info for this episode here: https://ku.bz/5PLksqVlk

  • Interested in sponsoring an episode? Learn more.

To listen to explicit episodes, sign in.

Stay up to date with this show

Sign in or sign up to follow shows, save episodes and get the latest updates.

Select a country or region

Africa, Middle East, and India

Asia Pacific

Europe

Latin America and the Caribbean

The United States and Canada