Niels Claeys shares how his team at Dataminded built Conveyor, a data platform processing up to 1.5 million core hours monthly. He explains the specific optimizations they discovered through production experience, from scheduler changes that immediately reduce costs by 10-15% to achieving 97% spot instance usage without reliability issues.
You will learn:
- Why the default Kubernetes scheduler wastes money on batch workloads, and how switching from "least allocated" to "most allocated" scheduling enables faster scale-down and better resource utilization
- How to achieve 97% spot instance adoption through strategic instance type diversification, region selection, and Spark-specific techniques
- Node pool design principles that balance Kubernetes overhead against workload efficiency
- Platform-specific gotchas, such as AWS cross-AZ data transfer costs, that can spike bills unexpectedly
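To make the scheduler point concrete, here is a toy Python simulation. It is not Conveyor's code and is far simpler than kube-scheduler: each node has a single CPU dimension, scores are computed the way the NodeResourcesFit "LeastAllocated" and "MostAllocated" strategies roughly work (free vs. used share of capacity after placement), and ties go to the first node in the list, whereas the real scheduler picks randomly among top-scoring nodes. The names `Node`, `schedule`, and `empty_nodes` are hypothetical. In a real cluster the equivalent switch is setting the NodeResourcesFit plugin's `scoringStrategy` type to `MostAllocated` in a `KubeSchedulerConfiguration` profile.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float          # allocatable CPU cores (simplified: one resource only)
    requested: float = 0.0   # CPU already requested by pods on this node
    pods: list = field(default_factory=list)

    def fits(self, cpu: float) -> bool:
        return self.requested + cpu <= self.capacity


def least_allocated(node: Node, cpu: float) -> float:
    # Prefers the node that would have the MOST free capacity after placement,
    # which spreads pods across every node.
    return (node.capacity - (node.requested + cpu)) / node.capacity * 100


def most_allocated(node: Node, cpu: float) -> float:
    # Prefers the node that would be the MOST full after placement,
    # which packs pods onto as few nodes as possible.
    return (node.requested + cpu) / node.capacity * 100


def schedule(pods, nodes, score):
    """Place each (name, cpu) pod on the feasible node with the highest score."""
    for pod_name, cpu in pods:
        candidates = [n for n in nodes if n.fits(cpu)]
        if not candidates:
            raise RuntimeError(f"no node fits {pod_name}")
        # max() returns the first best node, so ties break by list order here.
        best = max(candidates, key=lambda n: score(n, cpu))
        best.requested += cpu
        best.pods.append(pod_name)
    return nodes


def empty_nodes(nodes):
    """Nodes with no pods are the easiest for an autoscaler to remove."""
    return [n.name for n in nodes if not n.pods]


if __name__ == "__main__":
    pods = [(f"job-{i}", 4.0) for i in range(8)]  # 8 batch pods, 4 cores each
    for strategy in (least_allocated, most_allocated):
        nodes = [Node(f"node-{i}", 16.0) for i in range(4)]  # 4 x 16-core nodes
        schedule(pods, nodes, strategy)
        print(strategy.__name__, "empty nodes:", empty_nodes(nodes))
```

With the same eight pods on four 16-core nodes, the spreading strategy leaves every node half full, while the packing strategy fills two nodes and leaves two completely empty, which is what lets the cluster shrink sooner once batch work ends.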
Sponsor
This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive-scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io.
More info
Find all the links and info for this episode here: https://ku.bz/hGRfkzDJW
Information
- Show
- Frequency: Updated weekly
- Published: 14 October 2025 at 06:00 UTC
- Season: 7
- Episode: 10
- Rating: Clean