Upstream @ AKS

KAITO: AI/ML inference & tuning in Kubernetes, with Ernest Wong (Upstream@AKS, Azure Kubernetes Service)

Lachie Evenson and Ernest Wong discuss KAITO.

KAITO is an operator that automates AI/ML model inference or tuning workloads in a Kubernetes cluster. The target models are popular open-source large models such as phi-4 and llama.

Related Links
CNCF: https://www.cncf.io/
KAITO: https://github.com/kaito-project/kaito
Get involved with KAITO: https://github.com/kaito-project/kaito?tab=readme-ov-file#get-involved
Packaging models as a container image: https://kaito-project.github.io/kaito/docs/model-as-oci-artifacts
KAITO preset models: https://kaito-project.github.io/kaito/docs/presets
GPU Operator: https://github.com/NVIDIA/gpu-operator
Retrieval-Augmented Generation (RAG): https://kaito-project.github.io/kaito/docs/rag/