April 7
Episode 17
16 min

Advanced LLM Optimization techniques

Welcome to another Data Architecture Elevator podcast! Today's discussion is hosted by Paolo Platter, supported by our experts Antonino Ingargiola and Irene Donato.

In this episode, we explore effective strategies for optimizing large language models (LLMs) for inference tasks with multimodal data like audio, text, images, and video.

We discuss the shift from online APIs to hosted models, choosing smaller, task-specific models, and leveraging fine-tuning, distillation, quantization, and tensor fusion techniques. We also highlight the role of specialized inference servers such as Triton and Dynamo, and how Kubernetes helps manage horizontal scaling.

Don't forget to follow us on LinkedIn! Enjoy!


Show: Data Architecture Elevator
Frequency: Updated monthly
Published: April 7, 2025 at 8:38 AM UTC
Length: 16 min
Episode: 17
Rating: Suitable for all ages
