April 7
Episode 17
16 min

Advanced LLM Optimization techniques

Welcome to another Data Architecture Elevator podcast! Today's discussion is hosted by Paolo Platter, supported by our experts Antonino Ingargiola and Irene Donato.

In this episode, we explore effective strategies for optimizing large language models (LLMs) for inference tasks with multimodal data like audio, text, images, and video.

We discuss the shift from online APIs to hosted models, choosing smaller, task-specific models, and leveraging fine-tuning, distillation, quantization, and tensor fusion techniques. We also highlight the role of specialized inference servers such as Triton and Dynamo, and how Kubernetes helps manage horizontal scaling.

Don't forget to follow us on LinkedIn! Enjoy!

Show: Data Architecture Elevator
Frequency: Updated monthly
Published: April 7, 2025 at 08:38 UTC
Length: 16 minutes
Episode: 17
Rating: Suitable for children
