
Smaller Models, Same Power: How SLIM Shrinks LLMs Without Retraining | EP #09 Executive Code
Can you shrink a large language model without retraining—and still keep the accuracy? In this episode, Kirim talks to Mohammad Mozaffari, PhD candidate at the University of Toronto and author of SLIM, a new one-shot compression method that combines quantization, pruning, and low-rank adapters. Together, they unpack the math, the hardware limits, and the tradeoffs behind making LLMs run efficiently on smaller devices—without sacrificing performance.
Dive deeper into the research discussed in this episode:
- Compression Trinity for LLMs: https://www.cs.toronto.edu/~mmozaffari/compression-trinity/index.html
- BEAM (Blockwise Error Minimization): https://www.cs.toronto.edu/~mmozaffari/compression-trinity/beam/index.html
- SLiM Paper: https://arxiv.org/abs/2410.09615
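The "compression trinity" the episode covers can be illustrated with a toy sketch: prune and quantize a weight matrix, then fit a low-rank adapter to the residual error so the compressed model recovers accuracy. This is a minimal NumPy illustration of the general idea, not the SLiM paper's actual algorithm; the bit width, sparsity, and rank values here are arbitrary assumptions.

```python
import numpy as np

def quantize(w, bits=4):
    # Simple symmetric round-to-nearest quantization (illustrative only).
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def magnitude_prune(w, sparsity=0.5):
    # Zero out the smallest-magnitude fraction of the weights.
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def low_rank_adapter(residual, rank=8):
    # Best rank-r approximation of the compression error via SVD.
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

W_c = quantize(magnitude_prune(W))   # pruned + quantized weights
L, R = low_rank_adapter(W - W_c)     # adapter absorbs the compression error
W_hat = W_c + L @ R                  # compressed approximation served at inference

err_plain = np.linalg.norm(W - W_c)
err_slim = np.linalg.norm(W - W_hat)
```

Because the adapter is the best low-rank fit to the residual, `err_slim` is smaller than `err_plain`: the low-rank term wins back some of the accuracy lost to pruning and quantization, which is the tradeoff the episode explores.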
------
Connect with Mohammad Mozaffari
LinkedIn: /mohammad-mozaffari-7804b7187
Connect with A. Kirimgeray Kirimli
LinkedIn: /a-kirimgeray-k
----
Flatiron Software: https://flatiron.software
Snapshot AI: https://www.snapshot.reviews
Information
- Podcast
- Published: July 21, 2025 at 03:00 UTC
- Season 1
- Episode 9