
Smaller Models, Same Power: How SLIM Shrinks LLMs Without Retraining | EP #09 Executive Code
Can you shrink a large language model without retraining—and still keep the accuracy? In this episode, Kirim talks to Mohammad Mozaffari, PhD candidate at the University of Toronto and author of SLIM, a new one-shot compression method that combines quantization, pruning, and low-rank adapters. Together, they unpack the math, the hardware limits, and the tradeoffs behind making LLMs run efficiently on smaller devices—without sacrificing performance.
Dive deeper into the research discussed in this episode:
- Compression Trinity for LLMs: https://www.cs.toronto.edu/~mmozaffari/compression-trinity/index.html
- BEAM (Blockwise Error Minimization): https://www.cs.toronto.edu/~mmozaffari/compression-trinity/beam/index.html
- SLiM Paper: https://arxiv.org/abs/2410.09615
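The "compression trinity" the episode covers can be illustrated with a toy sketch: prune and quantize a weight matrix, then fit a low-rank adapter to the residual error so the compressed model recovers accuracy. This is a minimal NumPy illustration of the general idea, not the SLiM paper's actual algorithm; the bit width, sparsity, and rank values here are arbitrary assumptions.

```python
import numpy as np

def quantize(w, bits=4):
    # Simple symmetric round-to-nearest quantization (illustrative only).
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def magnitude_prune(w, sparsity=0.5):
    # Zero out the smallest-magnitude fraction of the weights.
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def low_rank_adapter(residual, rank=8):
    # Best rank-r approximation of the compression error via SVD.
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

W_c = quantize(magnitude_prune(W))   # pruned + quantized weights
L, R = low_rank_adapter(W - W_c)     # adapter absorbs the compression error
W_hat = W_c + L @ R                  # compressed approximation served at inference

err_plain = np.linalg.norm(W - W_c)
err_slim = np.linalg.norm(W - W_hat)
```

Because the adapter is the best low-rank fit to the residual, `err_slim` is smaller than `err_plain`: the low-rank term wins back some of the accuracy lost to pruning and quantization, which is the tradeoff the episode explores.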
------
Connect with Mohammad Mozaffari
LinkedIn: /mohammad-mozaffari-7804b7187
Connect with A. Kirimgeray Kirimli
LinkedIn: /a-kirimgeray-k
----
Flatiron Software: https://flatiron.software
Snapshot AI: https://www.snapshot.reviews
Information
- Podcast
- Published: July 21, 2025 at 03:00 UTC
- Season 1
- Episode 9