We discuss "Accurate KV Cache Quantization with Outlier Tokens Tracing," a deep dive into improving the efficiency of LLM inference. KV cache quantization reduces memory and compute costs during inference, but a small number of outlier tokens can stretch the quantization range and hurt accuracy. The authors introduce a method to trace and exclude these outlier tokens, striking a better balance between efficiency and performance.
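To make the idea concrete, here is a minimal sketch (not the authors' implementation) of per-token KV cache quantization that traces outlier tokens and stores them in full precision. The outlier criterion used here, top-k tokens by per-token maximum magnitude, and the symmetric int8 scheme are illustrative assumptions; the paper's exact tracing method may differ.

```python
import torch

def quantize_kv_with_outlier_tracing(kv: torch.Tensor, num_outliers: int = 2):
    """kv: [num_tokens, head_dim] slice of the KV cache (fp16/fp32)."""
    # Score each token by its largest absolute activation; a few extreme
    # tokens stretch the quantization range and degrade accuracy for the rest.
    scores = kv.abs().amax(dim=-1)                       # [num_tokens]
    outlier_idx = torch.topk(scores, num_outliers).indices

    mask = torch.zeros(kv.shape[0], dtype=torch.bool)
    mask[outlier_idx] = True

    inliers = kv[~mask]
    # Symmetric per-token int8 quantization, fit to inlier tokens only.
    scale = inliers.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(inliers / scale).clamp(-127, 127).to(torch.int8)

    # Traced outlier tokens are kept unquantized alongside the int8 cache.
    return q, scale, kv[mask], outlier_idx

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale

kv = torch.randn(16, 8)
kv[3] *= 20.0  # plant an outlier token
q, scale, outliers, idx = quantize_kv_with_outlier_tracing(kv)
print("outlier tokens traced:", idx.tolist())
inlier_mask = ~torch.isin(torch.arange(16), idx)
print("max inlier error:", (dequantize(q, scale) - kv[inlier_mask]).abs().max().item())
```

Because the planted outlier is excluded before the scale is computed, the remaining tokens quantize with a much tighter range and lower reconstruction error than they would under a single shared scale.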
- Read the paper
- Access the slides
- Read the blog
- Join us for Arize Observe
Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
Information
- Frequency: Updated twice monthly
- Published: June 4, 2025, 2:00 PM [UTC]
- Length: 25 minutes
- Rating: Clean