The Gist Talk

Offloading LLM Attention: Q-Shipping and KV-Side Compute

The source provides an extensive overview of strategies, collectively termed Q-shipping and KV-side compute, aimed at overcoming the memory-bandwidth bottleneck of Large Language Model (LLM) inference, particularly in the decode phase.
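The intuition behind Q-shipping can be made concrete with a back-of-the-envelope sketch. During decode, each new token's small query vector must attend over the entire KV cache, so moving the cache to the query costs far more bandwidth than shipping the query to the cache and returning only the attention output. The sizes and names below are illustrative assumptions, not figures from the source:

```python
import numpy as np

# Hypothetical single-head decode-step sizes (illustrative assumptions).
n_ctx, d_head = 4096, 128        # cached tokens, head dimension
dtype_bytes = 2                  # e.g. fp16 storage

def attend(q, K, V):
    """One query vector attending over the full KV cache (single head)."""
    scores = (K @ q) / np.sqrt(d_head)   # (n_ctx,)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V                         # (d_head,)

rng = np.random.default_rng(0)
q = rng.standard_normal(d_head)
K = rng.standard_normal((n_ctx, d_head))
V = rng.standard_normal((n_ctx, d_head))
out = attend(q, K, V)

# Bytes moved per decode step, per head:
kv_traffic = 2 * n_ctx * d_head * dtype_bytes  # pull K and V to the query
q_ship = 2 * d_head * dtype_bytes              # ship q out, get output back
print(kv_traffic // q_ship)                    # ratio grows with context length
```

The ratio equals the context length here, which is why computing attention where the KV cache already resides (KV-side compute) and moving only Q and the output can sidestep the bandwidth wall.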