수도리부트

11/16/2024
S1, E2
43 MIN

How to Serve LLMs Faster: KV Caching & Speculative Decoding


How to serve LLMs faster

- KV caching

- Paged Attention

- vLLM

- Speculative Decoding

- OpenAI prompt caching

- OpenAI Predicted Outputs

Physical intelligence

- AI robots

- Action models


Show: 수도리부트

Frequency: Updated Biweekly

Published: November 16, 2024 at 9:40 AM UTC

Length: 43 min

Season: 1

Episode: 2

Rating: Clean