Lecture 22: Hacker's Guide to Speculative Decoding in vLLM

GPU MODE 7,750 11 months ago

Abstract: We will discuss how vLLM combines continuous batching with speculative decoding, with a focus on enabling external contributors. Topics include the proposer/scorer/verifier framework, proposal methods, lookahead scheduling, dynamic speculative decoding, and ideas for future contributions.

Speaker: Cade Daniel

Slides: https://docs.google.com/presentation/d/1p1xE-EbSAnXpTSiSI0gmy_wdwxN5XaULO3AnCWWoRe4/edit
