In this session, we brought on vLLM committers from Anyscale for a deep dive into FP8 quantization. They discussed why FP8 matters, how to get started with FP8 in vLLM, and shared quality and performance results of FP8 quantization.
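For a quick taste of getting started, here is a minimal sketch: vLLM supports online dynamic FP8 quantization by passing `quantization="fp8"` when loading a model. The model name and prompt below are illustrative placeholders.

```python
from vllm import LLM, SamplingParams

# Load a model with vLLM's online dynamic FP8 quantization.
# Weights are quantized to FP8 at load time; no pre-quantized
# checkpoint is required. Model name is a placeholder.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```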
We also covered the latest updates in vLLM v0.5.1, including pipeline parallelism and model support for Gemma 2, Jamba, and DeepSeek-V2.
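As a companion sketch of the pipeline parallelism update: v0.5.1 exposed pipeline parallelism primarily through the OpenAI-compatible server's `--pipeline-parallel-size` flag. The offline-API form below assumes a vLLM version where `LLM` accepts `pipeline_parallel_size`; the model name and parallelism sizes are illustrative.

```python
from vllm import LLM

# Combine pipeline and tensor parallelism across 4 GPUs:
# 2 pipeline stages, each sharded across 2 GPUs with tensor parallelism.
# Model name and sizes are placeholders for illustration.
llm = LLM(
    model="google/gemma-2-9b-it",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
)
```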
For more details, check out the session slides here: https://docs.google.com/presentation/d/1rPRibjxqqJR-qV-CVq0q0Z-KStHw5aaK
Join our bi-weekly vLLM office hours to stay current with vLLM, ask questions, meet the community, and give feedback: https://hubs.li/Q02Y5Pbh0