In this session, we brought on vLLM committers from Anyscale for a deep dive into FP8 quantization. They discussed why FP8 matters, how to get started with FP8 in vLLM, and shared quality and performance results of FP8 quantization.
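For a quick taste of getting started, here is a minimal sketch: vLLM supports online dynamic FP8 quantization by passing `quantization="fp8"` when loading a model. The model name and prompt below are illustrative placeholders.

```python
from vllm import LLM, SamplingParams

# Load a model with vLLM's online dynamic FP8 quantization.
# Weights are quantized to FP8 at load time; no pre-quantized
# checkpoint is required. Model name is a placeholder.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```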
We also covered the latest updates in vLLM v0.5.1, including pipeline parallelism and model support for Gemma 2, Jamba, and DeepSeek-V2.
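As a companion sketch of the pipeline parallelism update: v0.5.1 exposed pipeline parallelism primarily through the OpenAI-compatible server's `--pipeline-parallel-size` flag. The offline-API form below assumes a vLLM version where `LLM` accepts `pipeline_parallel_size`; the model name and parallelism sizes are illustrative.

```python
from vllm import LLM

# Combine pipeline and tensor parallelism across 4 GPUs:
# 2 pipeline stages, each sharded across 2 GPUs with tensor parallelism.
# Model name and sizes are placeholders for illustration.
llm = LLM(
    model="google/gemma-2-9b-it",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
)
```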
For more details, check out the session slides here: https://docs.google.com/presentation/d/1rPRibjxqqJR-qV-CVq0q0Z-KStHw5aaK
Join our bi-weekly vLLM office hours to stay current with vLLM, ask questions, meet the community, and give feedback: https://hubs.li/Q02Y5Pbh0