vLLM Office Hours - Advanced Techniques for Maximizing vLLM Performance - September 19, 2024

Neural Magic 3,041 views 7 months ago

In this session of Neural Magic's bi-weekly vLLM office hours, we cover the latest updates in vLLM v0.6.0 and v0.6.1, including vision language model support for Pixtral and Qwen2-VL and tool-use support for Mistral and Qwen2.5. We also delve into advanced techniques for maximizing inference performance in large language models, highlighting key optimizations that deliver a 2.7x throughput improvement and a 5x reduction in latency.

Session slides: https://docs.google.com/presentation/d/1vgt63f5Jl2HHrtHbNY5m9Vpgfi2RjaKC

Join our next vLLM office hours: https://hubs.li/Q02Y5Pbh0
