As one of the top contributors to the vLLM project, Neural Magic teams up with the vLLM team from UC Berkeley every two weeks to host open office hours. Check out our June 5, 2024 session, where we answered some great questions from participants.
We kicked off the session with a quick recap of vLLM and of how Neural Magic can help enterprises successfully integrate vLLM into their AI strategy. You'll hear answers to audience questions about post-training quantization, maximizing GPU usage for 70B LLMs, differences between vLLM and Hugging Face TGI, cache management, tensor parallelism, and more. You can see the session slides here: https://docs.google.com/presentation/d/1B50uCXzAarawDDizElNzi2o55fkgJZSm/edit#slide=id.p1
Do you have questions about vLLM that you'd like addressed directly by the experts? Join our next vLLM office hours and post your questions here: https://hubs.li/Q02Y5Pbh0