vLLM is a fast and easy-to-use library for LLM inference and serving. In this video, we cover the basics of vLLM, how to run it locally, and then how to deploy it in production on Kubernetes with GPU-attached nodes using a DaemonSet, including a hands-on demo of a production vLLM deployment.
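For the "run it locally" part, a minimal Python sketch of offline inference with the vLLM library looks like the snippet below (the model name is just an example; swap in the model you want to serve):

# minimal vLLM offline inference sketch (example model, adjust to your setup)
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model; vLLM handles batching and KV-cache management (PagedAttention) for you.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)

In the Kubernetes setup shown in the video, the same engine runs as an OpenAI-compatible server inside pods scheduled onto the GPU-attached nodes.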
Blog post: https://opensauced.pizza/blog/how-we-saved-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies
John McBride (@JohnCodes)
►►►Connect with me ►►►
► Kubesimplify: https://kubesimplify.com/newsletter
► Newsletter: https://saiyampathak.com/newsletter
► Discord: https://saiyampathak.com/discord
► Twitch: https://saiyampathak.com/twitch
► YouTube: https://saiyampathak.com/youtube.com
► GitHub: https://github.com/saiyam1814
► LinkedIn: https://www.linkedin.com/in/saiyampathak/
► Website: https://saiyampathak.medium.com/
► Instagram: http://instagram.com/saiyampathak/
► Twitter: https://twitter.com/saiyampathak