Timestamps:
00:00 - Intro
01:24 - Technical Demo
09:48 - Results
11:02 - Intermission
11:57 - Considerations
15:48 - Conclusion
In this video, we explore distributed inference using vLLM and Ray. To demonstrate it, we set up a two-node cluster: one node with two RTX 3090 Ti GPUs and another with two RTX 3060 GPUs. Once the nodes are configured, we load a single model across both machines and interact with it, with inference running across all four GPUs.
Join us as we dive into the technical details, share results, and discuss considerations for using distributed inference in your own projects!
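If you want to try something similar yourself, here's a minimal sketch of this kind of setup. The model name, IP address, and parallelism split are placeholder assumptions for illustration, not the exact values used in the video:

# Start a Ray cluster first (shell commands, run on each machine):
#   head node:   ray start --head --port=6379
#   worker node: ray start --address='<head-node-ip>:6379'

from vllm import LLM, SamplingParams

# Placeholder model; pick one that fits your combined VRAM.
# tensor_parallel_size=4 shards the model across all four GPUs;
# the Ray backend handles placement across the two nodes.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=4,
    distributed_executor_backend="ray",
)

outputs = llm.generate(
    ["Explain distributed inference in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)

Note: with mismatched GPUs like these, the smallest card's VRAM tends to be the limiting factor, and cross-node bandwidth matters a lot for tensor parallelism. We get into those trade-offs in the Considerations chapter.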