Timestamps:
00:00 - Intro
01:24 - Technical Demo
09:48 - Results
11:02 - Intermission
11:57 - Considerations
15:48 - Conclusion
In this video, we explore distributed inference using vLLM and Ray. To demonstrate it, we set up a two-node cluster: one node with two RTX 3090 Ti GPUs and another with two RTX 3060 GPUs. Once the nodes are configured, we load a single model across both machines and interact with it, with inference running across all four GPUs.
Join us as we dive into the technical details, share results, and discuss considerations for using distributed inference in your own projects!
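If you want to try something similar yourself, here's a minimal sketch of this kind of setup. The model name, IP address, and parallelism split are placeholder assumptions for illustration, not the exact values used in the video:

# Start a Ray cluster first (shell commands, run on each machine):
#   head node:   ray start --head --port=6379
#   worker node: ray start --address='<head-node-ip>:6379'

from vllm import LLM, SamplingParams

# Placeholder model; pick one that fits your combined VRAM.
# tensor_parallel_size=4 shards the model across all four GPUs;
# the Ray backend handles placement across the two nodes.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=4,
    distributed_executor_backend="ray",
)

outputs = llm.generate(
    ["Explain distributed inference in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)

Note: with mismatched GPUs like these, the smallest card's VRAM tends to be the limiting factor, and cross-node bandwidth matters a lot for tensor parallelism. We get into those trade-offs in the Considerations chapter.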