Recording of a live meetup held on Feb 16, 2022, by our friends at the Boulder/Denver Data + AI Meetup group.
Meetup details:
Our first talk of the year features Jules Damji, Lead Developer Advocate at Anyscale, discussing Ray: A Framework for Scaling and Distributing Python & ML Applications.
ABOUT THE TALK:
Modern machine learning (ML) workloads, such as deep learning and large-scale model training, are compute-intensive and require distributed execution. Ray is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications and ML workloads from a laptop to a cluster, with an emphasis on the unique performance challenges of ML/AI systems. It is now used in many production deployments.
This talk will give an overview of Ray, covering its architecture, core concepts, and primitives, such as remote Tasks and Actors; briefly discuss Ray's native libraries (Ray Tune, Ray Train, Ray Serve, Ray Datasets, RLlib); and touch on Ray's growing ecosystem.
Through a demo using XGBoost for classification, we will demonstrate how you can scale training, hyperparameter tuning, and inference from a single node to a cluster, with a tangible performance difference when using Ray.
The takeaways from this talk are:
Learn Ray architecture, core concepts, and Ray primitives and patterns
Why distributed computing will be the norm, not the exception
How to scale your ML workloads with Ray libraries:
  Training on a single node vs. a Ray cluster, using XGBoost with/without Ray
  Hyperparameter search and tuning, using XGBoost with Ray and Ray Tune
  Inferencing at scale, using XGBoost with/without Ray
Link to presentation deck: https://drive.google.com/file/d/1hZrfsz8MRVMMAFAzwyNI4tDpm5_clS8g/view?usp=sharing
ABOUT OUR SPEAKER:
Our speaker, Jules Damji, is the Lead Developer Advocate at Anyscale Inc.
He is an MLflow contributor and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science (from Oregon State University and Cal State, Chico, respectively), and an M.A. in political advocacy and communication (from Johns Hopkins University).
Link to check out the Boulder/Denver Data + AI Meetup group: https://www.meetup.com/Boulder-Denver-Data-AI-Meetup/