Lecture 05: Deployment (FSDL 2022)

The Full Stack · 5,931 views · 2 years ago

New course announcement ✨

We're teaching an in-person LLM bootcamp in the SF Bay Area on November 14, 2023. Come join us if you want to see the most up-to-date materials on building LLM-powered products and learn in a hands-on environment.

https://www.scale.bythebay.io/llm-workshop

Hope to see some of you there!

---------------------------------------------------------------------------------------------

In this video, we cover the process for turning a promising ML model into a useful ML-powered product.

00:00 Overview
01:59 First, deploy a prototype with Gradio or Streamlit
04:32 Model-in-server architecture
07:44 Model-in-database architecture
11:42 Model-as-a-service architecture
14:03 REST APIs for model services
16:05 Dependency management for model services
18:26 Containerization for model services with Docker
24:40 Performance optimization: to GPU or not to GPU?
26:56 Optimization for CPUs: distillation, quantization, and caching
29:58 Optimization for GPUs: batching and GPU sharing
32:35 Libraries for model serving on GPUs
33:18 Horizontal scaling
34:18 Horizontal scaling with container orchestration (k8s)
35:38 Horizontal scaling with serverless services
38:53 Rollouts: shadows and canaries
40:46 Managed options for model serving (AWS SageMaker)
43:28 Takeaways on model services
44:33 Moving to edge
48:00 Frameworks for edge deployment
50:45 Making efficient models for the edge
52:05 Mindsets and takeaways for edge deployment
56:15 Takeaways for deploying ML models

Detailed notes and slides: https://fullstackdeeplearning.com/course/2022/lecture-5-deployment

Subscribe to our channel and sign up at https://fullstackdeeplearning.com/course/2022/ to follow along with the 2022 course!

deep learning

machine learning

mlops

ai
