This video provides a comprehensive overview of DeepSeek, a family of advanced language models, and their implementation on AWS. DeepSeek offers several open-weights models, including the base Mixture-of-Experts (MoE) model (DeepSeek-V3), reinforcement-learning-trained models (DeepSeek-R1-Zero and DeepSeek-R1), and efficient distilled variants based on the Qwen and Llama architectures.

The video details the key architectural optimizations in DeepSeek-V3:
1/ Multi-Head Latent Attention (MLA): uses low-rank joint compression of attention keys and values to reduce memory requirements during inference and training.
2/ DeepSeekMoE: implements finer-grained experts with node-limited routing (each token is sent to at most a fixed number of nodes), a complementary sequence-wise auxiliary loss, and no token dropping during training.
3/ Multi-Token Prediction (MTP): extends the training objective to predict multiple future tokens, improving data efficiency and representation capability.

Infrastructure optimizations include:
1/ DualPipe: a bidirectional pipeline-parallel schedule that efficiently overlaps computation and communication.
2/ FP8 Training: a mixed-precision framework that runs compute-dense operations in FP8 while keeping precision-sensitive operations in higher precision.
3/ Other techniques: efficient cross-node communication, activation recomputation, shared embeddings, and online quantization.

Follow Haowen!
💼 LinkedIn: https://www.linkedin.com/in/haowenhuang/

Chapters:
00:00 - Introduction
01:02 - Agenda
02:04 - What is DeepSeek?
03:03 - DeepSeek Open Weights Model Overview
04:48 - DeepSeek V3 Base Model
05:50 - DeepSeek R1 Zero Model
07:16 - DeepSeek-R1 Model
08:40 - Model Distillation Strategy
09:43 - Key Optimizations of DeepSeek-V3
11:08 - Architectural Optimizations Overview
12:00 - Multi-Head Latent Attention (MLA)
15:00 - DeepSeekMoE: Finer-grained Experts
16:30 - DeepSeekMoE: Node-Limited Routing
17:32 - Multi-Token Prediction (MTP)
18:36 - GRPO (Group Relative Policy Optimization)
20:32 - Infrastructure Optimizations Overview
20:45 - DualPipe
24:35 - FP8 Training
26:10 - Summary
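To make the MLA idea above concrete: the savings come from caching a small shared latent per token and reconstructing keys and values from it on the fly. Here is a minimal pure-Python sketch; the dimensions and randomly initialized projection matrices (`W_down`, `W_up_k`, `W_up_v`) are illustrative assumptions, not DeepSeek-V3's actual sizes or weights, and real MLA includes further details (e.g. decoupled rotary-embedding handling) omitted here.

```python
import random

def matmul(a, b):
    """Naive matrix multiply, enough for this sketch."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_model, d_latent, n_tokens = 16, 4, 5  # toy sizes for illustration

W_down = rand_matrix(d_model, d_latent)   # joint down-projection
W_up_k = rand_matrix(d_latent, d_model)   # up-projection back to keys
W_up_v = rand_matrix(d_latent, d_model)   # up-projection back to values

h = rand_matrix(n_tokens, d_model)        # hidden states for n tokens

# Only this low-rank latent is kept in the KV cache...
c_kv = matmul(h, W_down)                  # shape: n_tokens x d_latent

# ...keys and values are reconstructed from it when attention runs.
k = matmul(c_kv, W_up_k)                  # n_tokens x d_model
v = matmul(c_kv, W_up_v)                  # n_tokens x d_model

# Cache cost per token drops from 2*d_model values (K and V) to d_latent.
print(f"cache per token: {2 * d_model} -> {d_latent} values")
```

With these toy numbers the per-token cache shrinks from 32 values to 4, which is the mechanism behind MLA's inference-memory reduction.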
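The "online quantization" mentioned among the infrastructure techniques means scaling factors are computed from the tensor being quantized right now, rather than from stored history. Below is a toy sketch of that flow; it uses a rounded integer grid clipped to the FP8 E4M3 dynamic range (max magnitude 448) to stand in for FP8's coarse precision, so it illustrates the scale-then-clip pattern rather than real FP8 float encoding.

```python
FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def online_quantize(values):
    """Per-tensor online quantization: the scale comes from the
    current tensor's absolute max (no running statistics)."""
    scale = max(abs(v) for v in values) / FP8_E4M3_MAX
    # Rounding on an integer grid stands in for FP8's limited mantissa.
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, round(v / scale)))
         for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

x = [0.5, -3.2, 120.0, 0.001]
q, s = online_quantize(x)
x_hat = dequantize(q, s)
print(x_hat)
```

The same idea applies per tile or per block in production mixed-precision training, keeping scales fresh for activations whose ranges shift from step to step.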