Here's an overview of the DeepSeek R1 paper. I read the paper this week and was fascinated by the methods; however, it was a bit difficult to follow what was going on with all the models being used.
I found a neat map of the methodology, which I'll be using in this tutorial to walk you through the paper.
I still strongly recommend reading the paper itself here:
📌 PAPER: https://arxiv.org/pdf/2501.12948
and also checking out these two videos for the GRPO part:
📌 https://www.youtube.com/watch?v=XMnxKGVnEUc&ab_channel=UmarJamil
📌 https://www.youtube.com/watch?v=bAWV_yrqx4w&ab_channel=YannicKilcher
btw, the map I'm using is over here:
https://www.reddit.com/r/LocalLLaMA/comments/1i66j4f/deepseekr1_training_pipeline_visualized/
Table of Contents
- Introduction: 0:00
- DeepSeek-R1-Zero path: 2:23
- Reinforcement learning setup: 3:59
- Group Relative Policy Optimization (GRPO): 7:03
- DeepSeek-R1-Zero results: 11:40
- Cold-start supervised fine-tuning: 15:30
- Consistency reward for CoT: 16:19
- Supervised fine-tuning data generation: 17:17
- Reinforcement learning with neural reward model: 19:47
- Distillation: 21:26
- Conclusion: 24:34
----
Join the newsletter for weekly AI content: https://yacinemahdid.com
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf
----
Follow Me Online Here:
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacinemahdid/
___
Have a great week! 👋