Group Relative Policy Optimization (GRPO) - Formula and Code

Deep Learning with Yacine 14,482 3 months ago


The GRPO algorithm is at the heart of the training of the newest DeepSeek R1 model. In this tutorial, we discuss the details of the formula and walk through the code implementation from the HuggingFace post-training team!

# Table of Contents
- Introduction: 0:00
- PPO vs GRPO: 1:18
- PPO formula overview: 4:24
- GRPO formula overview: 7:49
- GRPO pseudo code: 11:11
- GRPO Trainer code: 13:21
- Conclusion: 23:48

GRPO in HuggingFace:
📌 https://huggingface.co/docs/trl/main/en/grpo_trainer

GRPO Trainer on GitHub:
📌 https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L568

DeepSeek Math paper:
📌 https://arxiv.org/pdf/2402.03300

Another cool walkthrough of GRPO:
📌 https://www.youtube.com/watch?v=bAWV_yrqx4w&ab_channel=YannicKilcher

Awesome PPO tutorial:
📌 https://www.youtube.com/watch?v=TjHH_--7l8g&ab_channel=Serrano.Academy

Enjoy! 🌹

----
Join the newsletter for weekly AI content: https://yacinemahdid.com
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf

----
Follow Me Online Here:
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacinemahdid/

___
Have a great week! 👋
