The GRPO algorithm is at the heart of how the newest DeepSeek R1 model was trained. In this tutorial, we will discuss the details of the formula, then walk through the code implementation from the HuggingFace post-training team!
# Table of Contents
- Introduction: 0:00
- PPO vs GRPO: 1:18
- PPO formula overview: 4:24
- GRPO formula overview: 7:49
- GRPO pseudo code: 11:11
- GRPO Trainer code: 13:21
- Conclusion: 23:48
GRPO in the HuggingFace TRL docs:
📌 https://huggingface.co/docs/trl/main/en/grpo_trainer
GRPO Trainer on GitHub:
📌 https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L568
DeepSeek Math paper:
📌 https://arxiv.org/pdf/2402.03300
Another cool walkthrough of GRPO:
📌 https://www.youtube.com/watch?v=bAWV_yrqx4w&ab_channel=YannicKilcher
Awesome PPO tutorial:
📌 https://www.youtube.com/watch?v=TjHH_--7l8g&ab_channel=Serrano.Academy
Enjoy! 🌹
----
Join the newsletter for weekly AI content: https://yacinemahdid.com
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf
----
Follow Me Online Here:
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacinemahdid/
----
Have a great week! 👋