How does GRPO work?

Trelis Research 5,226 views 2 months ago

🔔 If you subscribe, click the bell to be notified of new vids 🔔

🛠 Build & Deploy Faster
Fine-tuning, Inference, Audio, Evals, and Vision Tools: https://trelis.com

💡 Need Technical or Market Assistance?
Book a Consult Here: https://forms.gle/wJXVZXwioKMktjyVA

🤝 Are You a Top Developer?
Work for Trelis: https://trelis.com/jobs/

💸 Starting a New Project/Venture?
Apply for a Trelis Grant: https://trelis.com/trelis-ai-grants/

📧 Get Trelis AI Tutorials by Email
Subscribe on Substack: https://trelis.substack.com

📸 Thumbnail Tutorial
See How It’s Made: https://youtu.be/ThKYjTdkyP8

TIMESTAMPS:
00:00 Introduction to Reinforcement Learning
00:30 Understanding Supervised Fine-Tuning
01:30 Exploring ORPO: Odds Ratio Preference Optimization
06:57 Diving into GRPO: Group Relative Policy Optimization
08:31 Challenges and Rewards in GRPO
14:12 History and Evolution of Policy Optimization
19:30 Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)
22:26 Simplifying PPO with GRPO
29:34 Final Thoughts on GRPO and Reinforcement Learning
