🔔 If you subscribe, click the bell to be notified of new vids 🔔
🛠 Build & Deploy Faster
Fine-tuning, Inference, Audio, Evals, and Vision Tools: https://trelis.com
💡 Need Technical or Market Assistance?
Book a Consult Here: https://forms.gle/wJXVZXwioKMktjyVA
🤝 Are You a Top Developer?
Work for Trelis: https://trelis.com/jobs/
💸 Starting a New Project/Venture?
Apply for a Trelis Grant: https://trelis.com/trelis-ai-grants/
📧 Get Trelis AI Tutorials by Email
Subscribe on Substack: https://trelis.substack.com
📸 Thumbnail Tutorial
See How It’s Made: https://youtu.be/ThKYjTdkyP8
TIMESTAMPS:
00:00 Introduction to Reinforcement Learning
00:30 Understanding Supervised Fine-Tuning
01:30 Exploring ORPO: Odds Ratio Preference Optimization
06:57 Diving into GRPO: Group Relative Policy Optimization
08:31 Challenges and Rewards in GRPO
14:12 History and Evolution of Policy Optimization
19:30 Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)
22:26 Simplifying PPO with GRPO
29:34 Final Thoughts on GRPO and Reinforcement Learning