All the Code: https://github.com/priyammaz/PyTorch-Adventures/tree/main/PyTorch%20for%20Reinforcement%20Learning
Let's lay the foundation for some of the core ideas in Reinforcement Learning and see why it's so different from anything else in ML! This is the start of a sequence covering a lot of topics, and I'm really excited to share it with y'all! RL (mainly PPO) is a crucial part of LLM alignment today, but instead of jumping straight to PPO, I wanted to build us up to it!
Timestamps:
00:00:00 Introduction
00:04:38 The Environment
00:13:40 What is a Policy?
00:15:12 What does RL Solve?
00:18:06 Discounted Rewards (short- vs. long-term gains)
00:26:57 The Value Function
00:31:55 The Bellman Equation
00:48:53 Model-Based Learning
00:52:39 Model-Free Learning
00:57:47 The Plan for this Sequence
Socials!
X https://twitter.com/data_adventurer
Instagram https://www.instagram.com/nixielights/
Linkedin https://www.linkedin.com/in/priyammaz/
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/