This lecture (by Sean Welleck) for CMU CS 11-711, Advanced NLP, covers:
- RL basics
- Reward functions for NLP
- Policy gradient
- Stabilizing learning (e.g., KL penalty, PPO, baselines)
- Case studies (RLHF, RL for math)