CMU Advanced NLP Spring 2025 (11): Reinforcement Learning

Sean Welleck 2,342 views 2 months ago

This lecture (by Sean Welleck) for CMU CS 11-711, Advanced NLP, covers:
- RL basics
- Reward functions for NLP
- Policy gradient
- Stabilizing learning (e.g., KL penalty, PPO, baselines)
- Case studies (RLHF, RL for math)
