CMU Advanced NLP Spring 2025 (11): Reinforcement Learning

Sean Welleck, 1,323 views, 3 days ago

This lecture by Sean Welleck for CMU CS 11-711 (Advanced NLP) covers:
- RL basics
- Reward functions for NLP
- Policy gradient
- Stabilizing learning (e.g., KL penalty, PPO, baselines)
- Case studies (RLHF, RL for math)
