Experimenting with Reinforcement Learning with Verifiable Rewards (RLVR)

Nathan Lambert 4,647 1 month ago


Here's the latest talk I gave, last Friday at the USC Information Sciences Institute. It's a slightly more technical version of the RL talks I've been giving, focusing on the different ways we (and the community) are experimenting with RL for reasoning. It includes a bunch of discussion on GRPO, expanding on my previous video.

You can find the slides here: https://docs.google.com/presentation/d/13dBH2cYoJI4hCOHX5r5razq4HHdQVRIWOOIe08PTmPM/edit?usp=sharing

00:00 Introduction from RLHF to RLVR
07:51 Recap of post training
13:08 Reinforcement Learning with Verifiable Rewards Intro
20:22 RLVR experiments
41:27 Discussions
44:00 Conclusions

Their "official" recording, with more Q&A, is here: https://www.youtube.com/watch?v=MTr2KM9lK1M
