Hey friends! This is a recent talk I gave at the UC Santa Cruz Silicon Valley Extension to their Natural Language Processing (NLP) master's students, doctoral students, alumni, and friends.
In this talk I cover the recent trend of reinforcement finetuning of language models: how it came about, how it is done technically, early experiments with it at Ai2, and recent mainstream releases that use it (DeepSeek R1, Claude 3.7, Grok 3, etc.). I conclude with a look at a future of extensive RL training rather than just finetuning.
You can find the slides here: https://docs.google.com/presentation/d/1MnY-_YMhEwUdEO6oNIJcdGaEBN3goqCNUu_NLn_eYW4/edit?usp=sharing
Or, the full recording with talks from Alessio of Latent Space and Dylan of SemiAnalysis here: https://www.youtube.com/watch?v=HVM0vO4TjQs
This is closely related to a recent talk I gave on my primary Interconnects channel: https://www.youtube.com/watch?v=YXTYbr3hiFU
Thanks Sam & Jeff for hosting me! The next talk I post will include some more hot-off-the-press RL research than this one :D