How language model post-training is done today

Interconnects AI 5,372 4 weeks ago


I’m far more optimistic about the state of open recipes for, and knowledge of, post-training at the start of 2025 than I was at the start of 2024. Last year, one of my first posts was about how open post-training won’t match the likes of GPT-4. This is still the case, but now we at least better understand the scope of what we will be working with. It’s a good time to record an overview of what post-training looks like today.

I gave a version of this talk for the first time in 2023, which felt like a review of the InstructGPT paper rather than something grounded in reproduced knowledge from the literature. In 2024, the scientific community made substantial progress in actually training these models and expanding the frontier of knowledge. Doing one of these talks every year feels like a good way to keep tabs on the state of play (whereas last year, I just had a bunch of links to add to the conversation on where to start).

00:00 Introduction
10:00 Prompts & Skill Selection
14:19 Instruction Finetuning
21:45 Preference Finetuning
36:17 Reinforcement Finetuning
45:28 Open Questions
52:02 Wrap Up

Slides: https://docs.google.com/presentation/d/1FL6pzRT3tjCfJ985emS_2YfujCe_iz6dsyRcDIUFPqs/edit#slide=id.g31d874a0784_2_0
More context: https://www.interconnects.ai/p/the-state-of-post-training-2025

Get Interconnects (https://www.interconnects.ai/)...
... on YouTube: https://www.youtube.com/@interconnects
... on Twitter: https://x.com/interconnectsai
... on Linkedin: https://www.linkedin.com/company/interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv
... on Apple Podcasts: https://podcasts.apple.com/us/podcast/interconnects/id1719552353
