LLM Training & Reinforcement Learning from Google Engineer | SFT + RLHF | PPO vs GRPO vs DPO

Martin Is A Dad 1,915 1 month ago

As a regular software engineer, I want to share the most typical LLM training process today (Pre-Training + SFT + RLHF), along with the basics of Reinforcement Learning and some of the most popular policy optimization algorithms. This video is also a prequel to the DeepSeek-R1 deep dive: https://youtu.be/FT5cRAqPY4c

Related content:
What is AI Alignment: https://youtu.be/S6PhQ-9m2MU
Advanced Prompting: https://youtu.be/1D6PSo1OZm4

#llm #openai #google #ai #reinforcementlearning #machinelearning

0:00 Intro
0:25 Modern LLM Training Flow
1:00 Pre-Training
1:47 Post-Training
4:46 SFT
6:09 Reinforcement Learning
10:31 Policy Gradient
12:08 PPO
15:30 GRPO
16:36 DPO
20:01 Post-Training Example Flow
