Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models

Serrano.Academy 19,671 1 year ago
Reinforcement Learning with Human Feedback (RLHF) is a method used for training Large Language Models (LLMs). At the heart of RLHF lies a very powerful reinforcement learning method called Proximal Policy Optimization (PPO). Learn about it in this simple video! This is the second one in a series of 3 videos dedicated to the reinforcement learning methods used for training LLMs.

Full Playlist: https://www.youtube.com/playlist?list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-

Video 0 (Optional): Introduction to deep reinforcement learning: https://www.youtube.com/watch?v=SgC6AZss478
Video 1: Proximal Policy Optimization: https://www.youtube.com/watch?v=TjHH_--7l8g
Video 2 (This one): Reinforcement Learning with Human Feedback
Video 3 (Coming soon!): Deterministic Policy Optimization

00:00 Introduction
00:48 Intro to Reinforcement Learning (RL)
02:47 Intro to Proximal Policy Optimization (PPO)
04:17 Intro to Large Language Models (LLMs)
06:50 Reinforcement Learning with Human Feedback (RLHF)
13:08 Interpretation of the Neural Networks
14:36 Conclusion

Get the Grokking Machine Learning book! https://manning.com/books/grokking-machine-learning
Discount code (40%): serranoyt (use the discount code at checkout)
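
For reference (these formulas are the standard ones from the PPO and RLHF literature, not quoted from the video, and the video's notation may differ): PPO optimizes a clipped surrogate objective, and the RLHF stage typically maximizes a learned reward while penalizing divergence from a frozen reference model. Below, pi_theta is the policy being fine-tuned (the LLM), A_t is an advantage estimate, epsilon is the clipping parameter, r_phi is the learned reward model, and pi_ref is the frozen reference model.

    % Standard PPO clipped surrogate objective; notation assumed, not taken from the video.
    \[
    L^{\mathrm{CLIP}}(\theta) =
      \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right],
    \qquad
    r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
    \]

    % KL-regularized objective commonly maximized in the RLHF stage, where x is a prompt,
    % y a sampled response, r_\phi the reward model, and \pi_{\mathrm{ref}} the frozen reference LLM.
    \[
    \max_{\theta}\;
      \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)} \left[ r_\phi(x, y) \right]
      \;-\;
      \beta\, \mathbb{E}_{x \sim \mathcal{D}} \left[
        \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\middle\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right) \right]
    \]

The KL term is what keeps the fine-tuned model from drifting too far from the original pre-trained model while it chases the reward signal, which is the usual motivation given for pairing PPO with a reference model in RLHF.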