REINFORCE: Reinforcement Learning's Most Fundamental Algorithm

Andriy Drozdyuk 13,179 views 3 years ago


If you would like to see more videos like this, please consider supporting me on Patreon: https://www.patreon.com/andriydrozdyuk

Reinforcement Learning: An Introduction, 2nd Ed, Sutton & Barto
For the REINFORCE algorithm, see Section "13.3 REINFORCE: Monte Carlo Policy Gradient":
http://incompleteideas.net/book/the-book-2nd.html

Complete code used in the video can be found here:
https://github.com/drozzy/reinforce

0:00 - Introduction
0:15 - Intro to RL
0:38 - Problem with Environment
1:02 - Why is this a problem for RL?
1:41 - Puppy treats (low level of abstraction)
2:14 - Good actions (middle level of abstraction)
3:22 - Reward as a signal (high level of abstraction)
4:04 - REINFORCE Algorithm Overview
5:11 - Collected Trajectory
6:01 - Product of G and Policy Gradient
6:34 - Two key concepts: sample and evaluate
6:48 - Sampling an action
7:22 - Sampling in REINFORCE
7:38 - Evaluating an action
8:24 - Sampling vs. Evaluating
8:41 - Sampling using torch.distributions.Categorical
9:12 - Evaluating using torch.distributions.Categorical
9:50 - Env/NN/Optim
10:07 - Collect One Episode of Experience
10:53 - Compute Discounted Returns
11:44 - Update the Policy
12:41 - Executing Trained Policy
13:04 - Demo Cart Pole Balancing
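
For readers following the "Compute Discounted Returns" chapter, here is a minimal sketch of that step in plain Python. It is not the code from the video or the linked repository. The function name `discounted_returns`, the `gamma` default, and the convention that each return includes the reward received at its own time step are all assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    `rewards` is the list of rewards collected in a single episode
    (hypothetical input format; the video's code may store them differently).
    """
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In REINFORCE, each of these returns would then weight the log-probability of the action taken at the same time step when the policy is updated.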
