Research Scientist Hado van Hasselt covers policy algorithms that can learn policies directly and actor critic algorithms that combine value predictions for more efficient learning. Slides: https://dpmd.ai/policygradient Full video lecture series: https://dpmd.ai/DeepMindxUCL21