Can a Reinforcement Learning Agent Learn with NO Rewards? Intrinsic Curiosity Coding Tutorial

Machine Learning with Phil 7,932 4 years ago

Is it possible to train a deep reinforcement learning agent to navigate its environment without the use of rewards? It turns out that with the Intrinsic Curiosity Module (ICM) it's actually feasible. ICM is a bolt-on module for deep reinforcement learning agents that uses self-supervised predictions of the environment dynamics to generate an intrinsic reward. The less the agent knows about the environment, the stronger this reward signal is, which gives the agent an incentive to explore. This brings environments with very sparse or totally absent extrinsic rewards within reach of reinforcement learning algorithms.

In this PyTorch tutorial we're going to code up both an asynchronous advantage actor-critic (A3C) agent and the intrinsic curiosity module. We're going to put both to the test in the CartPole environment and show that our ICM agent can get scores as high as 115 points without using any extrinsic rewards. For comparison, the A3C agent languishes around 22 points on average.

The code for this video is here: https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning/ICM

If you want to support my work, please check out my courses below.

Learn how to turn deep reinforcement learning papers into code:
Get instant access to all my courses, including the new Prioritized Experience Replay course, with my subscription service. $29 a month gives you instant access to 42 hours of instructional content plus access to future updates, added monthly. Discounts available for Udemy students (enrolled longer than 30 days). Just send an email to [email protected]
https://www.neuralnet.ai/courses

Or, pick up my Udemy courses here:
Deep Q Learning: https://www.udemy.com/course/deep-q-learning-from-paper-to-code/?couponCode=DQN-JUNE-22
Actor Critic Methods: https://www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/?couponCode=AC-JUNE-22
Curiosity Driven Deep Reinforcement Learning: https://www.udemy.com/course/curiosity-driven-deep-reinforcement-learning/?couponCode=ICM-JUNE-22
Natural Language Processing from First Principles: https://www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP-JUNE-22

Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion: https://bit.ly/3fXHy8W
Grokking Deep Learning: https://bit.ly/3yJ14gT
Grokking Deep Reinforcement Learning: https://bit.ly/2VNAXql

Come hang out on Discord here: https://discord.gg/Zr4VCdv

Need personalized tutoring? Help on a programming project? Shoot me an email! [email protected]

Website: https://www.neuralnet.ai
GitHub: https://github.com/philtabor
Twitter: https://twitter.com/MLWithPhil

ICM Paper: https://arxiv.org/abs/1705.05423v1

0:00 Intro and Paper
8:43 Code Overview
9:24 Coding A3C
23:13 Coding ICM
31:58 Coding Batch Memory
34:15 Coding SharedAdam
42:54 Coding ParallelEnv
41:52 Coding Worker
54:01 Coding PlotLearning
55:20 Coding Main
57:23 Moment of Truth
58:00 Validating A3C
58:27 Turning off Rewards
59:58 Validating ICM
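
For readers who want a feel for the intrinsic reward described in the summary above before watching, here is a minimal sketch. It is not the code from the video or the linked repository. It assumes a toy linear forward model and made-up dynamics, and it uses plain Python rather than PyTorch so it runs with no dependencies. The names `LinearForwardModel`, `toy_dynamics`, `run_demo`, and the scale `eta` are all hypothetical. The real ICM also learns its feature encoding with an inverse model, which is left out here. The reward is half the scaled squared error between the predicted and the actual next state.

```python
import random


def intrinsic_reward(predicted_next, actual_next, eta=1.0):
    """Curiosity bonus: (eta / 2) * squared error of the forward prediction."""
    return 0.5 * eta * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


class LinearForwardModel:
    """Toy stand-in for the forward model: predicts the next state from (state, action)."""

    def __init__(self, n_features, lr=0.05):
        self.lr = lr
        # One weight row per output dimension; the last input column is the action.
        self.w = [[0.0] * (n_features + 1) for _ in range(n_features)]

    def predict(self, state, action):
        x = list(state) + [float(action)]
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.w]

    def update(self, state, action, next_state):
        """One SGD step on the squared prediction error."""
        x = list(state) + [float(action)]
        pred = self.predict(state, action)
        for i, row in enumerate(self.w):
            err = pred[i] - next_state[i]
            for j in range(len(row)):
                row[j] -= self.lr * err * x[j]


def toy_dynamics(state, action):
    """Made-up environment dynamics, used only for this illustration."""
    return [0.9 * s + 0.1 * action for s in state]


def run_demo(steps=2000, seed=0):
    rng = random.Random(seed)
    model = LinearForwardModel(n_features=2)
    rewards = []
    for _ in range(steps):
        state = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        action = rng.choice([0, 1])
        next_state = toy_dynamics(state, action)
        # Reward is computed before the model learns from this transition.
        rewards.append(intrinsic_reward(model.predict(state, action), next_state))
        model.update(state, action, next_state)
    return rewards


if __name__ == "__main__":
    rewards = run_demo()
    print("mean reward, first 50 steps:", sum(rewards[:50]) / 50)
    print("mean reward, last 50 steps:", sum(rewards[-50:]) / 50)
```

As the forward model gets better at predicting these transitions, the bonus shrinks. That is the "less the agent knows, the stronger the signal" behavior in miniature.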