Is it possible to train a deep reinforcement learning agent to navigate its environment without the use of rewards? It turns out that with the Intrinsic Curiosity Module (ICM) it's actually feasible. ICM is a bolt-on module for deep reinforcement learning agents that uses self-supervised predictions of the environment dynamics to generate an intrinsic reward.
The less the agent knows about the environment, the stronger this reward signal is, which generates an incentive to explore. Environments with very sparse or totally absent extrinsic rewards are now within reach of reinforcement learning algorithms.
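To make the idea concrete, here is a minimal sketch of the curiosity signal in plain Python (no PyTorch, so it runs anywhere). It assumes the forward-model form of the intrinsic reward from the ICM paper, where the reward is a scaled squared prediction error on the next state's features. The toy `LinearForwardModel`, `true_dynamics`, and `run` helper are hypothetical stand-ins for illustration only; they are not the networks from the video, and the inverse model and learned feature encoder from the paper are left out.

```python
import random


def intrinsic_reward(predicted_next, actual_next, eta=1.0):
    """Curiosity reward: (eta / 2) * squared error of the forward prediction."""
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * sq_err


class LinearForwardModel:
    """Toy forward model (assumption): next_feature = w * feature + b * action."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, features, action):
        return [self.w * f + self.b * action for f in features]

    def update(self, features, action, actual_next):
        # One gradient step on the squared prediction error.
        pred = self.predict(features, action)
        for f, p, a in zip(features, pred, actual_next):
            err = p - a
            self.w -= self.lr * err * f
            self.b -= self.lr * err * action


def true_dynamics(features, action):
    """Made-up environment dynamics that the model has to discover."""
    return [0.9 * f + 0.5 * action for f in features]


def run(steps=500, seed=0):
    """Return the intrinsic reward at each step while the model learns."""
    rng = random.Random(seed)
    model = LinearForwardModel()
    rewards = []
    for _ in range(steps):
        features = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        action = rng.choice([0, 1])
        actual_next = true_dynamics(features, action)
        predicted_next = model.predict(features, action)
        rewards.append(intrinsic_reward(predicted_next, actual_next))
        model.update(features, action, actual_next)
    return rewards


if __name__ == "__main__":
    rewards = run()
    print("mean of first 10 rewards:", sum(rewards[:10]) / 10)
    print("mean of last 10 rewards: ", sum(rewards[-10:]) / 10)
```

Running it prints the average reward early and late in training. As the toy model gets better at predicting the dynamics, the reward shrinks, which is the same pressure that pushes a real agent toward parts of the environment it can't yet predict.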
In this PyTorch tutorial we're going to code up both an asynchronous advantage actor critic (A3C) agent and the intrinsic curiosity module. We're going to put both to the test in the CartPole environment and show that our ICM agent can get scores as high as 115 points without using any extrinsic rewards. For comparison, the A3C agent languishes around 22 points on average.
The code for this video is here:
https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning/ICM
If you want to support my work, please check out my courses below.
Learn how to turn deep reinforcement learning papers into code:
Get instant access to all my courses, including the new Prioritized Experience Replay course, with my subscription service. $29 a month gives you instant access to 42 hours of instructional content plus access to future updates, added monthly.
Discounts available for Udemy students (enrolled longer than 30 days). Just send an email to [email protected]
https://www.neuralnet.ai/courses
Or, pick up my Udemy courses here:
Deep Q Learning:
https://www.udemy.com/course/deep-q-learning-from-paper-to-code/?couponCode=DQN-JUNE-22
Actor Critic Methods:
https://www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/?couponCode=AC-JUNE-22
Curiosity Driven Deep Reinforcement Learning:
https://www.udemy.com/course/curiosity-driven-deep-reinforcement-learning/?couponCode=ICM-JUNE-22
Natural Language Processing from First Principles:
https://www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP-JUNE-22
Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion: https://bit.ly/3fXHy8W
Grokking Deep Learning: https://bit.ly/3yJ14gT
Grokking Deep Reinforcement Learning: https://bit.ly/2VNAXql
Come hang out on Discord here:
https://discord.gg/Zr4VCdv
Need personalized tutoring? Help on a programming project? Shoot me an email! [email protected]
Website: https://www.neuralnet.ai
Github: https://github.com/philtabor
Twitter: https://twitter.com/MLWithPhil
ICM Paper:
https://arxiv.org/abs/1705.05423v1
0:00 Intro and Paper
8:43 Code Overview
9:24 Coding A3C
23:13 Coding ICM
31:58 Coding Batch Memory
34:15 Coding SharedAdam
42:54 Coding ParallelEnv
41:52 Coding Worker
54:01 Coding PlotLearning
55:20 Coding Main
57:23 Moment of Truth
58:00 Validating A3C
58:27 Turning off Rewards
59:58 Validating ICM