The matrix math behind transformer neural networks, one step at a time!!!

StatQuest with Josh Starmer · 65,543 views · 11 months ago

Transformers, the neural network architecture behind ChatGPT, do a lot of math. However, this math can be done quickly using matrix math, because GPUs are optimized for it. Matrix math is also what we use when we code neural networks, so learning how ChatGPT does it will help you code your own. Thus, in this video, we go through the math one step at a time and explain what each step does, so that you can use it on your own with confidence.

NOTE: This StatQuest assumes that you are already familiar with:
Transformers: https://youtu.be/zxQyTK8quyY
The essential matrix algebra for neural networks: https://youtu.be/bQ5BoolX9Ag

If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
paypal: https://www.paypal.me/statquest
venmo: @JoshStarmer

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
https://twitter.com/joshuastarmer

0:00 Awesome song and introduction
1:43 Word Embedding
3:37 Position Encoding
4:28 Self Attention
12:09 Residual Connections
13:08 Decoder Word Embedding and Position Encoding
15:33 Masked Self Attention
20:18 Encoder-Decoder Attention
21:31 Fully Connected Layer
22:16 SoftMax

#StatQuest #Transformer #ChatGPT
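To give a concrete feel for what "doing the math with matrices" means, here is a minimal NumPy sketch of the Self Attention and Masked Self Attention steps from the chapter list above. The sizes, random weights, and variable names are illustrative assumptions for this sketch, not values from the video:

```python
import numpy as np

# A tiny self-attention example: 3 tokens, 4-dimensional embeddings.
# All sizes and weights here are arbitrary illustrations.
np.random.seed(0)
X = np.random.randn(3, 4)   # word embeddings + position encoding, one row per token

d_k = 4                        # query/key dimension for this sketch
W_q = np.random.randn(4, d_k)  # learned weights (random here, trained in practice)
W_k = np.random.randn(4, d_k)
W_v = np.random.randn(4, d_k)

Q, K, V = X @ W_q, X @ W_k, X @ W_v  # queries, keys, values for all tokens at once

def softmax(rows):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(rows - rows.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Self attention: softmax(Q K^T / sqrt(d_k)) V
scores = Q @ K.T / np.sqrt(d_k)       # how strongly each token attends to every other
self_attention = softmax(scores) @ V  # weighted sums of the value vectors

# Masked self attention (used in the decoder): set the scores for later tokens
# to -inf so each token can only attend to itself and the tokens before it.
mask = np.triu(np.ones((3, 3), dtype=bool), k=1)   # True above the diagonal
masked_attention = softmax(np.where(mask, -np.inf, scores)) @ V

print(self_attention.shape, masked_attention.shape)  # (3, 4) (3, 4)
```

In a real transformer the weight matrices are learned during training, the dimensions are much larger, and several such attention heads run in parallel, but the matrix operations themselves are exactly the ones above.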
