Attention is All You Need: Ditching Recurrence for Good!

Priyam Mazumdar · 2,437 views · 1 month ago

Link to Code: https://github.com/priyammaz/PyTorch-Adventures/tree/main/PyTorch%20for%20Transformers/Attention%20Mechanisms/Attention
Github Repo: https://github.com/priyammaz/PyTorch-Adventures

Although our time with Recurrent Neural Networks was not long... it's time to leave them behind for Transformers! I'm only joking, there are still plenty of use cases for RNNs today, but most state-of-the-art models leverage some type of Attention-based system!

In this video we will explore Self-Attention, Causal Attention, Attention Masking, Cross Attention, and Flash Attention. This provides a foundation for understanding the different forms in which we typically encounter Attention. This is also the first video in a series where we will implement the Attention is All You Need paper, specifically for Neural Machine Translation. But because the Attention mechanism is so important, I wanted to give it more time and make a separate video for it altogether!
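If you just want the gist before watching, here is a minimal sketch (not the code from the video or the linked repo) of scaled dot-product self-attention with an optional mask, collapsed to a single head for brevity. The names SimpleSelfAttention, embed_dim, and attn_mask are placeholders for this sketch only:

import math
import torch
import torch.nn as nn

class SimpleSelfAttention(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        # Learnable projections mapping the input to queries, keys, and values
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x, attn_mask=None):
        # x: (batch, seq_len, embed_dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Scaled dot-product scores: (batch, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))

        # attn_mask is True wherever attention is NOT allowed
        # (padding tokens for a pad mask, future tokens for a causal mask)
        if attn_mask is not None:
            scores = scores.masked_fill(attn_mask, float("-inf"))

        weights = scores.softmax(dim=-1)
        return weights @ v

# Example: batch of 2 sequences of length 4 with a causal (no-peeking-ahead) mask
x = torch.randn(2, 4, 32)
causal_mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)
out = SimpleSelfAttention(32)(x, attn_mask=causal_mask)
print(out.shape)  # torch.Size([2, 4, 32])

The same masking idea extends to padding masks and cross attention (where queries come from one sequence and keys/values from another), and PyTorch's built-in torch.nn.functional.scaled_dot_product_attention can dispatch to fused Flash Attention kernels when available. The video builds all of this up properly, including multi-head attention.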

Timestamps:
00:00:00 Introduction
00:03:19 What is Attention
00:25:00 Implementing Non-Learnable Attention
00:34:20 nn.Linear on Multidimensional Tensors
00:38:45 Simple Attention
00:44:50 Moving to Multi-Head Attention
00:50:20 (Inefficient) Multi-Head Attention
01:02:00 Packing Linear Layers
01:06:30 Multidimensional MatMul
01:09:20 (Efficient) Multi-Head Attention
01:26:30 Attention Masking (Pad Mask)
02:03:08 Causal Masking
02:05:55 Causal + Attention Masking
02:17:30 Cross Attention (with masking)
02:41:10 Flash Attention
02:44:40 Full Attention Implementation (w/ Flash Attention)
03:01:00 Wrap-up

Socials!
X https://twitter.com/data_adventurer
Instagram https://www.instagram.com/nixielights/
Linkedin https://www.linkedin.com/in/priyammaz/
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/
