In this video I teach how to code a Transformer model from scratch using PyTorch. I highly recommend watching my previous video to understand the underlying concepts, but I will also review them again while coding. All of the code is mine, except for the attention visualization function used to plot the charts, which I found online on Harvard University's website.

Paper: Attention Is All You Need - https://arxiv.org/abs/1706.03762

The full code is available on GitHub: https://github.com/hkproj/pytorch-transformer
It also includes a Colab notebook so you can train the model directly on Colab.

Chapters
00:00:00 - Introduction
00:01:20 - Input Embeddings
00:04:56 - Positional Encodings
00:13:30 - Layer Normalization
00:18:12 - Feed Forward
00:21:43 - Multi-Head Attention
00:42:41 - Residual Connection
00:44:50 - Encoder
00:51:52 - Decoder
00:59:20 - Linear Layer
01:01:25 - Transformer
01:17:00 - Task Overview
01:18:42 - Tokenizer
01:31:35 - Dataset
01:55:25 - Training Loop
02:20:05 - Validation Loop
02:41:30 - Attention Visualization