Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Umar Jamil · 461,376 views · 2 years ago

A complete explanation of all the layers of a Transformer model: multi-head self-attention, positional encoding, all the matrix multiplications, and a full description of the training and inference process. A minimal code sketch of the two core formulas follows the chapter list.

Paper: Attention Is All You Need - https://arxiv.org/abs/1706.03762
Slides PDF: https://github.com/hkproj/transformer-from-scratch-notes

Chapters
00:00 - Intro
01:10 - RNNs and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
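Since the description centers on two formulas from the paper, here is a minimal NumPy sketch (an illustration written for this page, not the video's or the slides' code) of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, and the sinusoidal positional encoding, both as defined in the paper linked above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)        # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V

def positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), cos for odd dimensions.
    Assumes d_model is even, as in the paper's setup."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Tiny usage example: 4 tokens, model dimension 8.
x = np.random.randn(4, 8) + positional_encoding(4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

In the full model, Q, K, and V are separate learned projections of the input and the computation is repeated across several heads; the sketch shows only the single-head case covered in the 20:08 chapter.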
