Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Umar Jamil · 461,376 views · 2 years ago

A complete explanation of all the layers of a Transformer model: multi-head self-attention, positional encoding, all the matrix multiplications, and a full description of the training and inference process. A minimal code sketch of the two core formulas follows the chapter list.

Paper: Attention Is All You Need - https://arxiv.org/abs/1706.03762
Slides PDF: https://github.com/hkproj/transformer-from-scratch-notes

Chapters
00:00 - Intro
01:10 - RNNs and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
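Since the description centers on two formulas from the paper, here is a minimal NumPy sketch (an illustration written for this page, not the video's or the slides' code) of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, and the sinusoidal positional encoding, both as defined in the paper linked above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)        # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V

def positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), cos for odd dimensions.
    Assumes d_model is even, as in the paper's setup."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Tiny usage example: 4 tokens, model dimension 8.
x = np.random.randn(4, 8) + positional_encoding(4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

In the full model, Q, K, and V are separate learned projections of the input and the computation is repeated across several heads; the sketch shows only the single-head case covered in the 20:08 chapter.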
