I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained.
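To make the contrast concrete, here is a minimal sketch of both modes using HuggingFace Transformers. I use the t5-small checkpoint purely for illustration (the video may use a different model); any encoder-decoder checkpoint would work the same way:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Inference: generate() decodes autoregressively, predicting one token
# at a time and feeding each prediction back in as the next decoder input.
inputs = tokenizer("translate English to German: Hello, how are you?", return_tensors="pt")
generated_ids = model.generate(inputs.input_ids, max_length=40)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# Training: a single forward pass over the entire target sequence
# (teacher forcing); the model returns a loss when labels are provided.
labels = tokenizer("Hallo, wie geht es dir?", return_tensors="pt").input_ids
outputs = model(input_ids=inputs.input_ids, labels=labels)
print(outputs.loss)
```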
Disclaimer: this video assumes that you're familiar with the basics of deep learning and that you've used HuggingFace Transformers at least once. If that's not the case, I highly recommend Stanford's CS231n (http://cs231n.stanford.edu/), which teaches the basics of deep learning. To learn HuggingFace Transformers, I recommend our free course: https://huggingface.co/course.
The video explains in detail the difference between input_ids, decoder_input_ids, and labels (see the sketch after this list):
- the input_ids are the inputs to the encoder
- the decoder_input_ids are the inputs to the decoder
- the labels are the targets for the decoder
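Here is a minimal sketch of how these three tensors relate during training, again assuming a t5-small checkpoint. In practice you usually pass only input_ids and labels, and the model builds the decoder_input_ids for you by shifting the labels one position to the right; recent versions of HuggingFace Transformers expose this shift via prepare_decoder_input_ids_from_labels:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# input_ids: the tokenized source sequence, fed to the encoder.
input_ids = tokenizer("translate English to German: I love dogs.", return_tensors="pt").input_ids

# labels: the tokenized target sequence, used to compute the loss.
labels = tokenizer("Ich liebe Hunde.", return_tensors="pt").input_ids

# decoder_input_ids: the labels shifted one position to the right, with the
# decoder start token prepended, so the decoder predicts token t from the
# tokens before t. If you pass only labels, the model creates these internally.
decoder_input_ids = model.prepare_decoder_input_ids_from_labels(labels)

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)
print(outputs.loss)
```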
Resources:
- Transformer paper: https://arxiv.org/abs/1706.03762
- Jay Alammar's The Illustrated Transformer blog post: https://jalammar.github.io/illustrated-transformer/
- HuggingFace Transformers: https://github.com/huggingface/transformers
- Transformers-Tutorials, a repository containing several demos for Transformer-based models: https://github.com/NielsRogge/Transformers-Tutorials