I made this video to illustrate the difference between how a Transformer is used at inference time (i.e. when generating text) vs. how a Transformer is trained.
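To make the contrast concrete, here is a minimal sketch of both modes using HuggingFace Transformers. I use the t5-small checkpoint purely for illustration (the video may use a different model); any encoder-decoder checkpoint would work the same way:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Inference: generate() decodes autoregressively, predicting one token
# at a time and feeding each prediction back in as the next decoder input.
inputs = tokenizer("translate English to German: Hello, how are you?", return_tensors="pt")
generated_ids = model.generate(inputs.input_ids, max_length=40)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# Training: a single forward pass over the entire target sequence
# (teacher forcing); the model returns a loss when labels are provided.
labels = tokenizer("Hallo, wie geht es dir?", return_tensors="pt").input_ids
outputs = model(input_ids=inputs.input_ids, labels=labels)
print(outputs.loss)
```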
Disclaimer: this video assumes that you're familiar with the basics of deep learning and that you've used HuggingFace Transformers at least once. If that's not the case, I highly recommend Stanford's CS231n (http://cs231n.stanford.edu/), which teaches the basics of deep learning. To learn HuggingFace Transformers, I recommend our free course: https://huggingface.co/course.
The video explains in detail the difference between input_ids, decoder_input_ids, and labels (see the sketch after this list):
- the input_ids are the inputs to the encoder
- the decoder_input_ids are the inputs to the decoder
- the labels are the targets for the decoder
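Here is a minimal sketch of how these three tensors relate during training, again assuming a t5-small checkpoint. In practice you usually pass only input_ids and labels, and the model builds the decoder_input_ids for you by shifting the labels one position to the right; recent versions of HuggingFace Transformers expose this shift via prepare_decoder_input_ids_from_labels:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# input_ids: the tokenized source sequence, fed to the encoder.
input_ids = tokenizer("translate English to German: I love dogs.", return_tensors="pt").input_ids

# labels: the tokenized target sequence, used to compute the loss.
labels = tokenizer("Ich liebe Hunde.", return_tensors="pt").input_ids

# decoder_input_ids: the labels shifted one position to the right, with the
# decoder start token prepended, so the decoder predicts token t from the
# tokens before t. If you pass only labels, the model creates these internally.
decoder_input_ids = model.prepare_decoder_input_ids_from_labels(labels)

outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)
print(outputs.loss)
```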
Resources:
- Transformer paper: https://arxiv.org/abs/1706.03762
- Jay Alammar's The Illustrated Transformer blog post: https://jalammar.github.io/illustrated-transformer/
- HuggingFace Transformers: https://github.com/huggingface/transformers
- Transformers-Tutorials, a repository containing several demos for Transformer-based models: https://github.com/NielsRogge/Transformers-Tutorials