Lecture 12.3 Famous transformers (BERT, GPT-2, GPT-3)

DLVU · 18,687 views · 4 years ago

ERRATA:
In the "original transformer" (slide 51), in the source attention, the key and value come from the encoder, and the query comes from the decoder.
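The corrected flow of the source attention (cross-attention) can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the lecture's code; the weight matrices and dimensions are made up for the example. The key point is which side each projection is applied to: queries come from the decoder states, keys and values from the encoder states.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_h, encoder_h, Wq, Wk, Wv):
    # Query from the decoder; key and value from the encoder.
    q = decoder_h @ Wq                         # (t_dec, d)
    k = encoder_h @ Wk                         # (t_enc, d)
    v = encoder_h @ Wv                         # (t_enc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (t_dec, t_enc)
    return softmax(scores, axis=-1) @ v        # (t_dec, d)

rng = np.random.default_rng(0)
d = 8
enc = rng.normal(size=(5, d))  # 5 encoder positions (source sequence)
dec = rng.normal(size=(3, d))  # 3 decoder positions (target so far)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(dec, enc, Wq, Wk, Wv)
print(out.shape)  # one output vector per decoder position
```

Each decoder position produces one output vector, so `out` has shape `(3, 8)` here, regardless of the source length.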

In this lecture we look at the details of some famous transformer models: how they were trained, and what they could do after training.

annotated slides: https://dlvu.github.io/sa
Lecturer: Peter Bloem
