RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs

DeepLearning Hero · 37,040 views · 1 year ago

Unlike sinusoidal embeddings, RoPE is well behaved and more resilient when predictions exceed the training sequence length. Modern LLMs have already steered away from sinusoidal embeddings toward better alternatives like RoPE. Stay with me in the video to learn what's wrong with sinusoidal embeddings, the intuition behind RoPE, and how RoPE works.

Original Transformer paper: https://arxiv.org/pdf/1706.03762.pdf
RoPE paper: https://arxiv.org/pdf/2104.09864.pdf
Using interpolation for RoPE: https://arxiv.org/pdf/2306.15595.pdf
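
For a concrete picture before watching, here is a minimal NumPy sketch (not from the video) of the rotation described in the RoPE paper: the embedding is split into consecutive 2D pairs and each pair is rotated by an angle proportional to the token position. The function name and layout are illustrative assumptions, not the paper's reference code.

    import numpy as np

    def rope(x, pos, base=10000.0):
        """Apply rotary positional embedding to a vector x at position pos.

        x is split into consecutive 2D pairs; pair i is rotated by the angle
        pos * theta_i, where theta_i = base**(-2i/d).
        """
        d = x.shape[-1]
        assert d % 2 == 0, "embedding dimension must be even"
        theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per 2D pair
        angles = pos * theta                        # rotation angle per pair
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[..., 0::2], x[..., 1::2]         # the two components of each pair
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin        # standard 2D rotation
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    # Because each pair is rotated by an angle proportional to its position,
    # the query/key dot product depends only on the relative distance
    # between positions, not on their absolute values.
    q, k = np.random.randn(8), np.random.randn(8)
    print(np.dot(rope(q, pos=5), rope(k, pos=2)))    # equals the line below
    print(np.dot(rope(q, pos=10), rope(k, pos=7)))   # same relative offset of 3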

0:00 - Introduction
1:06 - Attention computation
1:51 - Token and positional similarity
2:52 - Vector view of query and key
4:52 - Sinusoidal embeddings
5:53 - Problem with sinusoidal embeddings
6:34 - Conversational view
8:50 - RoPE embeddings
10:20 - RoPE beyond 2D
12:36 - Changes to the equations
13:00 - Conclusion