Paper here: https://arxiv.org/abs/2410.06205
Notes: https://drive.google.com/file/d/152NPPyNjo-N6MMIaupXacS41BUJgjE5l/view?usp=drive_link
00:00 Intro
01:09 RoPE: Rotary Positional Embeddings
10:37 Notes on RoPE
12:04 Does RoPE decay with distance?
14:14 How are different frequencies used?
17:02 High frequencies: positional attention
21:29 Low frequencies: semantic attention
28:00 p-RoPE
30:36 Thoughts on this paper