Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper here: https://arxiv.org/abs/2405.21060
Code: https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py
Notes: https://drive.google.com/file/d/1--XGPFeXQyx4CPxgYjzR4qrLd-baLWQC/view?usp=sharing
00:00 Intro
01:45 SSMs
08:00 Quadratic form of an SSM
15:02 Expanded form of an SSM
24:00 Attention - it's all you need??
29:55 Kernel attention
32:50 Linear attention
34:32 Relating attention to SSMs
38:35 Defining the M matrix
43:48 Splitting the M matrix
46:30 Off-diagonal decomposition
54:00 Recurrent form of the off-diagonal blocks
1:03:30 Combining the M matrix blocks and code
1:06:22 Complexity and other analysis