Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through SSS Duality

Gabriel Mongaras 11,414 10 months ago

Paper here: https://arxiv.org/abs/2405.21060
Code!: https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py
Notes: https://drive.google.com/file/d/1--XGPFeXQyx4CPxgYjzR4qrLd-baLWQC/view?usp=sharing

00:00 Intro
01:45 SSMs
08:00 Quadratic form of an SSM
15:02 Expanded form of an SSM
24:00 Attention - it's all you need??
29:55 Kernel attention
32:50 Linear attention
34:32 Relating attention to SSMs
38:35 Defining the M matrix
43:48 Splitting the M matrix
46:30 Off diagonal decomposition
54:00 Recurrent form of the off diagonal
1:03:30 Combining the M matrix blocks and code
1:06:22 Complexity and other analysis
