How to find the Mamba in your Llama (and make it fast).
Work led by Junxiong Wang and Daniele Paliotta, with Avner May and Tri Dao advising.
arXiv paper: https://arxiv.org/abs/2408.15237
Code: https://github.com/jxiw/MambaInLlama
Tutorial (on Mamba): https://www.youtube.com/watch?v=dVH1dRoMPBc&t=2s
Also check out:
- "Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models": https://arxiv.org/abs/2408.10189
- Rene: https://cartesia.ai/blog/2024-08-27-on-device