MAMBA from Scratch: Neural Nets Better and Faster than Transformers

Algorithmic Simplicity · 238,033 views · 11 months ago

Mamba is a new neural network architecture that came out this year, and it outperforms transformers at language modelling! This is probably the most exciting development in AI since the transformer in 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!

Mamba paper: https://openreview.net/forum?id=AL1fq05o7H
Linear RNN paper: https://openreview.net/forum?id=M3Yd3QyRG4

#mamba #deeplearning #largelanguagemodels

00:00 Intro
01:33 Recurrent Neural Networks
05:24 Linear Recurrent Neural Networks
06:57 Parallelizing Linear RNNs
15:33 Vanishing and Exploding Gradients
19:08 Stable Initialization
21:53 State Space Models
24:33 Mamba
25:26 The High Performance Memory Trick
27:35 The Mamba Drama
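
As a taste of the "Parallelizing Linear RNNs" chapter (06:57), here is a minimal sketch of the core trick, assuming a diagonal linear recurrence h_t = a_t * h_{t-1} + b * x_t. This is an illustrative example, not the video's own code: each step is an affine map of the hidden state, affine maps compose associatively, so the whole sequence can be evaluated with a parallel scan (here via JAX) in O(log T) depth instead of a sequential O(T) loop.

```python
import jax
import jax.numpy as jnp

def linear_rnn_scan(a, bx):
    """Compute h_t = a_t * h_{t-1} + bx_t for all t (with h_0 = 0) via parallel scan.

    Each timestep is the affine map h -> a_t * h + bx_t. The composition of two
    such maps is again affine, and the rule below is associative, which is what
    makes jax.lax.associative_scan applicable.
    """
    def combine(left, right):
        a_l, c_l = left
        a_r, c_r = right
        # (right after left)(h) = a_r * (a_l * h + c_l) + c_r
        return a_l * a_r, a_r * c_l + c_r

    _, h = jax.lax.associative_scan(combine, (a, bx))
    return h

# Hypothetical usage: T=8 timesteps, D=4 channels, scalar input weight b=0.5.
T, D = 8, 4
a = jnp.full((T, D), 0.9)                             # per-step recurrent weights
x = jax.random.normal(jax.random.PRNGKey(0), (T, D))  # input sequence
h = linear_rnn_scan(a, 0.5 * x)                       # hidden states for every timestep
print(h.shape)                                        # (8, 4)
```

The same associativity is what lets linear RNNs, and Mamba-style models built on them, be trained in parallel over the sequence length rather than one timestep at a time.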
