Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - 693

The TWIML AI Podcast with Sam Charrington 8,767 9 months ago

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba (https://arxiv.org/abs/2312.00752) and Mamba-2 (https://arxiv.org/abs/2405.21060) papers in particular.

We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We dig into the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines versus end-to-end architectures in machine learning.

Additionally, we touch on the evolving landscape of hybrid models which incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications.

🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/693.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
05:36 - Post transformer approaches
07:46 - Attention
10:54 - Tokens
14:25 - Transformers
19:00 - Convolutions
22:04 - Recurrent models
24:36 - Mamba and state-space models
42:35 - Performance on multimodal data
46:24 - Handcrafted pipelines vs. end-to-end architectures
51:52 - Future directions

🔗 LINKS & RESOURCES
===============================
Mamba: Linear-Time Sequence Modeling with Selective State Spaces - https://arxiv.org/abs/2312.00752
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality - https://arxiv.org/abs/2405.21060
Efficiently Modeling Long Sequences with Structured State Spaces - https://arxiv.org/abs/2111.00396
Improving the Gating Mechanism of Recurrent Neural Networks - https://arxiv.org/abs/1910.09890
CKConv: Continuous Kernel Convolution For Sequential Data - https://arxiv.org/abs/2102.02611
On the Parameterization and Initialization of Diagonal State Space Models - https://arxiv.org/abs/2206.11893
Long Context Language Models and their Biological Applications with Eric Nguyen - 690 - https://twimlai.com/podcast/twimlai/long-context-language-models-and-their-biological-applications/
Language Modeling With State Space Models with Dan Fu - 630 - https://twimlai.com/podcast/twimlai/language-modeling-with-state-space-models/

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
