Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained

AI Coffee Break with Letitia 5,985 views 1 year ago

Contextual sparsity: Take an LLM and make it sparse at inference time. In this video, we explain how the DEJAVU method implements contextual sparsity.
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

📜 Liu, Zichang, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava et al. "Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time." In International Conference on Machine Learning, pp. 22137-22176. PMLR, 2023. https://arxiv.org/abs/2310.17157

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Michael

Outline:
00:00 DEJAVU explained
02:58 Sparse neural networks
04:06 Why static sparsity hurts
04:43 Contextual sparsity
05:40 DEJAVU method
07:59 Speedups!
08:52 MoE: Connection to Mixture of Experts
09:38 Theoretical insights: Why can we make MLPs sparse?
10:36 Why can we make attention sparse?
11:38 Attention does Mean-shift clustering!

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
Join this channel to get access to perks:
https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/join
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵 : Sunday Rain - Cheel
