A Visual Guide to Mixture of Experts (MoE) in LLMs

Maarten Grootendorst · 19,669 views · 4 months ago

In this highly visual guide, we explore the architecture of a Mixture of Experts in Large Language Models (LLMs) and Vision Language Models. A minimal code sketch of the top-k routing idea follows the timeline below.

Timeline
0:00 Introduction
0:34 A Simplified Perspective
2:14 The Architecture of Experts
3:05 The Router
4:08 Dense vs. Sparse Layers
4:33 Going through a MoE Layer
5:35 Load Balancing
6:05 KeepTopK
7:27 Token Choice and Top-K Routing
7:48 Auxiliary Loss
9:23 Expert Capacity
10:40 Counting Parameters with Mixtral 8x7B
13:42 MoE in Vision Language Models
13:57 Vision Transformer
14:45 Vision-MoE
15:50 Soft-MoE
19:11 Bonus Content!
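
To make the routing ideas from the timeline concrete, here is a minimal sketch of a sparse MoE layer with a learned router and KeepTopK-style top-k routing. It assumes PyTorch and toy, illustrative dimensions (d_model, num_experts, top_k are not taken from the video), and it leaves out load balancing and expert capacity.

```python
# Minimal sketch of a sparse MoE layer with top-k ("KeepTopK") routing.
# Assumes PyTorch; dimensions and expert design are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an ordinary feed-forward block; only top_k run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                       # x: (num_tokens, d_model)
        logits = self.router(x)                 # (num_tokens, num_experts)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # renormalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: 10 tokens, each sent to its 2 highest-scoring experts.
tokens = torch.randn(10, 64)
layer = SparseMoELayer()
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Because only the selected experts run for each token (e.g., top-2 of 8 experts in Mixtral 8x7B), the number of active parameters per token is much smaller than the model's total parameter count, which is the distinction the parameter-counting chapter walks through.
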

🛠️ Written version of this visual guide
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts

Subscribe to my newsletter for more visual guides:
✉️ Newsletter https://newsletter.maartengrootendorst.com

I wrote a book!
📚 Hands-On Large Language Models
https://llm-book.com/

#datascience #machinelearning #ai