
Llama 4 From Scratch in PyTorch - Vision Language Models + MoE

Priyam Mazumdar · 1,572 views · 1 month ago

Code: https://github.com/priyammaz/PyTorch-Adventures/tree/main/PyTorch%20for%20NLP/Llama4

Today we do a full implementation of the Llama 4 model! This is basically a guided walkthrough of the Hugging Face 🤗 implementation, so all credit goes to the authors and the researchers at Meta, of course! My goal here is just to take a look under the hood and see exactly what's going on!

Now... to run inference on this model, you need... ahem... an H100 GPU... w/ Int4 quantization!! Excuse me?? Well, I obviously cannot run it, so we will do small test cases, probing the model and seeing how tensors flow through all of it! (Also, the downloaded weights for Scout are around 200GB... I miss the 7B model days.)

Sorry the video is a little rough (you get to see me debug annoying stuff); I just wanted to get this information out ASAP so it's helpful for everyone starting to look at Llama 4! I am learning this along with all of y'all!

Also, I promise to make individual videos going in depth on a lot of these topics (MoE, KV Cache, Rotary Embeddings, etc.); I just wanted to skip ahead a little!

Resources in case you feel lost!
@umarjamilai has incredible resources to learn about Llama!
@AndrejKarpathy has an awesome video about reproducing GPT2!
@SebastianRaschka has an entire series on LLMs!

Happy Learning!

Timestamps
00:00:00 Introduction
00:05:00 What's New?
00:08:00 MoE + Shared Expert
01:16:38 1D Rotary Embeddings
01:48:10 RMSNorm
01:52:40 KV Cache
02:00:45 Grouped Query Attention (GQA)
02:11:04 Scalable Softmax
02:25:50 Text Decoder Layer
02:35:50 Llama4TextModel
02:44:00 Causal Mask
03:01:50 Chunked Attention Mask
03:15:00 Llama4ForCausalLM
03:18:29 Vision MLP
03:22:48 MultiModal Projection
03:25:35 PixelShuffle
03:37:00 2D Rotary Embeddings
03:57:40 Vision Attention
04:03:20 Vision Encoder Layer
04:07:08 Vision Encoder
04:09:30 UnfoldConvolution for Patch Embeddings
04:14:24 Llama4VisionModel
04:24:00 VisionLanguageModel (Put it Together!)
04:46:05 Testing a Forward Pass!

Socials!
X https://twitter.com/data_adventurer
Instagram https://www.instagram.com/nixielights/
Linkedin https://www.linkedin.com/in/priyammaz/
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/
