Lets Reproduce the Vision Transformer on ImageNet

Priyam Mazumdar

Code: https://github.com/priyammaz/PyTorch-Adventures/tree/main/PyTorch%20for%20Computer%20Vision/Vision%20Transformer

Today we will be doing a full reproduction of the Vision Transformer on ImageNet! I hope you already know attention; if you don't, please check out this video first: https://youtu.be/JXY5CmiK3LI?feature=shared

The trick to training a ViT from scratch has less to do with the model and more to do with data augmentation. After we build the model, we will build a distributed training pipeline with Mixup, CutMix, and RandAugment. Trained for 300 epochs, the model reached roughly 80% Top-1 accuracy, which is close to the results reported by PyTorch!
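
If you want a feel for the augmentation recipe before watching, here is a minimal sketch, assuming torchvision >= 0.16 with the transforms.v2 API (the names train_transform, mixup_or_cutmix, and train_loader are illustrative; the linked repo has the actual pipeline):

```python
import torch
from torchvision.transforms import v2

NUM_CLASSES = 1000  # ImageNet-1k

# Per-image augmentations, applied inside the Dataset
train_transform = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(),
    v2.RandAugment(),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Mixup/CutMix blend whole batches, so they run after the DataLoader
mixup_or_cutmix = v2.RandomChoice([
    v2.MixUp(num_classes=NUM_CLASSES),
    v2.CutMix(num_classes=NUM_CLASSES),
])

# Inside the training loop (illustrative):
# images, labels = next(iter(train_loader))
# images, labels = mixup_or_cutmix(images, labels)  # labels become soft targets
```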

Timestamps:
00:00:00 Introduction
00:06:15 Images to Patch Embeddings
00:17:20 Local vs Global Image Comprehension
00:21:00 Self-Attention
00:36:00 MultiLayer Perceptron
00:41:10 Encoder Block
00:45:20 CLS Token vs Pooling
00:50:10 Vision Transformer Implementation
01:09:40 The Importance of Augmentation!
01:12:00 Implement Training Augmentation Pipeline
01:24:15 Mixup and CutMix
01:35:28 Testing Augmentation
01:40:25 Calculate TopK Accuracy
01:49:48 Distributed Training Script
02:08:15 Testing the Training Script
02:10:28 Results

Socials!
X https://twitter.com/data_adventurer
Instagram https://www.instagram.com/nixielights/
LinkedIn https://www.linkedin.com/in/priyammaz/
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/