Vision Transformers (ViT) Explained + Fine-tuning in Python

James Briggs 69,142 2 years ago

Vision and language are the two big domains in machine learning: two distinct disciplines, each with its own problems, best practices, and model architectures. At least, that was the case.

The Vision Transformer (ViT) marks the first step towards merging these two fields into a single unified discipline. For the first time in the history of ML, a single model architecture has come to dominate both language and vision. Before ViT, transformers were "those language models" and nothing more. Since then, ViT and follow-up work have established transformers as a likely contender for the architecture that merges the two disciplines.

This video dives into ViT, explaining and visualizing the intuition behind how and why it works. We will see how to implement it using the Hugging Face transformers library in Python, then use it for image classification.

🌲 Pinecone article: https://www.pinecone.io/learn/vision-transformers
Code: https://github.com/pinecone-io/examples/blob/master/learn/search/image/image-retrieval-ebook/vision-transformers/vit.ipynb
🌟 Build Better Agents + RAG: https://platform.aurelio.ai (use "JBMARCH2025" coupon code for $20 free credits)
👾 Discord: https://discord.gg/c5QtDB9RAP

00:00 Intro
00:58 In this video
01:12 What are transformers and attention?
01:39 Attention explained simply
04:15 Attention used in CNNs
05:24 Transformers and attention
07:01 What vision transformer (ViT) does differently
07:28 Images to patch embeddings
08:22 1. Building image patches
10:23 2. Linear projection
10:57 3. Learnable class embedding
13:30 4. Adding positional embeddings
16:37 ViT implementation in Python with Hugging Face
16:45 Packages, dataset, and Colab GPU
18:42 Initialize Hugging Face ViT Feature Extractor
22:48 Hugging Face Trainer setup
25:14 Training and CUDA device error
26:27 Evaluation and classification predictions with ViT
28:54 Final thoughts

#machinelearning #deeplearning #ai #python
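
The four "images to patch embeddings" steps named in the chapter list (building image patches, linear projection, learnable class embedding, adding positional embeddings) can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the video or the Hugging Face implementation: the function name `image_to_patch_embeddings` is hypothetical, random arrays stand in for weights that would be learned during training, the projection bias is omitted, and the sizes used (a 224x224 RGB image, 16x16 patches, 768-dimensional embeddings) are assumed values that match the common ViT-Base configuration rather than anything stated in the description.

```python
import numpy as np


def image_to_patch_embeddings(image, patch_size, embed_dim, rng):
    """Turn an (H, W, C) image into a (num_patches + 1, embed_dim) token sequence.

    Hypothetical helper for illustration; random values stand in for learned weights.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    grid_h, grid_w = h // patch_size, w // patch_size

    # 1. Build image patches: cut the image into a grid of non-overlapping
    #    squares and flatten each square into a single vector.
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(grid_h * grid_w, patch_size * patch_size * c)

    # 2. Linear projection: map every flattened patch to the embedding size.
    w_proj = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
    tokens = patches @ w_proj

    # 3. Learnable class embedding: one extra token prepended to the sequence.
    cls_token = rng.normal(scale=0.02, size=(1, embed_dim))
    tokens = np.concatenate([cls_token, tokens], axis=0)

    # 4. Positional embeddings: one vector per token, added element-wise.
    pos_embed = rng.normal(scale=0.02, size=tokens.shape)
    return tokens + pos_embed


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))  # stand-in for a real RGB image
tokens = image_to_patch_embeddings(image, patch_size=16, embed_dim=768, rng=rng)
print(tokens.shape)  # (197, 768): 14 * 14 patches plus one class token
```

With these assumed sizes the image yields a 14x14 grid of patches, so the transformer receives 197 tokens: 196 patch embeddings plus the class token.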
