SHViT (CVPR2024): Single-Head Vision Transformer with Memory Efficient Macro Design

Soroush Mehraban 1,113 12 months ago

In this video, we review the SHViT (Single-Head Vision Transformer) paper, which introduces a memory-efficient Vision Transformer with competitive performance. SHViT reduces computational redundancy by using larger-stride patch embeddings and a single-head attention module.

Paper link: https://arxiv.org/abs/2401.16456

Table of Contents:
00:00 Intro
00:54 FastViT
01:33 EfficientFormerV2
02:48 Macro Design Analysis
08:54 Micro Design
13:27 Single-Head Self-Attention
15:07 FasterNet
16:59 SHViT
19:20 Results

Icon made by Freepik from flaticon.com
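To get a rough feel for why a larger patch-embedding stride cuts cost, here is a minimal Python sketch that counts tokens and attention-score entries. The 224x224 input size, the strides 4 and 16, and the helper names num_tokens and attention_score_entries are assumptions chosen for illustration; they are not taken from the video or the paper's code.

```python
def num_tokens(image_size: int, patch_stride: int) -> int:
    """Tokens produced by a non-overlapping patch embedding on a square image."""
    if image_size % patch_stride != 0:
        raise ValueError("image_size must be divisible by patch_stride")
    side = image_size // patch_stride
    return side * side


def attention_score_entries(tokens: int, heads: int = 1) -> int:
    """Entries in the token-by-token attention score matrices across all heads."""
    return heads * tokens * tokens


if __name__ == "__main__":
    # Assumed example: 224x224 input, comparing a stride of 4 with a stride of 16.
    for stride in (4, 16):
        n = num_tokens(224, stride)
        print(f"stride={stride}: tokens={n}, score entries={attention_score_entries(n)}")
```

Under these assumed settings, going from stride 4 to stride 16 reduces the token count by 16x, and the number of attention-score entries per head drops by 256x because it grows with the square of the token count.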
