In this video, I explain how Swin Transformer V2 modifies the original Swin Transformer architecture to make it more scalable and improve its performance.
Paper link: https://arxiv.org/abs/2111.09883
Table of Contents:
00:00 Intro
00:34 Scaling issue
03:41 Res-post-norm
04:47 Scaled cosine attention
06:50 Scaling up window resolution
12:02 Continuous relative position bias
16:19 GPU Memory Optimization
19:17 Model configuration
19:49 Image classification comparison
Icon made by Freepik from flaticon.com