This paper, presented at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR), proposes ConvNeXt, a pure Convolutional Neural Network (ConvNet) architecture. The authors adopt the design principles of Swin Transformers but implement them with convolutions instead of self-attention, and the resulting ConvNet outperforms Swin Transformers.
Paper link: https://arxiv.org/abs/2201.03545
Table of Contents:
00:00 Introduction
01:09 Training Techniques
01:40 Data Augmentation
04:27 Label Smoothing
06:39 Changing Stage Compute Ratio
08:11 Changing Stem to "Patchify"
09:20 ResNeXt-ify
11:11 Inverted Bottleneck
12:57 Larger Kernel Sizes
15:19 Micro Design
19:39 Making It Scalable
19:48 Results
Icon made by Freepik from flaticon.com