ConvNet beats Vision Transformers (ConvNeXt) Paper explained

Soroush Mehraban 2,209 2 years ago


This paper, presented at the 2022 Conference on Computer Vision and Pattern Recognition (CVPR), proposes an architecture that adopts the design principles of Swin Transformers but implements them with convolutions, and achieves superior performance. In essence, the authors propose a Convolutional Neural Network (ConvNet) that outperforms Swin Transformers while still following the same underlying design principles.

Paper link: https://arxiv.org/abs/2201.03545

Table of Contents:
00:00 Introduction
01:09 Training Techniques
01:40 Data Augmentation
04:27 Label Smoothing
06:39 Changing stage compute ratio
08:11 Changing stem to "Patchify"
09:20 ResNeXt-ify
11:11 Inverted Bottleneck
12:57 Larger Kernel Sizes
15:19 Micro Design
19:39 Making it scalable
19:48 Result

Icon made by Freepik from flaticon.com
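To make a few of the design steps named in the table of contents more concrete ("Patchify" stem, Inverted Bottleneck, Larger Kernel Sizes), here is a minimal sketch in plain Python. It is not the authors' code. The function names are hypothetical, and the numbers used (a stride-4 stem, a 7x7 depthwise convolution, a 4x channel expansion, and 96 channels in the first stage) are assumptions taken from the ConvNeXt-T configuration as I understand it from the linked paper, not from this description. Layer-scale parameters are left out for simplicity.

```python
# Hypothetical helpers for illustration only; the default values are assumed
# from the ConvNeXt-T configuration and are not stated in this description.


def stage_resolutions(image_size=224, stem_stride=4, num_stages=4):
    """Feature-map side length in each stage.

    The "patchify" stem downsamples by `stem_stride` in one step, and each
    later stage is assumed to halve the resolution again.
    """
    size = image_size // stem_stride
    sizes = [size]
    for _ in range(num_stages - 1):
        size //= 2
        sizes.append(size)
    return sizes


def block_params(dim, kernel_size=7, expansion=4):
    """Rough parameter count of one block (weights plus biases).

    Assumed layout: depthwise conv -> LayerNorm -> 1x1 expand -> 1x1 project.
    """
    hidden = dim * expansion                             # inverted bottleneck width
    depthwise = dim * kernel_size * kernel_size + dim    # one k x k filter per channel
    layer_norm = 2 * dim                                 # scale and shift
    expand = dim * hidden + hidden                       # 1x1 conv: dim -> hidden
    project = hidden * dim + dim                         # 1x1 conv: hidden -> dim
    return depthwise + layer_norm + expand + project


if __name__ == "__main__":
    print(stage_resolutions(224))  # [56, 28, 14, 7]
    print(block_params(96))        # 79200
```

Under these assumptions the depthwise 7x7 convolution accounts for only 4,800 of the 79,200 parameters in a 96-channel block, which is one way to see why the larger kernel is cheap once the convolution is depthwise.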
