DINO - DETR with Improved DeNoising AnchorBoxes for End-to-End Object Detection

Mak Gaiduk 3,676 views 1 year ago


This video covers DINO, the first DETR-like, transformer-based model to reach state-of-the-art object detection accuracy.
This video is part of a broader series, Modern Object Detection - from YOLO to Transformer: https://www.youtube.com/playlist?list=PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3
The model builds on the concepts introduced in DETR, Deformable DETR, DAB-DETR and DN-DETR, improving on them and recombining them to achieve higher accuracy under the same conditions (training time, parameter count, pretraining data size). One of the model variants also uses a much larger backbone, Swin-L, and pretraining on the Objects365 dataset to achieve SOTA accuracy on the COCO dataset.
Important links:
- Original paper: https://arxiv.org/pdf/2203.03605.pdf
- DINO source code: https://github.com/IDEA-Research/DINO
- My video about DETR, the first model in the series: https://youtu.be/A2f4w54fSsM
- My video about Deformable DETR: https://youtu.be/9UG4amweIjk
- My video about DAB-DETR: https://youtu.be/8aZIoEt0D7Y
- My video about DN-DETR: https://youtu.be/9VKDZZcOfFk

00:00 - Intro
02:30 - Previous DETR models overview
20:54 - Contrastive Denoising Loss
24:32 - Mixed Query Selection
26:59 - Look Forward Twice
30:20 - Objects365 Dataset
32:54 - Results
37:22 - Next Up
