Object Detection with Transformers (DETR)

Mak Gaiduk 4,262 views 1 year ago

The content is also available as text: https://github.com/adensur/blog/blob/main/computer_vision_zero_to_hero/12_detr/Readme.md

This video is part of my "Modern Object Detection: from YOLO to transformers" series: https://www.youtube.com/playlist?list=PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3
It covers DETR, the first transformer-based object detector, which aimed to simplify the overall approach to object detection by making it single-stage and free of hand-crafted components.
The video goes through the following in detail:
- Direct set prediction and bipartite matching loss (see the sketch after this list)
- What transformers and attention are
- How attention is used in DETR to form encoder-decoder
- Cool visualisations of attention masks
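
The bipartite matching loss pairs each predicted box with at most one ground-truth box, which lets the model be trained as a direct set predictor without NMS or anchor assignment. Below is a minimal, illustrative sketch of that matching step using the Hungarian algorithm (scipy's linear_sum_assignment). The cost here uses only a classification term and an L1 box term for brevity; the actual DETR cost also includes a generalized-IoU term, so treat the weights and function names as assumptions, not the authors' code.

```python
# Minimal sketch of DETR-style bipartite matching (illustrative, not the official matcher).
# Assumes pred_probs: [num_queries, num_classes], pred_boxes: [num_queries, 4],
# gt_labels: [num_gt] integer class ids, gt_boxes: [num_gt, 4].
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(pred_probs, pred_boxes, gt_labels, gt_boxes, cls_weight=1.0, l1_weight=5.0):
    # Classification cost: negative probability of each ground-truth class.
    cost_cls = -pred_probs[:, gt_labels]                                  # [num_queries, num_gt]
    # Box cost: L1 distance between every prediction and every ground-truth box.
    cost_box = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    cost = cls_weight * cost_cls + l1_weight * cost_box
    # Hungarian algorithm: one-to-one assignment with minimal total cost.
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx  # matched (prediction, ground truth) index pairs
```

Only the matched pairs contribute box losses during training; every unmatched query is pushed toward the "no object" class.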

Useful links:
- Original paper: https://arxiv.org/pdf/2005.12872.pdf
- Cool post explaining positional encodings in detail: https://towardsdatascience.com/master-positional-encoding-part-i-63c05d90a0c3
- My video about YOLO algorithm: https://youtu.be/QHoAWDI8g_c
- My video about how ResNet model works: https://youtu.be/uztrVK1BhGw

00:00 - Intro
05:49 - Motivation behind DETR
08:50 - Direct Set Prediction
18:52 - Transformers and Attention
21:51 - Input to Transformers: Sequences
24:16 - Self Attention
37:14 - Cross Attention
44:02 - Positional Encoding
51:08 - Analysis
56:41 - Next Up
