Contrastive Learning for Multi-Object Tracking with Transformers
- Resource Type
- Conference
- Authors
- Plaen, Pierre-Francois De; Marinello, Nicola; Proesmans, Marc; Tuytelaars, Tinne; Van Gool, Luc
- Source
- 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6853-6863, Jan. 2024
- Subject
- Computing and Processing
- Training
- Training data
- Object detection
- Self-supervised learning
- Detectors
- Transformers
- Feature extraction
- Algorithms
- Video recognition and understanding
- Image recognition and understanding
- Applications
- Autonomous Driving
- Language
- ISSN
- 2642-9381
The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-level contrastive loss, a revised sampling strategy and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities and with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.
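The abstract's core ingredient is an instance-level contrastive loss that pulls embeddings of the same object (across frames) together and pushes different objects apart. The paper's exact formulation is not given here, so the sketch below is an illustrative InfoNCE-style loss over object embeddings; the function name, shapes, and temperature value are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """Illustrative InfoNCE-style instance-level contrastive loss.

    embeddings:   (N, D) array of object embeddings, e.g. DETR decoder
                  outputs pooled from detections across two frames.
    instance_ids: length-N sequence; detections of the same physical
                  object share an id.
    NOTE: a hypothetical sketch, not the paper's actual loss.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature          # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs
    # row-wise log-softmax over similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    ids = np.asarray(instance_ids)
    pos = (ids[:, None] == ids[None, :]) & ~np.eye(len(ids), dtype=bool)
    # average negative log-probability of positive pairs, per anchor
    losses = [-log_prob[i, pos[i]].mean()
              for i in range(len(ids)) if pos[i].any()]
    return float(np.mean(losses))
```

Under this formulation, embeddings of matched detections that are close in cosine space yield a low loss, while mismatched assignments drive the loss up, which is what lets the tracker learn appearance without extra architectural modules.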