Object detection is a fundamental task in computer vision. In recent years it has been addressed primarily with convolutional neural networks, most notably the YOLO family, which has drawn substantial attention from the research community. Transformer-based models were subsequently introduced to improve the efficiency and accuracy of many detectors. However, real-time object detection still suffers from slow inference. To address this, we propose a novel approach that fuses a CNN with transformer-based detection, improving both accuracy and inference speed. In our experiments, the model achieves a notable 51.8 mAP, a 0.9% improvement, with 36 million parameters, 5 million fewer than comparable transformer-based models. Its computational cost is 144 GFLOPs for 800×1333-pixel inputs (compared with 74 GFLOPs at 640×640 pixels), and all results are obtained with a 40-epoch training schedule. In summary, the proposed model reduces the parameter count and computational cost by restructuring the original detection architecture, and it significantly enhances inference speed compared to prior transformer-based models.
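To make the fusion idea concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of a hybrid detector in which a CNN backbone extracts feature maps that are then processed by a transformer encoder-decoder detection head, in the spirit of DETR-style models. All names and hyperparameters (HybridDetector, num_queries, hidden_dim) are assumptions for illustration only; positional encodings and matching losses are omitted for brevity.

```python
# Hedged sketch of a CNN + transformer detection head (not the authors' code).
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HybridDetector(nn.Module):
    def __init__(self, num_classes=80, num_queries=100, hidden_dim=256):
        super().__init__()
        # CNN backbone: ResNet-50 with its pooling and classification head removed.
        backbone = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # 1x1 conv projects the 2048-channel CNN features to the transformer width.
        self.input_proj = nn.Conv2d(2048, hidden_dim, kernel_size=1)
        # Standard transformer encoder-decoder operating on flattened feature tokens.
        self.transformer = nn.Transformer(
            d_model=hidden_dim, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        # Learned object queries, one per candidate detection.
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        # Prediction heads: class logits (+1 for "no object") and normalized boxes.
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)
        self.box_head = nn.Linear(hidden_dim, 4)

    def forward(self, images):
        feats = self.backbone(images)                             # (B, 2048, H/32, W/32)
        src = self.input_proj(feats).flatten(2).transpose(1, 2)   # (B, HW, hidden_dim)
        queries = self.query_embed.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.transformer(src, queries)                       # (B, num_queries, hidden_dim)
        return self.class_head(hs), self.box_head(hs).sigmoid()


if __name__ == "__main__":
    model = HybridDetector()
    logits, boxes = model(torch.randn(1, 3, 640, 640))
    print(logits.shape, boxes.shape)  # torch.Size([1, 100, 81]) torch.Size([1, 100, 4])
```

In such a design, the CNN supplies strong local features cheaply, while the transformer head models global context across the image, which is the general trade-off the proposed fusion targets.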