Object detection has been revolutionized by convolutional neural networks (CNNs), but their high computational complexity and heavy data-access requirements make deploying these algorithms on edge devices challenging. To address this issue, we propose an efficient object detection accelerator for the YOLO series of algorithms. Our architecture exploits multiple dimensions of parallelism to accelerate the convolution computation. We employ line-buffer-based parallel data caches and dedicated data-access units to minimize off-chip bandwidth pressure. Additionally, our design accelerates not only the convolutional computation but also the control-intensive post-processing, achieving low detection latency. We evaluate the final design on a Xilinx Virtex-7 690T FPGA, achieving a throughput of 525 GOP/s at a batch size of 1 and 914 GOP/s at a batch size of 2. Compared with state-of-the-art YOLOv2 and YOLOv3 implementations, our accelerator offers up to 9× higher throughput and 5× lower latency.