학술논문

Home

자료검색

학술논문

검색결과 돌아가기

검색화면

내보내기 프린트

A 3D CNN-LSTM-Based Image-to-Image Foreground Segmentation

Resource Type: Periodical
Authors: Akilan, T.; Wu, Q.J.; Safaei, A.; Huo, J.; Yang, Y.
Source: IEEE Transactions on Intelligent Transportation Systems IEEE Trans. Intell. Transport. Syst. Intelligent Transportation Systems, IEEE Transactions on. 21(3):959-971 Mar, 2020
Subject: Transportation
Aerospace
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Three-dimensional displays
Solid modeling
Image segmentation
Visualization
Decoding
Encoding
Computational modeling
Deep learning
foreground-background segmentation
intelligent systems
LSTM
spatiotemporal cues
Language
ISSN: 1524-9050
1558-0016

Online Access

초록

The video-based separation of foreground (FG) and background (BG) has been widely studied due to its vital role in many applications, including intelligent transportation and video surveillance. Most of the existing algorithms are based on traditional computer vision techniques that perform pixel-level processing assuming that FG and BG possess distinct visual characteristics. Recently, state-of-the-art solutions exploit deep learning models targeted originally for image classification. Major drawbacks of such a strategy are the lacking delineation of FG regions due to missing temporal information as they segment the FG based on a single frame object detection strategy. To grapple with this issue, we excogitate a 3D convolutional neural network (3D CNN) with long short-term memory (LSTM) pipelines that harness seminal ideas, viz., fully convolutional networking, 3D transpose convolution, and residual feature flows. Thence, an FG-BG segmenter is implemented in an encoder-decoder fashion and trained on representative FG-BG segments. The model devises a strategy called double encoding and slow decoding, which fuses the learned spatio-temporal cues with appropriate feature maps both in the down-sampling and up-sampling paths for achieving well generalized FG object representation. Finally, from the Sigmoid confidence map generated by the 3D CNN-LSTM model, the FG is identified automatically by using Nobuyuki Otsu’s method and an empirical global threshold. The analysis of experimental results via standard quantitative metrics on 16 benchmark datasets including both indoor and outdoor scenes validates that the proposed 3D CNN-LSTM achieves competitive performance in terms of figure of merit evaluated against prior and state-of-the-art methods. Besides, a failure analysis is conducted on 20 video sequences from the DAVIS 2016 dataset.

공지

DAU Library

학술논문

요약정보

A 3D CNN-LSTM-Based Image-to-Image Foreground Segmentation

Online Access

초록