In recent years, Transformer models have revolutionized machine learning. While this has led to impressive results in Natural Language Processing, Computer Vision quickly ran into computation and memory problems due to the high resolution and dimensionality of the input data. This is particularly true for video, where the number of tokens grows cubically with the spatial and temporal resolutions. A first approach to this problem was Vision Transformers, which partition the input into embedded grid cells, lowering the effective resolution. More recently, Swin Transformers introduced a hierarchical scheme that brought the concepts of pooling and locality to Transformers in exchange for much lower computational and memory costs. This work proposes a reformulation of the latter that views Swin Transformers as regular Transformers applied over a quadtree representation of the input, intrinsically providing a wider range of design choices for the attention mechanism. Compared to similar approaches such as Swin and MaxViT, our method operates on the full range of scales using a single attention mechanism, allowing it to simultaneously capture both dense short-range and sparse long-range dependencies with low computational overhead and without introducing additional sequential operations, thus making full use of GPU parallelism.