Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

Resource Type: Conference
Authors: Liu, Huan; Chen, Qiang; Tan, Zichang; Liu, Jiang-Jiang; Wang, Jian; Su, Xiangbo; Li, Xiaolong; Yao, Kun; Han, Junyu; Ding, Errui; Zhao, Yao; Wang, Jingdong
Source: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14983-14992, Oct. 2023
Subject: Computing and Processing; Signal Processing and Analysis; Computer vision; Codes; Pose estimation; Transformers; Decoding; Optimization
Language
ISSN: 2380-7504

Online Access

Full Text (IEEE)

Abstract

In this paper, we study the problem of end-to-end multi-person pose estimation. State-of-the-art solutions adopt the DETR-like framework and mainly develop complex decoders: for example, ED-Pose [38] regards pose estimation as keypoint box detection and combines it with human detection, while PETR [27] predicts hierarchically with a pose decoder and a joint (keypoint) decoder.

We present a simple yet effective transformer approach, named Group Pose. We regard K-keypoint pose estimation as predicting a set of N × K keypoint positions, each from a keypoint query, and represent each pose with an instance query used to score the N pose predictions.

Motivated by the intuition that interaction among across-instance queries of different types is not directly helpful, we make a simple modification to decoder self-attention. We replace the single self-attention over all N × (K + 1) queries with two subsequent group self-attentions: (i) N within-instance self-attentions, each over K keypoint queries and one instance query, and (ii) (K + 1) same-type across-instance self-attentions, each over N queries of the same type. The resulting decoder removes the interaction among across-instance queries of different types, easing the optimization and thus improving the performance. Experimental results on MS COCO and CrowdPose show that our approach, without human box supervision, is superior to previous methods with complex decoders, and is even slightly better than ED-Pose, which uses human box supervision. Paddle and PyTorch code is available.
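
The two group self-attentions described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain dot-product attention without learned projections, multiple heads, or the decoder's cross-attention, and the names `self_attention`, `group_self_attention`, and `direct_pairs` are hypothetical. Queries are assumed to be stored as `queries[n][t]`, where `n` indexes the N instances and `t` indexes the K keypoint queries followed by one instance query.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(group):
    """Dot-product self-attention over a list of d-dim vectors.

    Simplification (assumption): queries, keys, and values are the
    vectors themselves; there are no learned projections.
    """
    d = len(group[0])
    out = []
    for q in group:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in group]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, group)) for j in range(d)])
    return out


def group_self_attention(queries):
    """Apply the two group self-attentions in sequence.

    queries[n][t] is the vector for instance n and type t, where
    t = 0..K-1 are keypoint queries and t = K is the instance query.
    """
    n_inst = len(queries)
    n_types = len(queries[0])  # K + 1
    # (i) N within-instance self-attentions, each over K + 1 queries.
    queries = [self_attention(inst) for inst in queries]
    # (ii) (K + 1) same-type across-instance self-attentions, each over N queries.
    columns = [
        self_attention([queries[n][t] for n in range(n_inst)])
        for t in range(n_types)
    ]
    # Restore the [instance][type] layout.
    return [[columns[t][n] for t in range(n_types)] for n in range(n_inst)]


def direct_pairs(n_inst, n_types):
    """Count ordered query pairs that attend to each other directly.

    Returns (full self-attention, within-instance step, across-instance step).
    """
    full = (n_inst * n_types) ** 2
    within = n_inst * n_types ** 2
    across = n_types * n_inst ** 2
    return full, within, across
```

Neither step lets a query attend directly to a query of a different type in a different instance, which is the interaction the abstract says is removed. Such pairs are connected only indirectly, through the composition of the two steps.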
