Paper
11 July 2024
FANpose: 2D human pose estimation with fully attentional networks under vision transformer baselines
Mingliang Chen, Guangxing Tan
Abstract
2D human pose estimation (HPE) has long been a research focus in computer vision, and baseline architectures are one of its main lines of study. As the field continues to evolve, vision transformer baselines have emerged as a significant area of interest, showing considerable potential in visual applications. However, accurate estimation remains a challenge in 2D HPE. This study introduces a novel approach named FANpose for 2D HPE in images. Building upon the top-tier VITpose baselines, we innovate in two main aspects. First, we employ fully attentional networks in place of the vision transformer baseline model, enhancing the model's robustness. Second, we replace the traditional Gaussian kernels used for keypoint heatmap targets with Laplacian kernels, improving keypoint localization accuracy. On the MS COCO dataset, our model achieves AP and AR scores 0.4 and 0.6 points higher than VITpose-B, respectively, while using 32M fewer parameters. FANpose achieves satisfactory results in human pose estimation tasks, showcasing its immense potential for practical applications.
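The abstract gives no code; purely as a hedged illustration of the Gaussian-to-Laplacian kernel swap it describes, the NumPy sketch below contrasts the two heatmap targets used in typical top-down HPE pipelines. The function names, heatmap size, and scale parameters sigma and b are illustrative assumptions, not taken from the paper.

import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    # Conventional 2D Gaussian target centered on a keypoint.
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def laplacian_heatmap(h, w, cx, cy, b=2.0):
    # Laplacian-kernel target: a sharper peak at the keypoint
    # with heavier tails, concentrating supervision on the
    # exact keypoint pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
    return np.exp(-d / b)

# Example: a 64x48 heatmap for a keypoint at (x=24, y=32).
g = gaussian_heatmap(64, 48, 24, 32)
l = laplacian_heatmap(64, 48, 24, 32)
print(g.max(), l.max())  # both peak at 1.0 on the keypoint pixel

Relative to the Gaussian, the Laplacian target falls off sharply around the keypoint pixel, which is the intuition behind the localization gain the abstract reports.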
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Mingliang Chen and Guangxing Tan "FANpose: 2D human pose estimation with fully attentional networks under vision transformer baselines", Proc. SPIE 13210, Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024), 132103B (11 July 2024); https://doi.org/10.1117/12.3034838
KEYWORDS
Transformers
Autoregressive models
Pose estimation
Visual process modeling
Data modeling
Feature extraction
RGB color model