Paper
11 July 2024
FANpose: 2D human pose estimation with fully attentional networks under vision transformer baselines
Mingliang Chen, Guangxing Tan
Abstract
2D human pose estimation (HPE) has long been a research focus in computer vision, and baseline architectures are one of its main lines of study. As the field continues to evolve, vision transformer baselines have emerged as a significant area of interest, showing considerable potential in visual applications. However, accurate estimation remains a challenge in 2D HPE. This study introduces a novel approach named FANpose for 2D HPE in images. Building upon the top-tier VITpose baselines, we innovate in two main aspects. First, we employ fully attentional networks in place of the vision transformer baseline model, enhancing the model's robustness. Second, we replace the traditional Gaussian kernels used for keypoint heatmap targets with Laplacian kernels, improving keypoint localization accuracy. On the MS COCO dataset, our model achieves AP and AR scores 0.4 and 0.6 points higher than VITpose-B, respectively, while using 32M fewer parameters. FANpose achieves satisfactory results in human pose estimation tasks, showcasing its immense potential for practical applications.
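The abstract gives no code; purely as a hedged illustration of the Gaussian-to-Laplacian kernel swap it describes, the NumPy sketch below contrasts the two heatmap targets used in typical top-down HPE pipelines. The function names, heatmap size, and scale parameters sigma and b are illustrative assumptions, not taken from the paper.

import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    # Conventional 2D Gaussian target centered on a keypoint.
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def laplacian_heatmap(h, w, cx, cy, b=2.0):
    # Laplacian-kernel target: a sharper peak at the keypoint
    # with heavier tails, concentrating supervision on the
    # exact keypoint pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
    return np.exp(-d / b)

# Example: a 64x48 heatmap for a keypoint at (x=24, y=32).
g = gaussian_heatmap(64, 48, 24, 32)
l = laplacian_heatmap(64, 48, 24, 32)
print(g.max(), l.max())  # both peak at 1.0 on the keypoint pixel

Relative to the Gaussian, the Laplacian target falls off sharply around the keypoint pixel, which is the intuition behind the localization gain the abstract reports.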
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Mingliang Chen and Guangxing Tan "FANpose: 2D human pose estimation with fully attentional networks under vision transformer baselines", Proc. SPIE 13210, Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024), 132103B (11 July 2024); https://doi.org/10.1117/12.3034838
KEYWORDS
Transformers
Autoregressive models
Pose estimation
Visual process modeling
Data modeling
Feature extraction
RGB color model