Volumetric spatial feature representation for view-invariant human action recognition using a depth camera

Seong-Sik Cho; A-Reum Lee; Heung-Il Suk; Jeong-Seon Park; Seong-Whan Lee

doi:10.1117/1.OE.54.3.033102


Optical Engineering, Vol. 54, Issue 3, 033102 (March 2015). https://doi.org/10.1117/1.OE.54.3.033102

Abstract

Viewpoint variation is a challenging problem in vision-based human action recognition. The advent of depth cameras makes three-dimensional (3-D) point clouds readily available, and the richer information they provide allows us to analyze spatial variations in human actions effectively. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth image sequences. Using VSFR, we construct a self-similarity matrix (SSM) that graphically represents temporal variations in the depth sequence. To obtain an SSM, we compute the squared Euclidean distance between the VSFRs of every pair of frames in a video sequence. Each entry of an SSM thus encodes the spatial dissimilarity between two frames of a sequence captured from an arbitrary viewpoint. Furthermore, because the features are represented with a bag-of-features method, the proposed method efficiently handles variations in action speed and length. Hence, our method is robust to variations in both viewpoint and action-sequence length. We compared the proposed method with state-of-the-art methods from the literature on three public datasets: ACT4², MSRAction3D, and MSRDailyActivity3D. Our method achieved the highest accuracies, which validates its superiority.
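The pipeline described in the abstract (per-frame volumetric density, SSM built from squared Euclidean distances, bag-of-features encoding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The uniform 4×4×4 grid fitted to each frame's bounding box is a hypothetical stand-in for the paper's volumetric partitioning. The diagonal-patch descriptors are also hypothetical, as is the pre-trained codebook. The names `vsfr`, `self_similarity_matrix`, `ssm_patches`, and `bag_of_features` are invented for this example. The sketch does not by itself reproduce the paper's view-invariance results.

```python
import numpy as np

def vsfr(points, grid=(4, 4, 4)):
    """Hypothetical VSFR: fraction of one frame's 3-D points in each grid cell.
    Assumption: a uniform grid fitted to the frame's own bounding box."""
    pts = np.asarray(points, dtype=float)           # shape (N, 3)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero on flat axes
    cells = np.array(grid)
    idx = np.floor((pts - lo) / span * cells).astype(int)
    idx = np.clip(idx, 0, cells - 1)                # points on the max face go to the last cell
    counts = np.zeros(grid)
    np.add.at(counts, tuple(idx.T), 1)              # accumulate point counts per cell
    return (counts / len(pts)).ravel()              # density vector summing to 1

def self_similarity_matrix(features):
    """SSM entry (i, j) = squared Euclidean distance between frame descriptors i and j."""
    F = np.asarray(features, dtype=float)           # shape (T, D)
    diff = F[:, None, :] - F[None, :, :]            # shape (T, T, D)
    return (diff ** 2).sum(axis=-1)                 # shape (T, T), zero diagonal

def ssm_patches(ssm, w=5):
    """Hypothetical local descriptors: w-by-w SSM patches centred on each diagonal entry."""
    half = w // 2
    padded = np.pad(ssm, half, mode="edge")
    return np.stack([padded[t:t + w, t:t + w].ravel() for t in range(ssm.shape[0])])

def bag_of_features(ssm, codebook, w=5):
    """Assign each patch to its nearest codeword and return a normalized histogram.
    Dividing by the patch count removes the dependence on the number of frames T."""
    P = ssm_patches(ssm, w)                                          # (T, w*w)
    d = ((P[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)   # (T, K)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Usage on synthetic data (random point clouds stand in for depth frames).
rng = np.random.default_rng(0)
frames = [rng.normal(size=(500, 3)) for _ in range(30)]
F = np.stack([vsfr(p) for p in frames])        # (30, 64) per-frame densities
S = self_similarity_matrix(F)                  # (30, 30) SSM
codebook = rng.random((8, 25))                 # hypothetical codebook; in practice learned, e.g. by k-means
h = bag_of_features(S, codebook)               # (8,) sequence-level descriptor
print(F.shape, S.shape, h.shape)
```

Because the final histogram is normalized, two sequences of different lengths map to descriptors of the same size that can be compared directly. This mirrors, in simplified form, the length robustness the abstract attributes to the bag-of-features step.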

Citation

Seong-Sik Cho, A-Reum Lee, Heung-Il Suk, Jeong-Seon Park, and Seong-Whan Lee "Volumetric spatial feature representation for view-invariant human action recognition using a depth camera," Optical Engineering 54(3), 033102 (3 March 2015). https://doi.org/10.1117/1.OE.54.3.033102

Published: 3 March 2015
