Imaging Components, Systems, and Processing

Volumetric spatial feature representation for view-invariant human action recognition using a depth camera

Author Affiliations
Seong-Sik Cho, A-Reum Lee

Korea University, Department of Computer Science and Engineering, Seongbuk-gu, Seoul 136-713, Republic of Korea

Heung-Il Suk

Korea University, Department of Brain and Cognitive Engineering, Seongbuk-gu, Seoul 136-713, Republic of Korea

Jeong-Seon Park

Chonnam National University, Department of Multimedia, Yeosu, Jeollanam-do 550-749, Republic of Korea

Seong-Whan Lee

Korea University, Department of Computer Science and Engineering, Seongbuk-gu, Seoul 136-713, Republic of Korea

Korea University, Department of Brain and Cognitive Engineering, Seongbuk-gu, Seoul 136-713, Republic of Korea

Opt. Eng. 54(3), 033102 (Mar 03, 2015). doi:10.1117/1.OE.54.3.033102
History: Received November 3, 2014; Accepted January 30, 2015

Abstract.  The problem of viewpoint variation is a challenging issue in vision-based human action recognition. With the richer information provided by three-dimensional (3-D) point clouds, thanks to the advent of 3-D depth cameras, we can effectively analyze spatial variations in human actions. In this paper, we propose a volumetric spatial feature representation (VSFR) that measures the density of 3-D point clouds for view-invariant human action recognition from depth image sequences. Using VSFR, we construct a self-similarity matrix (SSM) that graphically represents temporal variations in the depth sequence. To obtain the SSM, we compute the squared Euclidean distance between the VSFRs of each pair of frames in a video sequence. In this manner, the SSM represents the spatial dissimilarity between pairs of frames in a video sequence captured at an arbitrary viewpoint. Furthermore, because a bag-of-features method is used for feature representation, the proposed method efficiently handles variations in action speed and length. Hence, our method is robust to variations in both viewpoint and the length of action sequences. We evaluated the proposed method against state-of-the-art methods on three public datasets (ACT42, MSRAction3D, and MSRDailyActivity3D) and validated its superiority by achieving the highest accuracies.
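As a rough illustration of the pipeline the abstract describes, the following minimal NumPy sketch computes a voxel-density feature per frame and the pairwise squared-Euclidean SSM. The grid resolution, bounds, and normalization are illustrative assumptions, not the paper's exact VSFR definition, and the random point clouds stand in for real depth-camera frames.

    import numpy as np

    def vsfr(points, grid_shape=(10, 10, 10), bounds=((-3, 3), (-3, 3), (-3, 3))):
        # Hypothetical voxel-density feature: a histogram of 3-D points over a
        # fixed grid, normalized by the point count. Grid size and bounds are
        # illustrative assumptions, not the paper's settings.
        hist, _ = np.histogramdd(points, bins=grid_shape, range=bounds)
        return hist.ravel() / max(len(points), 1)

    def self_similarity_matrix(features):
        # SSM entry (i, j) = squared Euclidean distance between the VSFRs of
        # frames i and j, as described in the abstract.
        f = np.asarray(features)                  # shape (T, D)
        sq = np.sum(f**2, axis=1)
        ssm = sq[:, None] + sq[None, :] - 2.0 * (f @ f.T)
        return np.maximum(ssm, 0.0)               # clamp tiny numerical negatives

    # Toy usage: eight random "frames" of 500 points each.
    rng = np.random.default_rng(0)
    frames = [rng.normal(size=(500, 3)) for _ in range(8)]
    ssm = self_similarity_matrix([vsfr(p) for p in frames])
    print(ssm.shape)                              # (8, 8)

In the paper's setting, the SSM computed this way would then feed the bag-of-features step, which makes the final representation insensitive to the speed and length of the action.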

© 2015 Society of Photo-Optical Instrumentation Engineers

Citation

Seong-Sik Cho, A-Reum Lee, Heung-Il Suk, Jeong-Seon Park, and Seong-Whan Lee,
"Volumetric spatial feature representation for view-invariant human action recognition using a depth camera," Opt. Eng. 54(3), 033102 (Mar 03, 2015). http://dx.doi.org/10.1117/1.OE.54.3.033102

