Partially observable group activities (POGA) occurring in confined spaces are characterized by limited observability
of the objects and actions involved. In many POGA scenarios, human operators use various objects to carry out
different operations. In this paper, we describe the ontology of such POGA in the context of In-Vehicle Group
Activity (IVGA) recognition. We first describe the merits of ontology modeling in the context of IVGA and show how
such an ontology and a priori knowledge about the classes of in-vehicle activities can be fused to infer human
actions, which in turn leads to an understanding of human activity inside the confined space of a vehicle. We treat
“action-object” recognition as a duality problem: we postulate a correlation between observed human actions and the
objects used in those actions, so that a recognized action constrains the set of objects likely involved, and
conversely, a recognized handled object constrains the set of actions likely to be performed with it. In this study,
we use partially observable human postural sequences to recognize actions. Inspired by the learning capability of
convolutional neural networks (CNNs), we present an architecture built on a new CNN model that learns “action-object”
perception from surveillance videos. We then apply a sequential Deep Hidden Markov Model (DHMM) as a post-processor
to the CNN, decoding its observations into recognized actions and activities. To generate the imagery data set needed
for training and testing these new methods, we use the IRIS virtual simulation software to produce high-fidelity,
dynamically animated scenarios that depict in-vehicle group activities under different operational contexts. The
results of our comparative investigation are presented and discussed in detail.