Institute of Information and Communication Technologies


Ι.Π.ΤΗΛ. Publications

Local descriptions for human action recognition from 3D reconstruction data

In this paper, a view-invariant approach to human action recognition using 3D reconstruction data is proposed. Initially, a set of calibrated Kinect sensors is employed to produce a 3D reconstruction of the performing subjects. Subsequently, a 3D flow field is estimated for every captured frame. For action recognition, the ‘Bag-of-Words’ methodology is followed, where SpatioTemporal Interest Points (STIPs) are detected in the 4D space (xyz-coordinates plus time). A novel local-level 3D flow descriptor is introduced which, among other things, incorporates spatial and surface information in the flow representation and efficiently handles the problem of defining 3D orientation at every STIP location. Additionally, typical 3D shape descriptors from the literature are used to produce a more complete representation. Experimental results, as well as a comparative evaluation using datasets from the Huawei/3DLife 3D human reconstruction and action recognition Grand Challenge, demonstrate the efficiency of the proposed approach.
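To make the ‘Bag-of-Words’ step more concrete, the following is a minimal sketch of how local descriptors computed at interest points could be quantized into a fixed-length histogram. It is not the authors' implementation: the abstract does not specify the clustering method, codebook size, or descriptor dimensionality, so plain k-means, `k=5`, 8-dimensional random vectors, and the function names `build_codebook` and `bow_histogram` are all assumptions chosen for illustration.

```python
import math
import random


def nearest(vec, codebook):
    """Index of the codeword closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda j: math.dist(vec, codebook[j]))


def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over a list of descriptor vectors.

    Assumption: k-means is used here only as a common way to learn a
    visual vocabulary; the paper may use a different clustering scheme.
    """
    rng = random.Random(seed)
    # Initialise the codewords with k distinct training descriptors.
    codebook = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, codebook)].append(d)
        for j, members in enumerate(groups):
            if members:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty sequence
    return [c / total for c in counts]


# Toy data: random 8-D vectors standing in for real STIP descriptors.
rng = random.Random(1)
training = [[rng.random() for _ in range(8)] for _ in range(200)]
codebook = build_codebook(training, k=5)

# One "action sequence" with 30 detected interest points.
sequence = [[rng.random() for _ in range(8)] for _ in range(30)]
hist = bow_histogram(sequence, codebook)
```

Under these assumptions, each action sequence is reduced to one histogram of length `k`, regardless of how many interest points were detected, and that histogram could then be passed to any standard classifier.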

