Human Action Recognition With Temporal Dense Sampling Deep Neural Networks

Bibliographic Details
Main Author: Tan, Kok Seang
Format: Thesis
Published: 2019
Description
Summary: In computer vision, Human Action Recognition (HAR) has long been an important area of study for human-computer interaction. With increasingly effective representation learning algorithms, specifically Convolutional Neural Network (ConvNet)-based architectures, HAR has advanced considerably over the past decades. However, HAR remains challenging because visual appearance changes in complicated ways over a sequence of image frames, for example through inconsistent gestures and human position. To represent human action video effectively, a sampling strategy called Temporal Dense Sampling (TDS) is introduced, which incorporates temporal pooling into temporal segmentation to achieve dense sampling along the time axis. In this thesis, a Deep ConvNet pretrained on a large-scale image recognition task, namely InceptionResNet-V2, is transferred to the proposed HAR framework. This transfer not only reduces the training resources required but also feeds useful knowledge about the environment into the proposed frameworks. Subsequently, three representation learning models capable of modeling these spatio-temporal dynamics are combined with TDS to perform HAR: (1) a Long Short-Term Memory (LSTM) Network, (2) a Bi-Directional Long Short-Term Memory (BiLSTM) Network, and (3) a 1-Dimensional (1D) ConvNet. The resulting frameworks are named (1) Temporal Dense Sampling-LSTM Network (TDS-LSTMNet), (2) Fine-Tuned Temporal Dense Sampling-BiLSTM Network (FTDS-BiLSTMNet), and (3) Fine-Tuned Temporal Dense Sampling-1D ConvNet (FTDS-1DConvNet).
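The summary describes TDS only at a high level, so the following is a minimal sketch of one way to combine temporal segmentation with temporal pooling, not the thesis's actual implementation. The function name temporal_dense_sampling, the num_segments and pool parameters, the choice of max or mean pooling, and the use of near-equal segments are all assumptions made for illustration.

```python
import numpy as np


def temporal_dense_sampling(frames, num_segments, pool="max"):
    """Pool every frame of each temporal segment into one representative.

    Hypothetical helper: the pooling operator and segment layout are
    assumptions, since the summary does not specify them.

    frames: array-like of shape (T, ...), where T is the number of frames
            (raw images or per-frame feature vectors).
    Returns an array of shape (num_segments, ...).
    """
    frames = np.asarray(frames)
    if num_segments < 1 or frames.shape[0] < num_segments:
        raise ValueError("need 1 <= num_segments <= number of frames")
    if pool not in ("max", "mean"):
        raise ValueError("pool must be 'max' or 'mean'")

    # Temporal segmentation: split the time axis into near-equal parts.
    # np.array_split tolerates T not being divisible by num_segments.
    segments = np.array_split(frames, num_segments, axis=0)

    # Temporal pooling: every frame in a segment contributes to its output,
    # so no frame is discarded (the "dense" part of the sampling).
    reduce = np.max if pool == "max" else np.mean
    return np.stack([reduce(segment, axis=0) for segment in segments])


if __name__ == "__main__":
    video = np.random.rand(30, 8)          # 30 frames, 8 features each
    sampled = temporal_dense_sampling(video, num_segments=10)
    print(sampled.shape)                   # (10, 8)
```

Under these assumptions, the fixed-length output could then be passed to a sequence model such as an LSTM, a BiLSTM, or a 1D ConvNet, which matches the role TDS plays in the three frameworks named in the summary.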