SIAM 2014 Invited Talk

Learning hierarchical invariant spatio-temporal features for human action and activity recognition
Binu M Nair, Vijayan K Asari

07/08/2014
Introduction
Applications of activity/action recognition
Gaming (Kinect)
Autonomous Visual Control of Fighter Jets by Air Crew hand gestures.
Research Objectives
To detect and recognize harmful activities of individuals of interest from a set or pair of surveillance cameras at long range.
Motivation: security personnel monitoring a crowded environment and locating suspicious activities
Security personnel create a temporary signature of each person in the scene (type of clothing, body shape, etc.)
They identify the action of each person (walking, running, etc.)
They locate the individual performing a suspicious action and then observe closely what that person is doing (from joint movements, etc.)
Introduction
An automated system that performs these tasks needs four components:
Automatic pedestrian unique-ID tagger
Corresponds to security personnel having a fair idea of what each person looks like
Human action recognition
Seeing what action each person performs: walking, running, bending, etc.
Automatic detection and tracking of specific body joints
Closely examining what a particular individual (one performing a suspicious action) is doing
Inference of the activity being performed, by joint-trajectory analysis based on context
E.g. bending down to place a suitcase, pick up a box, or tie a shoelace
Motivation
Need a real-time system
Recognize an action or an activity from 15-20 frames of a streaming video
Should not depend on the initialization of action/gait cycle states (the starting/ending points of an action cycle)
Should be invariant to speed of motion
Applications
Air crew hand gesture recognition for autonomous visual control of fighter jet
Decision to follow a person based on activity in surveillance.
Typical Data-flow for Generic Action Recognition System
[Block diagram: Video; Feature Extraction; Action Segmentation; Action Learning; Action Model Database; Action Classification]
Feature Extraction: posture/motion cues (hierarchical invariant features)
Action Segmentation: segmenting out action instances consistent with the training set
Action Learning and Classification: learn statistical models to classify new feature observations (based on PCA and Generalized Regression Neural Networks)
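As a rough illustration of the classification stage, the sketch below projects feature vectors with PCA and scores them with a generalized regression neural network, taken here in its textbook form (a Gaussian-kernel weighted average of the training targets). The slides do not give the model details, so the function names, the kernel width `sigma`, the number of components and the toy data are all assumptions.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T

def grnn_predict(train_x, train_y, query_x, sigma=1.0):
    """Textbook GRNN: kernel-weighted average of the training targets."""
    # Squared Euclidean distance between every query and every training point.
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ train_y) / (w.sum(axis=1, keepdims=True) + 1e-12)

# Toy data (hypothetical): two well-separated "action classes" in 6-D.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 0.1, size=(20, 6))
class_b = rng.normal(3.0, 0.1, size=(20, 6))
X = np.vstack([class_a, class_b])
# One-hot targets, one column per class.
Y = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])

mean, comps = fit_pca(X, n_components=2)
Z = project(X, mean, comps)
query = project(np.array([[3.0] * 6]), mean, comps)
scores = grnn_predict(Z, Y, query, sigma=0.5)
predicted_class = int(scores.argmax(axis=1)[0])
```

With one-hot targets the GRNN output can be read as a soft class score per action, and the largest score gives the predicted class.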
Feature Extraction and Feature Fusion

[Block diagram: Input Frame; Masked Region; Optical Flow (Mag/Dir); R-Transform; Hierarchical Histogram of Oriented Flow + Quantized Local Binary Pattern; one HOF (N) block and four HOF (N/2) blocks; Feature Fusion; Action Feature]
Assumption: HHOF, LBFP and RT are independent of each other.
The three features can therefore be concatenated one after the other to form the complete feature vector (as in feature fusion in biometric systems).
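The hierarchy and the fusion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: halving the bin count at each level is inferred from the "HOF (N)" and "HOF (N/2)" labels, and N = 20 is an assumption chosen because 20 + 4×10 + 16×5 = 140 matches the 140 elements quoted for the 3-level HHOF. The flow field is random stand-in data, and the LBFP and R-Transform vectors are zero placeholders of the quoted sizes (295 and 180).

```python
import numpy as np

def hof(flow_x, flow_y, n_bins):
    """Magnitude-weighted histogram of flow directions, L1-normalised."""
    mag = np.hypot(flow_x, flow_y)
    ang = np.arctan2(flow_y, flow_x) % (2 * np.pi)  # direction in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 2 * np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def hierarchical_hof(flow_x, flow_y, n_bins=20, levels=3):
    """Concatenate HOFs over 1x1, 2x2, 4x4 grids, halving the bins per level."""
    parts = []
    for level in range(levels):
        cells = 2 ** level          # grid is cells x cells at this level
        bins = n_bins // cells      # N, N/2, N/4, ... (assumed layout)
        row_blocks = zip(np.array_split(flow_x, cells, axis=0),
                         np.array_split(flow_y, cells, axis=0))
        for rows_x, rows_y in row_blocks:
            col_blocks = zip(np.array_split(rows_x, cells, axis=1),
                             np.array_split(rows_y, cells, axis=1))
            for cell_x, cell_y in col_blocks:
                parts.append(hof(cell_x, cell_y, bins))
    return np.concatenate(parts)

# Stand-in optical flow for a 64x48 masked region (hypothetical data).
rng = np.random.default_rng(0)
fx = rng.normal(size=(64, 48))
fy = rng.normal(size=(64, 48))

hhof = hierarchical_hof(fx, fy)
lbfp = np.zeros(295)  # placeholder for the LBFP feature
rt = np.zeros(180)    # placeholder for the R-Transform feature

# Feature fusion by simple concatenation.
action_feature = np.concatenate([hhof, lbfp, rt])
```

Concatenation keeps each descriptor's elements in fixed positions, which is what allows a later feature-selection step to pick individual elements by index.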
Feature Selection
Feature Set
3-level HHOF (140 elements), 2-level LBFP (295 elements), 2-level R-Transform (180 elements): total feature set of 615 elements
Problem: the regression model for each action class overfits, becoming tuned to irrelevant and redundant feature elements, which lowers accuracy.
Methodology: Fast Correlation-Based Filter (FCBF) feature selection
Identify relevant features, i.e. those with large correlation values with the class
Remove redundant features and choose a subset of features
Correlation measure based on information theory
Symmetrical Uncertainty (SU) between two random variables X and Y
H(X): entropy of X; IG(X|Y): information about X gained from the knowledge provided by Y
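The selection procedure can be sketched as below, assuming the usual FCBF definition SU(X, Y) = 2·IG(X|Y) / (H(X) + H(Y)) with IG(X|Y) = H(X) + H(Y) - H(X, Y). The features are assumed to be discretised already; the function names, the threshold and the toy data are hypothetical.

```python
import numpy as np

def entropy(x):
    """Shannon entropy (in bits) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def joint_entropy(x, y):
    """Shannon entropy (in bits) of the pair (x, y)."""
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0
    info_gain = hx + hy - joint_entropy(x, y)
    return 2.0 * info_gain / (hx + hy)

def fcbf(features, labels, threshold=0.0):
    """Return the indices of the selected columns of a discretised feature matrix."""
    n = features.shape[1]
    relevance = [symmetrical_uncertainty(features[:, j], labels) for j in range(n)]
    # Relevant features only, most relevant first.
    order = [j for j in np.argsort(relevance)[::-1] if relevance[j] > threshold]
    selected = []
    for j in order:
        # Drop j if an already-selected feature is at least as correlated
        # with it as j is with the class (j is then redundant).
        redundant = any(
            symmetrical_uncertainty(features[:, k], features[:, j]) >= relevance[j]
            for k in selected
        )
        if not redundant:
            selected.append(int(j))
    return selected

# Toy data: f0 is fully relevant, f1 is a noisy copy of f0, f2 is irrelevant.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = labels.copy()
f1 = np.array([0, 0, 0, 1, 1, 1, 1, 1])
f2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])
features = np.stack([f0, f1, f2], axis=1)

selected = fcbf(features, labels)
```

In this toy case `f2` is discarded as irrelevant and `f1` as redundant with `f0`, so only column 0 survives.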
Algorithm (Training / Testing)
RESULTS
Weizmann dataset
10 different actions performed by 9 different persons

Low resolution video at 30 fps
Static background
Weizmann Dataset
Testing strategy: leave 10 sequences out (those corresponding to one person)
Partial sequence: 15 frames with an overlap of 10 frames
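The evaluation protocol above can be sketched as follows: each video is cut into 15-frame windows that overlap by 10 frames (a step of 5 frames), and each fold holds out every sequence of one person. The function names and the `(person, action)` identifiers are hypothetical.

```python
def partial_sequences(n_frames, window=15, overlap=10):
    """Return (start, end) frame indices of overlapping windows; end is exclusive."""
    step = window - overlap
    return [(s, s + window) for s in range(0, n_frames - window + 1, step)]

def leave_one_person_out(sequence_ids):
    """Yield (held_out_person, train_ids, test_ids) for (person, action) ids."""
    persons = sorted({person for person, _ in sequence_ids})
    for held_out in persons:
        train = [s for s in sequence_ids if s[0] != held_out]
        test = [s for s in sequence_ids if s[0] == held_out]
        yield held_out, train, test

# A 40-frame clip gives windows starting at frames 0, 5, 10, ..., 25.
windows = partial_sequences(40)

# Hypothetical identifiers: 9 persons, 10 actions each.
ids = [(person, action) for person in range(9) for action in range(10)]
folds = list(leave_one_person_out(ids))
```

Each fold then tests on the 10 sequences of one person and trains on the remaining 80.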
Robustness Test (Test for Deformity)

[Figure: sample frames of the test sequences, labelled With bag, Legs Occluded, With dog, Normal Walk, Knees Up, With Briefcase, Limping, With Pole, Moonwalk, With Skirt]

Test Seq               1st Best       2nd Best       Median to all actions
Swinging a bag         Walk  2.508    Skip  3.094    3.939
Carrying a briefcase   Walk  1.866    Skip  2.170    3.641
Walking with a dog     Walk  1.806    Skip  2.338    3.824
Knees Up               Walk  2.894    Side  3.270    4.091
Limping Man            Walk  2.224    Skip  2.922    3.821
Sleepwalking           Walk  1.892    Skip  2.132    3.663
Occluded Legs          Walk  1.883    Skip  2.594    2.624
Normal Walk            Walk  1.886    Skip  2.624    3.633
Occluded by a pole     Walk  2.149    Skip  2.945    3.880
Walking in a skirt     Walk  1.855    Skip  2.159    3.540
Cambridge Hand gesture
9 different hand gestures
Different combinations of shape and motion

5 different illumination conditions
KTH Action Dataset
6 human actions
25 subjects
4 different scenarios
600 sequences divided into 2391 subsequences
Low resolution: 160 × 120 at 25 fps
11/10/2014
Binu M Nair
14
Results on 4 sets using proposed feature set
Results on all sets with STIP features
UCF Sports Dataset
High resolution: 720 × 480
200 video sequences
Contains 9 actions
Challenges:
Complex and varying backgrounds
Wide range of scenes and viewpoint variations
Tested on 8 actions: dive, golf swing, lift, ride, run, skate, swing and walk
Tested with a window size of 15 frames and an overlap of 10 frames
Future work in action recognition
Testing on the UCF ARG Dataset
Multi-view human action dataset
Set of actions: boxing, carrying, clapping, digging, jogging, open-close trunk, running, throwing, walking, waving
Challenges
Different resolutions across cameras
Different kinds of features
Thank You
Questions?
