An Application of ANNs to Human Action Recognition

Based on Limbs Data


MAIMAITIMIN MAIERDAN, Keigo Watanabe, and Shoichi Maeyama
Department of Intelligent Mechanical System,
Okayama University, 3-1-1 Tsushima-naka,
Kita-ku, Okayama 700-8530, Japan
Email: merdan-m@usmm.sys.okayama-u.ac.jp,
{watanabe, maeyama}@sys.okayama-u.ac.jp
Abstract: An application of neural networks to human action recognition is presented in this paper. The paper concludes our previous work, which forms part of a human behavior estimation system. The system is divided into two parts: human action recognition and object recognition. We first give a brief introduction to the estimation system, and then discuss human action recognition, the most important part of the system. In this part we use a Microsoft Kinect to capture human joint data and calculate the limb angles. We also discuss the neural network that we designed to best fit this model. The ANN is separated into two main stages: the first stage is used to exclude noisy data from capturing, and the second stage is used to exclude the limb angles that are unrelated to the current action.
Keywords: Neural Network, Human Action Recognition, Estimation.

1 INTRODUCTION
Recently, human-robot communication has become very important. One effective way for an intelligent system or robot to communicate with humans is to understand and estimate their behaviors. The intelligent system or robot should be able to recognize human actions and predict the next step of human behavior in association with the environment.

Some researchers have already reported on the estimation of human behaviors [1, 2, 3, 4, 5]. For example, the approaches in [3, 4, 5] used a 2D camera and aimed at constructing a system that can understand human behaviors from time-series images of human movement, estimating the behaviors by examining the action and its target. These studies indicated that human behavior recognition must be grounded in both the human action and the environment associated with it.

There are also studies that recognize human actions using 3D sensors [6, 7, 8, 9, 10]. They presented comparisons among human gesture recognition approaches using several data mining classification methods. In particular, these studies focused on recognizing human actions, or identifying the human body, with a 3D sensor.

From these works we can see that human behavior estimation includes two main research areas, as shown in Fig. 1: human action recognition and object recognition. Therefore, if an intelligent system or robot is to estimate a human behavior, the human action recognition part must be linked to the environment, specifically to the object recognition part.

In this paper, we propose a method, based on our previous work, for recognizing human actions.

2 PREVIOUS WORK

Fig. 1. Outline of estimation system

In this section, we review our previous work on human behavior estimation.
2.1 Human action recognition

We used six joints: neck, right shoulder, left shoulder, left elbow, left hand, and hip center, as shown in Fig. 2. By forming vectors between pairs of joints we obtain the limb vectors, and by calculating the angles between these intersecting vectors on different projection planes we obtain the limb angles. From the calculated angles and the physical constraints of the human arm, each angle must be limited to the range shown in Table 1.
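As a minimal sketch of the angle computation described above (the joint vectors and plane choice here are illustrative, not the exact ones used in the system):

```python
import math

def projected_angle(v1, v2, plane="xy"):
    """Angle in degrees between two 3-D limb vectors after projecting
    both onto one coordinate plane ('xy', 'yz', or 'xz')."""
    keep = {"xy": (0, 1), "yz": (1, 2), "xz": (0, 2)}[plane]
    ax, ay = (v1[i] for i in keep)
    bx, by = (v2[i] for i in keep)
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp against floating-point overshoot before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Illustrative shoulder->elbow and elbow->hand vectors
upper_arm = (0.0, -1.0, 0.0)
forearm = (1.0, 0.0, 0.0)
print(projected_angle(upper_arm, forearm, "xy"))  # 90.0
```

Repeating this for each joint pair on each projection plane yields the set of limb angles that feed the network.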

The gesture ``Pointing" corresponds to the human action, and the vector DE relates the action to the target object in the assumed environmental space. The system concept is shown in Fig. 3. Combining the recognized human action ``Pointing" and the detected object, the human behavior is estimated as EB = [OBA + CS + Object], as shown in Table 1. Note that the object status was assumed to change according to the previous status and the behavior attribute.
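The composition EB = [OBA + CS + Object] can be sketched as a simple lookup; the behavior labels, statuses, and objects below are invented for illustration only:

```python
# Hypothetical lookup table: (action attribute, current status, object) -> behavior
BEHAVIOR_TABLE = {
    ("Pointing", "on_table", "cup"): "request the cup",
    ("Pointing", "on_table", "book"): "request the book",
}

def estimate_behavior(action, object_status, obj):
    # EB = [OBA + CS + Object]: combine the recognized action,
    # the current object status, and the detected object.
    return BEHAVIOR_TABLE.get((action, object_status, obj), "unknown")

print(estimate_behavior("Pointing", "on_table", "cup"))  # request the cup
```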

3 IMPROVED DATA CLASSIFICATION AND ANN

In this section we discuss a more refined data processing method and an improved application of the neural network, which make the system more accurate.

Fig. 2. Vector definition


2.1.1 Neural network

Our network included three layers: an input layer with 6 neurons, a hidden layer with 6 neurons, and an output layer with 4 neurons, as shown in Fig. 2.
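A 6-6-4 feed-forward network of this kind can be sketched as follows; the random initial weights and sigmoid activation are assumptions, since the paper does not specify the training details:

```python
import math
import random

random.seed(0)  # reproducible illustration

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out):
    # Random initial weights (+1 for a bias input); training such as
    # backpropagation is omitted from this sketch.
    return [[random.uniform(-1.0, 1.0) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(layer, inputs):
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs + [1.0])))
            for ws in layer]

# 6-6-4 topology as described for the previous network
hidden = make_layer(6, 6)
output = make_layer(6, 4)

angles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # six illustrative limb angles
result = forward(output, forward(hidden, angles))
print(len(result))  # 4
```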

3.1 Data Classification

To make the estimation system more robust, it should recognize more actions, and the simple structure described above is not sufficient for that. The first step is to separate the captured data into two basic groups: one for actions and the other for statuses, because everyday motions basically fall into these two classes. The grouping is decided by the movement of the hip joint: if the hip joint changes over a large range from frame to frame, the captured data are put into the neural network used to recognize actions; otherwise they are put into the other neural network, used to recognize statuses. We captured the data shown in Table 2 to train the two ANNs.
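The hip-based routing rule can be sketched as below; the threshold value of 0.3 and the frame format are made-up illustrations, not values from the paper:

```python
def hip_range(frames):
    # Largest coordinate range covered by the hip joint in a clip.
    return max(
        max(f["hip"][axis] for f in frames) - min(f["hip"][axis] for f in frames)
        for axis in range(3))

def route(frames, threshold=0.3):
    # Frames with large hip movement go to the action network,
    # the rest to the status network.
    return "action" if hip_range(frames) > threshold else "status"

moving = [{"hip": (0.0, 0.9, 2.0)}, {"hip": (0.8, 0.9, 2.0)}]
still = [{"hip": (0.0, 0.9, 2.0)}, {"hip": (0.02, 0.9, 2.0)}]
print(route(moving), route(still))  # action status
```

In practice the threshold would be chosen from the training data so that walking and running clips fall on the "action" side.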
3.2 Neural Network for Action

Fig. 2. Structure of neural network in previous work

2.2 Estimation System

As mentioned before, the present human behavior estimation is represented by two components: one is the human action achieving the behavior, and the other is the target object related to the behavior.
Fig. 4. Structure of first stage neural network

3.2.1 First Stage Neural Network

Table 2. Actions and statuses (2 objects, 100 rounds of each action)

Action: Walk, Run, Forward jump, Sit down

Our new ANN structure is shown in Fig. 4: the first stage neural network. Each of the calculated angles is mapped to one of these first stage networks, and 35 frames of data for each angle in each action are sent to train it. According to our experiment, the action walking takes the longest of all the actions, about 35 frames, so actions that do not take as long are padded with additional values to reach 35 frames. Samples of the input data for the actions walking and running are shown in Table 3.
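The padding of shorter actions to 35 frames might look like this; the constant filler value of 0 matches the trailing zeros in the running example of Table 3, but the exact padding scheme is an assumption:

```python
TARGET_FRAMES = 35

def pad_sequence(angles, target=TARGET_FRAMES, fill=0.0):
    # Shorter actions are extended with a constant filler value so that
    # every training sequence has the same length as walking (~35 frames).
    if len(angles) >= target:
        return angles[:target]
    return angles + [fill] * (target - len(angles))

run = [-75, -70, -55, -45, 0, 28, 40, 45, 45, 42, 25, 10, 0, -30, -45, -75]
padded = pad_sequence(run)
print(len(padded))  # 35
```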
Table 3. Example of one input data sequence per action

Walking: -15, -13, -10, -8, -5, -3, 0, 1, 2, 4, 7, 6, 10, 11, 16, 18, 20, 24, 26, 28, 31, 27, 24, 21, 18, 10, 7, 4, 1, -3, -4, -9, -11, -14

Running: -75, -70, -55, -45, 0, 28, 40, 45, 45, 42, 25, 10, 0, -30, -45, -75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

The result of the two-stage classification is shown in Fig. 7.
As mentioned, we have 4 actions in the action group, so we need 4 different outputs to separate them. The first stage output sequences are defined in Table 4, and the second stage output sequences in Table 5.

Table 4. Definition of first stage output

Walk: {1,1}
Run: {1,0}
Forward jump: {0,1}
Sit down: {0,0}

The training result can be seen in Fig. 5. This stage of the neural network tells us how large the correlations are between the current angle values and the defined actions.
3.2.2 Second Stage Neural Network

The second stage neural network, shown in Fig. 6, is used to find out which actions the current values of the 7 angles are related to, after each angle has been considered in the first stage network. The result is shown in Fig. 7.

Table 5. Definition of second stage output

Walk: {1,1,1,1}
Run: {1,0,1,0}
Forward jump: {0,0,1,1}
Sit down: {1,0,0,0}
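Reading an action back from the network outputs, using the codewords of Tables 4 and 5, can be sketched as a nearest-codeword decision; thresholding each output at 0.5 is an assumption, not a detail given in the paper:

```python
FIRST_STAGE = {"Walk": (1, 1), "Run": (1, 0),
               "Forward jump": (0, 1), "Sit down": (0, 0)}
SECOND_STAGE = {"Walk": (1, 1, 1, 1), "Run": (1, 0, 1, 0),
                "Forward jump": (0, 0, 1, 1), "Sit down": (1, 0, 0, 0)}

def decode(outputs, codebook):
    # Threshold each output at 0.5, then pick the action whose
    # codeword is nearest in Hamming distance.
    bits = tuple(1 if o >= 0.5 else 0 for o in outputs)
    return min(codebook,
               key=lambda a: sum(b != c for b, c in zip(bits, codebook[a])))

print(decode([0.9, 0.1, 0.8, 0.2], SECOND_STAGE))  # Run
```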
Fig. 5. The output of the neural network on one angle (desired vs. network output)

Fig. 6. Structure of second stage neural network

4 CONCLUSION AND RESULT

In this paper we have discussed improved methods for recognizing four sample human actions: walk, run, forward jump, and sit down, by using neural networks. We used two stages of ANNs to logically discriminate these four actions, even under bad circumstances with large noisy data. This neural network is capable of learning more actions, and also statuses, if more data are available. In the future, we will capture more actions to complete the database, and we will improve the object recognition part.

REFERENCES
[1] Shigeki Aoki, Yoshio Iwai, Masaki Onishi, Atsuhiro Kojima, and Kunio Fukunaga. Learning and recognizing behavioral patterns using position and posture of human body and its application to detection of irregular states. Systems and Computers in Japan, 36(13):45-56, November 2005.
[2] Longbing Cao. In-depth behavior understanding and use: The behavior informatics approach. Information Sciences, 180(17):3067-3085, September 2010.
[3] Kiyotaka Izumi, Kohei Kamohara, and Keigo Watanabe. Estimation system of human behaviors using fuzzy neural network based object selection. 2008 SICE Annual Conference, pages 1989-1993, August 2008.
[4] Keigo Watanabe and Kiyotaka Izumi. An approach to estimating human behaviors by using an active vision head. 2006 9th International Conference on Control, Automation, Robotics and Vision, pages 5-8, December 2006.
[5] Keigo Watanabe, Kiyotaka Izumi, Kohei Kamohara, and Eisuke Yamada. Feature extractions for estimating human behaviors via a binocular vision head. 2007 International Conference on Control, Automation and Systems, pages 634-639, 2007.
[6] M. A. Bautista and A. Hernández-Vela. Probability-based dynamic time warping for gesture recognition on RGB-D data. Advances in Depth Image Analysis and Applications, pages 126-135, 2013.
[7] Sergio Escalera. Human behavior analysis from depth maps. Articulated Motion and Deformable Objects, pages 282-292, 2012.
[8] Mohamed E. Hussein, Marwan Torki, Mohammad A. Gowayyed, and Motaz El-Saban. Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. pages 2466-2472, August 2013.
[9] Maimaitimin Maierdan, Keigo Watanabe, and Shoichi Maeyama. Human behavior recognition system based on 3-dimensional clustering methods. In 2013 13th International Conference on Control, Automation and Systems (ICCAS 2013), pages 1133-1137. IEEE, October 2013.
[10] Orasa Patsadu, Chakarida Nukoolkit, and Bunthit Watanapa. Human gesture recognition using Kinect camera. 2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE), pages 28-32, 2012.

Fig. 7. The output of the second stage neural network (desired vs. network output for outputs #1-#4)