Flow Diagram:
Input Video → User Detection → Face Detection → Upper Body Detection → Segmentation → Mask Image → Translation, Scaling and Rotation → Fitting → Final Output Video
Face Detection:
The main function of this step is to determine whether human faces appear in a given
image, and where they are located. The expected outputs of this step are
patches containing each face in the input image. In order to make the subsequent face
recognition system more robust and easier to design, face alignment is performed
to adjust the scales and orientations of these patches. Besides serving as the pre-
processing for face recognition, face detection can be used for region-of-interest
detection, retargeting, video and image classification, etc. For this purpose the
Viola-Jones algorithm is used.
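As a concrete illustration, the sketch below runs Viola-Jones detection through OpenCV's pretrained frontal-face Haar cascade; the input file name and the detection parameters are assumed values, not ones specified in this report.

    import cv2

    # OpenCV ships the trained Viola-Jones (Haar cascade) frontal-face model.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    image = cv2.imread("frame.jpg")  # assumed input frame
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Each detection is a patch (x, y, w, h) containing a face.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        patch = image[y:y + h, x:x + w]  # face patch for later alignment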
Feature extraction
Rectangular features are used; with a new image representation, their calculation is
very fast. Classifier training and feature selection are performed using a slight
variation of a method called AdaBoost. A combination of simple classifiers is very
effective.
FEATURES
INTEGRAL IMAGES (Summed-Area Tables)
A representation in which the sum of any rectangle's values can be calculated in four
accesses of the integral image.
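A minimal sketch of this representation, assuming an 8-bit grayscale image as input: the summed-area table is built once, after which any rectangle sum takes exactly four array accesses.

    import numpy as np

    def integral_image(img):
        # ii[y, x] holds the sum of img[:y, :x] (exclusive bounds).
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        return ii

    def rect_sum(ii, x, y, w, h):
        # Sum over any rectangle in four accesses of the integral image.
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]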
FEATURE EXTRACTION:
Features are extracted from sub-windows of a sample image.
The base size for a sub-window is 24 by 24 pixels.
Each of the four feature types is scaled and shifted across all possible
combinations.
In a 24 by 24 pixel sub-window there are ~160,000 possible features to be calculated.
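The ~160,000 figure can be checked by enumeration. The sketch below counts every scaled and shifted placement of a single two-rectangle feature (base size 2×1) in a 24×24 sub-window; summing such counts over all feature types gives the total.

    WINDOW = 24  # base sub-window size in pixels

    def count_placements(base_w, base_h):
        count = 0
        for w in range(base_w, WINDOW + 1, base_w):      # horizontal scales
            for h in range(base_h, WINDOW + 1, base_h):  # vertical scales
                count += (WINDOW - w + 1) * (WINDOW - h + 1)  # all shifts
        return count

    print(count_placements(2, 1))  # 43200 placements of one feature type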
ALGORITHM FLOW CHART
LK Tracking (Face LK, Body LK) → Morphological Operation → Segmentation → Fitting
CONCEPT
This system has more equations than unknowns and thus it is usually over-
determined. The Lucas-Kanade method obtains a compromise solution by the least-
squares principle. Namely, it solves the 2×2 system

A^T A v = A^T b

or

v = (A^T A)^{-1} A^T b

The matrix A^T A is often called the structure tensor of the image at the point p.
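A minimal sketch of this least-squares step for a single point p, assuming the spatial gradients Ix, Iy and the temporal difference It have already been computed; the 5×5 window is an assumed choice.

    import numpy as np

    def lk_flow_at(Ix, Iy, It, y, x, r=2):
        # Gather gradients in a (2r+1) x (2r+1) window around p = (y, x).
        wIx = Ix[y - r:y + r + 1, x - r:x + r + 1].ravel()
        wIy = Iy[y - r:y + r + 1, x - r:x + r + 1].ravel()
        wIt = It[y - r:y + r + 1, x - r:x + r + 1].ravel()
        A = np.stack([wIx, wIy], axis=1)  # n x 2 system matrix
        ATA = A.T @ A                     # 2x2 structure tensor at p
        ATb = A.T @ (-wIt)
        return np.linalg.solve(ATA, ATb)  # v = (A^T A)^-1 A^T b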
MORPHOLOGICAL OPERATION
Opening: R ∘ A = (R ⊖ A) ⊕ A
Closing: R • A = (R ⊕ A) ⊖ A
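In OpenCV these two operations can be applied to the binary user mask as sketched below; the 5×5 elliptical structuring element is an assumed choice, not one specified here.

    import cv2

    R = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # assumed binary mask
    A = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

    opened = cv2.morphologyEx(R, cv2.MORPH_OPEN, A)   # erosion then dilation
    closed = cv2.morphologyEx(R, cv2.MORPH_CLOSE, A)  # dilation then erosion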
TRACKING ALGORITHM
The tracking algorithm of the Kinect SDK is created using a large training data set
with over 500k frames distributed over many video sequences. The data set has
been generated synthetically by rendering depth images of common poses of
actions which are frequently performed in video games, such as dancing, running or
kicking, along with their labeled ground-truth pairs. In order to eliminate duplicate
poses in the data set, a small offset of 5 cm is used to create a subset of poses in
which all poses are at least as distant as the offset from each other. Pixel depths are
then normalized before training so that they become translation independent.
After that, training is done to form a decision forest of depth 20.
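Since the internals of the Kinect SDK are not published, the sketch below only illustrates this training stage in scikit-learn terms: the per-pixel depth features and body-part labels are hypothetical arrays, and the number of trees is an assumed value; only the forest depth of 20 comes from the text.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical arrays of depth-normalized per-pixel features and
    # ground-truth body-part labels from the synthetic data set.
    X = np.load("depth_features.npy")
    y = np.load("part_labels.npy")

    # Decision forest of depth 20, as stated above; tree count is assumed.
    forest = RandomForestClassifier(n_estimators=3, max_depth=20)
    forest.fit(X, y)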
The result of image segmentation is a set of segments that collectively cover the
entire image, or a set of contours extracted from the image. Each of the pixels in a
region is similar with respect to some characteristic or computed property, such
as color, intensity, or texture.
Skin Segmentation
The model of the costume is superimposed on the top layer; the user always
stays behind the model, which restricts some possible actions of the user, such as
folding arms or holding hands in front of the t-shirt. In order to solve that issue, skin-
colored areas are detected and brought to the front layer. HSV and YCbCr color
spaces are commonly used for skin color segmentation. The YCbCr color space is
preferred here, and the RGB images are converted into the YCbCr color space using
the standard RGB-to-YCbCr conversion equations.
Since we have the extracted user image as a region of interest, the threshold is applied
only to the pixels that lie on the user. Thus, the areas on the background
which may resemble the skin color are not processed.
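A minimal sketch of this step, assuming the user mask from the segmentation stage is available; the threshold bounds are commonly used skin ranges and are assumptions, not values taken from this report. Note that OpenCV orders the channels as Y, Cr, Cb.

    import cv2

    frame = cv2.imread("frame.jpg")                           # assumed input
    user_mask = cv2.imread("user.png", cv2.IMREAD_GRAYSCALE)  # assumed user ROI

    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))  # Y, Cr, Cb bounds

    # Threshold only the pixels that lie on the user, so skin-like
    # background areas are not processed.
    skin = cv2.bitwise_and(skin, user_mask)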
FITTING
Scaling
Translation
Rotation
The skeletal tracker returns the 3D coordinates of 20 body joints, in terms of pixel
locations for the x and y coordinates and meters for the depth. The joints which are used
to locate the parts of the 2D cloth model are given in the tracking algorithm. One may
notice flickers and vibrations of the joints due to the frame-based recognition
approach. This problem is partially solved by adjusting the smoothing parameters of
the skeleton engine of the Kinect SDK. The rotation angles of the parts of the model
are defined as the angles of the main axes of the body and arms.
θ = atan2(y_joint1 − y_joint2, x_joint1 − x_joint2)    (3)
Here the main body axis is defined as the line between the shoulder center and the
hip center, the axes for the arms are defined as the lines between the
corresponding shoulders and elbows, and atan2 is a function similar to the inverse
tangent but defined only on the interval [0, π). For each model part an anchor point
is defined as the center of rotation, which is set to the middle of the
corresponding axis line.
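A minimal sketch of equation (3) together with the anchor point; the joint arguments are placeholders for the tracked Kinect joints. Note that Python's math.atan2 returns values in (-pi, pi], so restricting to [0, pi) as in the text needs an extra modulo step.

    import math

    def part_rotation(joint1, joint2):
        # joint1, joint2: (x, y) pixel locations, e.g. shoulder and elbow.
        angle = math.atan2(joint1[1] - joint2[1], joint1[0] - joint2[0])
        angle %= math.pi  # fold into [0, pi) as in the text
        # Anchor point (center of rotation): middle of the axis line.
        anchor = ((joint1[0] + joint2[0]) / 2, (joint1[1] + joint2[1]) / 2)
        return angle, anchor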
Model Scaling
A naive approach to model scaling would be to employ the distance between the joints
as a scaling factor. This approach can be convenient to accommodate the
height- and weight-related variations of users. However, the distance between the
joints is not sufficiently accurate to scale the model to simulate the distance from the
sensor. Hence, we adopt a hybrid approach in order to perform shape- and
distance-based scaling with enhanced accuracy. The depth-based scaling variable DS
is defined straightforwardly as,
DS = S_model / z_spine    (4)
where S_model is a vector of the default width and height of the model when the user is
one meter distant from the sensor, and z_spine is the distance of the spine of
the user to the sensor.
The shape-based length of an arm, L_arm^k, is defined as the Euclidean distance between
the corresponding shoulder and elbow. The width W_body and height H_body of
the body are also calculated in a similar fashion, using the distance between the
shoulders and the distance between the shoulder center and the hip center,
respectively, as follows:
L_arm^k = sqrt((x_shoulder^k − x_elbow^k)^2 + (y_shoulder^k − y_elbow^k)^2)
W_arm^k = α · L_arm^k
W_body = |x_shoulder^left − x_shoulder^right|
H_body = |y_shoulderCenter − y_hipCenter|
Here x and y are the 2D coordinates of the joints, α is a constant giving the width-to-length
ratio of the arms, and
k = {left, right}
We define a scaling function S as a weighted average of the depth- and shape-based
scaling factors as,

S_body = K · (w1 · (W_body, H_body) + w2 · DS) / (w1 + w2)
S_arm^k = K · (w1 · (W_arm^k, L_arm^k) + w2 · DS) / (w1 + w2)
where K is an arbitrary parameter to adjust the size for different models as a global
scaling factor, and w1 and w2 are the weights for the shape- and depth-based
scaling factors. An increase in w2 will extend the range of displacement from the
sensor but decrease the robustness to the weight- and height-based shape
variations of users. In the experiments the parameters are defined as K = 1, w1 = 1
and w2 = 3.
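A minimal sketch of this hybrid scaling, assuming the weighted-average form written above; the default model size is a placeholder value, while K = 1, w1 = 1 and w2 = 3 are the reported parameters.

    import numpy as np

    K, w1, w2 = 1.0, 1.0, 3.0
    s_model = np.array([180.0, 260.0])  # assumed model (width, height) at 1 m

    def scale(shape_size, z_spine):
        # shape_size: joint-based (width, height); z_spine: spine depth in m.
        ds = s_model / z_spine  # depth-based factor DS, eq. (4)
        # Weighted average of the shape- and depth-based factors.
        return K * (w1 * np.asarray(shape_size) + w2 * ds) / (w1 + w2)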