VII Latin American Symposium on Circuits and Systems (LASCAS) 2016
Telepresence Using the Kinect Sensor and the NAO Robot

Jose Avalos, Student Member, IEEE, Sergio Cortez, Student Member, IEEE, Karina Vasquez, Student Member, IEEE, Victor Murray, Senior Member, IEEE, and Oscar E. Ramos, Member, IEEE

Department of Electrical Engineering, Universidad de Ingenieria y Tecnologia - UTEC, Lima, Peru.
Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, New Mexico, USA.
Abstract—Several applications require that a telepresence system does not only transmit images, audio and video, but also real-time motion. This work presents the implementation of a system that allows for telepresence using a humanoid robot and a motion capture device. The proposed methodology for motion imitation is fast and uses a low-cost motion acquisition sensor. The objective is to make the robot reproduce the motion of a person. In this way, remote actions can be executed through the robot, which also sends images and audio from the environment. The method has been applied to the humanoid robot called NAO, which has been able to reproduce human motions. This framework can also be used for different educational purposes.

Keywords—NAO robot, Kinect sensor, robotic kinematics, telepresence

Fig. 1: Example of robot teleoperation. The real and simulated NAO robot follows the movements of the person.
I. INTRODUCTION

Telepresence is a resource that allows people to be connected from any place in order to virtually accomplish a specific task. An example is a videoconference, in which both video and audio are enabled to recreate a certain scenario. However, the real target of telepresence is to give the user control of all his faculties in any place at any time. This will improve communication and, as a result, it can be applied to any educational field.

Different methods have been applied and analyzed to achieve telepresence through the usage of robots. Suleiman et al. [1] present human motion imitation by a human-size humanoid robot considering inverse dynamics, where the physical properties of the robot constitute the main problem. The method by Cela et al. [2] is a low-cost system that consists in the use of accelerometers attached to the human body to detect the human joint motion; however, this method is limited to simple robots due to imprecise data. On the other hand, it is also possible to use visual sensors to acquire the human motion without needing to attach any special device to the person, as in [3], where a robot with eyes capable of capturing motion is presented. On this front, the Kinect sensor has also been used to directly detect human body joints [4] and to use the point cloud data, with some further processing, in a virtual environment [5]. In all these cases, the data recovered from the motion capture system is processed to recover the motion and is then sent to the robot. An important issue yet to be solved is the time needed to process the motion and to determine the most adequate joint configuration for the robot, which is usually not generated in real time due to computational complexity.

In this work, we propose a methodology to achieve real-time telepresence using a geometric approach for planar motion and an inverse kinematics approach for a complete 3D motion of a humanoid robot. The methods have been validated using both Kinect sensor v1 and v2 [6] controlling the NAO robot [7]. An example of the proposed teleoperation method is shown in Fig. 1, where the robot imitates the motion shown by the demonstrator in real time. The remainder of this paper is organized as follows. In Section II we present the methods to achieve motion imitation, which is crucial for teleoperation. Section III shows the experimental results of the two methods.

II. METHODS

To apply the teleoperation to any robot-sensor system, we propose to divide the task into three main parts. The first one consists in moving the robot joints through inverse kinematic operations in the operational space or in the joint configuration space. The second part obtains compatible data from the motion capture device and processes it to make it suitable for motion imitation. The last part consists in creating a bridge between the robot and the motion capture device, which will allow the transmission of data in real time, achieving teleoperation. The process can be geared to work under a two or a three dimensional model.

A. Two dimensional teleoperation motion

1) Robot Motion Control: In this case, a joint-space based control system is more efficient and easier to implement, since the motion constrained to a plane is represented using joint angles. Therefore, the model will directly work in the joint
ISBN 978-1-4673-7835-2/16/$31.00 ©2016 IEEE. IEEE Catalog Number CFP16LAS-ART
space rather than in the Cartesian space. In two dimensions, specific angles relative to the motion of certain limbs will be the data sent to the robot. The program will receive joint angles, which will constitute the motion targets.

2) Data acquisition: There are different ways and devices that act as motion capture systems to recover data from the human body [2]-[5]. Another example is a depth sensor such as the Kinect, which can capture specific joints of the human body, in Cartesian coordinates, related to the arms, legs, head and torso. These sensors usually come with an SDK (Software Development Kit) which can be used to transform Cartesian into spherical coordinates to obtain the desired joint angles. The proposed method is based on vector operations. This method can be extrapolated to other limbs as long as the motion is constrained to a plane.

Fig. 2: Skeleton showing the important angles for the method. α represents the angle between the torso and the forearm. α + β represents the angle at the elbow, between the forearm and the arm.

First, we analyze both arms of the person who will teleoperate the robot. Note that the inertial reference frame is a motionless torso, and local reference frames lie on each shoulder; that is, the position of each arm is represented with respect to its shoulder. The third component of these positions is not important, since we are using a two dimensional model (it is only included for mathematical purposes).

Let the sensor output describing the i-th joint position with respect to the sensor reference frame be represented in Cartesian coordinates as

    x_i = (x_i, y_i, z_i).    (1)

For instance, let the shoulder position be x_{s_j}, the elbow position be x_{e_j} and the hand position be x_{h_j}, where j refers to either the left or right limb. Using these positions, let the vector referred to the forearm, v_{f_j}, and the vector referred to the arm, v_{a_j}, be defined as

    v_{f_j} = x_{e_j} - x_{s_j}    (2)

and

    v_{a_j} = x_{h_j} - x_{e_j}.    (3)

In the two-dimensional case, let the angle between the torso and the forearm be represented by α, as Fig. 2 shows. This angle can range from 0° to 180°, where 0° corresponds to the forearm close to the torso and 180° corresponds to the forearm fully raised. The angle at the elbow, between the forearm and the arm, is α + β, where β is obtained in a similar way as α. Note that the angle α + β ranges from 0° to 360°.

3) Data Transmission: Once the motion capture sensor is acquiring raw data, which is processed as in [4] to obtain the desired joint angles as described in the previous section, they can be sent to the robot. For these data to be transmitted to the robot, a bridge must be implemented to make the system work in real time. The implementation of the bridge depends upon the specific characteristics of the robot and the sensor to be used.

B. Three dimensional teleoperation motion

1) Robot Motion Control: Contrary to the planar motion described in Section II-A, a control based on inverse kinematics or differential inverse kinematics is more suitable for a three dimensional motion. In this case, the robot control will be defined with respect to Cartesian coordinates instead of joint angles. Since the motion is now three dimensional, a full teleoperation can be achieved.

Let the robot joint angles be represented by the joint configuration vector q = (q_1, q_2, ..., q_n), where n is the number of degrees of freedom of the robot. Let a Cartesian position be x_i = (x_i, y_i, z_i). Forward kinematics relates the joint configuration to an operational space configuration, such as the Cartesian space, as

    x_i = f_i(q)    (4)

where f_i is the forward kinematics function. It can be seen in (4) that the Cartesian coordinates are a function that depends directly on the joint angles. This is the case of the joint control model described in Section II-A. Since we are interested in the joint angles that will achieve a desired Cartesian position, (4) needs to be inverted as

    q = f_i^{-1}(x_i),    (5)

which is referred to as inverse kinematics. Due to the nonlinearities of (4), it is difficult to obtain the inverse map as in (5) in a general case. One usual approach to solve this problem is through differential inverse kinematics, which acts as a linearization of the model around the working point. This relation is represented with a differential map which, in this case, corresponds to the Jacobian J_i = ∂f_i/∂q, so that

    ẋ_i = J_i q̇.    (6)

As (6) shows, the Jacobian relates the rate of change of the joint angles to the rate of change of the Cartesian coordinates. The system (6) is linear and needs to be solved for q̇. In the general case [2], this solution is given by

    q̇ = J_i^# ẋ_i + (I - J_i^# J_i) z,    (7)

where J_i^# denotes the Moore-Penrose pseudoinverse of J_i, and z is an arbitrary vector which is projected onto the nullspace of J_i using (I - J_i^# J_i). Note that if both q and x have the same size, leading to a square Jacobian, and if J_i is a nonsingular matrix, then the pseudoinverse becomes the usual inverse J_i^# = J_i^{-1}. Using Euler integration, the joint velocities obtained with (7) can be transformed into joint positions as

    q_{k+1} = q_k + q̇_k T    (8)
where q_k is the joint configuration at time k and T is the sampling time. These joint values can then be directly sent to the robot.

2) Data acquisition: This part is similar to its counterpart in the two dimensional model. However, in this case the processed data refers only to the Cartesian coordinates obtained by the motion capture device instead of the joint angles. The constraint to be taken into account is that the positions relative to the human bones need to be normalized. The normalization depends on the specific robot that is used, since different robots have different link lengths.

3) Data transmission: The data obtained with the motion capture device needs to be transmitted to the robot using a bridge, which depends on the specific hardware. One option in this case is to use ROS (Robot Operating System) as a middleware for interconnecting the system.

III. RESULTS AND DISCUSSION

For the validation of the proposed methodology, we used the Kinect for Windows as a motion capture device. The reason was its very low price compared to other motion capture systems, and the fact that it is open source. This sensor allows the recovery of the data through its SDK. The humanoid robot that we used for the experiments is NAO from Aldebaran Robotics, which resembles a human in its kinematic structure. This allows us to reproduce anthropomorphic movements.

The Kinect for Windows v1 sensor detects depth based on structured light. Combining both depth-by-stereo and depth-by-focus techniques, the Kinect sensor provides a three dimensional reconstruction of a 3D space. The algorithm used by Microsoft to recognize a person relies on the comparison of the currently obtained data with an internal database using a tree-type algorithm [8]. If the operation is successful, the person is properly recognized and represented as a skeleton. The function that achieves this recovery in the Kinect SDK is called Skeletal Tracking. It provides the position (in Cartesian coordinates) of the main joints of a human being. Contrary to its predecessor, the Kinect v2 is based on the principle of time-of-flight.

A. Results of the two-dimensional telepresence

This method uses the Kinect sensor v1 with its corresponding SDK v1.8, which can only be installed on Windows. The NAO robot is programmed using the interface provided by Aldebaran Robotics called NAOqi. This work uses C++ to program the robot motion.

The joint angular limits of the robot limbs do not correspond to the values used in Section II-A2. For example, the joint angle corresponding to the shoulder neither begins at 0° nor ends at 180°, but ranges from -18° to 76° (LShoulderRoll), as Fig. 3 depicts. This figure also shows that the values for RElbowRoll range from 0° to 88.5° instead of the assumed ones. To solve this problem, a linear map relating these values was used. If the robot angular values exceed the joint limits, the robot stops moving due to its internal safety procedure. Another factor that leads to automatically stopping the motion is the nominal torque which appears when sharp movements are sent to NAO. For instance, if a sudden movement is sent to the robot, the program will end the motion execution as a safety measure. Because of this issue, the data from the Kinect are restricted to the robot joint limit values with the purpose of avoiding physical damage and undesired interruptions. This is similar to clamping these values as in [9]. The angles are multiplied by a factor obtained through experimental tests to achieve the best result. Likewise, tests were made to adjust the speed so as to reproduce the motion while avoiding interruptions due to torque or angular limits.

Fig. 3: NAO's arm design. Image from Aldebaran [10].

B. Results of the three-dimensional telepresence

To generate motion for the robot in three dimensions, we used a control system based on inverse kinematics. Since we are interested in reproducing the motion shown by a human demonstrator, it is important to ensure that the reference frames of both the motion sensor and the robot are consistent. In the general case, these reference frames have a different orientation, as clearly presented in [4], where a Kinect v2 and a 7-DOF KUKA robotic arm were used. To solve this problem, it is necessary to use a rotation matrix that can be obtained from a prior calibration process. In our case, contrary to [4], we assume that the chest remains fixed throughout the motion; then it is straightforward to obtain the rotation matrix relating both frames by simple inspection.

The software for robot control has been mainly developed for Linux, since most robots use a Linux-based real-time embedded computer. We used ROS (Robot Operating System) in Linux as a middleware to get the acquired motion from the sensor and to send the generated joint commands to the robot. Since the Kinect SDK needed to extract the human motion data has been developed only for Windows, we needed to send the data from a machine running Windows to a machine running Linux. This was achieved through a logical (non-physical) serial connection, using the rosserial package, which acted as a bridge between Windows and ROS in Linux. This connection scheme is shown in Fig. 4.

The human motion is then used as input to the control system, where inverse differential kinematics is used to generate the desired joint angles q_desired using (8). The NAOqi Bridge package acts as an interface between ROS and NAOqi, which controls the robot. It takes the desired joint angles that have been published to a ROS topic and sends them to NAOqi.
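The differential inverse kinematics of (6)-(8) can be illustrated numerically. The sketch below uses a hypothetical two-link planar arm (the link lengths, function names, and the closed-loop error term standing in for ẋ_i are our assumptions for illustration, not NAO's kinematics): it computes the pseudoinverse solution of (7) and integrates it with (8).

```python
import numpy as np

# Hypothetical 2-link planar arm used only to illustrate (6)-(8).
L1, L2 = 0.10, 0.10  # assumed link lengths [m]

def fkine(q):
    """Forward kinematics x = f(q) of the end effector, as in (4)."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0] + q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0] + q[1])])

def jacobian(q):
    """Analytic Jacobian J = df/dq used in (6)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def ik_step(q, x_des, dt=0.01, gain=1.0, z=None):
    """One iteration of (7) and (8): qdot = J# xdot + (I - J# J) z,
    followed by Euler integration q_{k+1} = q_k + qdot*dt."""
    J = jacobian(q)
    Jp = np.linalg.pinv(J)             # Moore-Penrose pseudoinverse J#
    xdot = gain * (x_des - fkine(q))   # drives the Cartesian error to zero
    qdot = Jp @ xdot
    if z is not None:                  # optional nullspace motion
        qdot += (np.eye(len(q)) - Jp @ J) @ z
    return q + qdot * dt               # Euler integration, eq. (8)

q = np.array([0.3, 0.5])               # initial joint configuration
x_des = np.array([0.12, 0.10])         # desired Cartesian position
for _ in range(2000):
    q = ik_step(q, x_des, dt=0.05)
print(np.round(fkine(q), 3))           # end-effector position after convergence
```

In the real system the Jacobian comes from the robot model and ẋ_i from the normalized sensor positions; replacing ẋ_i by a gain times the Cartesian error, as above, is a common way to turn (7) into a position-tracking iteration.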
Fig. 4: Scheme showing the parts of the system used for the 3D motion, from the human motion acquisition (with the Kinect) to the humanoid robot execution.
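Before the commands of this pipeline reach the robot, Section III-A restricts them to NAO's joint limits through a linear map and clamping (cf. [9]). A minimal sketch of that step, assuming the 0°-180° planar range of Section II-A as the source interval and the LShoulderRoll limits quoted in the text; the function names are ours:

```python
def linear_map(value, src_lo, src_hi, dst_lo, dst_hi):
    """Linearly map a value from [src_lo, src_hi] to [dst_lo, dst_hi]."""
    t = (value - src_lo) / (src_hi - src_lo)
    return dst_lo + t * (dst_hi - dst_lo)

def clamp(value, lo, hi):
    """Restrict a command to the joint limits (similar to the clamping of [9])."""
    return max(lo, min(hi, value))

# LShoulderRoll limits quoted in Section III-A, in degrees.
LSHOULDER_ROLL = (-18.0, 76.0)

def shoulder_command(alpha_deg):
    """Map the planar angle alpha (0-180 deg, Section II-A) to a safe
    LShoulderRoll command that will not trigger NAO's safety stop."""
    mapped = linear_map(alpha_deg, 0.0, 180.0, *LSHOULDER_ROLL)
    return clamp(mapped, *LSHOULDER_ROLL)

print(shoulder_command(90.0))  # -> 29.0
```

The clamp guards against out-of-range sensor readings that would otherwise make the robot stop moving due to its internal safety procedure.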
Fig. 6: Comparison of Kinect v1 (red) and v2 (blue) in terms of smoothing in the motion acquisition. The x-axis is time and the y-axis is the position in x.
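Looking back at Section II-A, the planar angles α and β can be recovered from three tracked joint positions via (2), (3) and a normalized dot product. A minimal sketch; the sample coordinates and the torso direction vector are assumptions for illustration:

```python
import numpy as np

# Made-up joint positions in the sensor frame (meters); in the real system
# these come from the Kinect Skeletal Tracking output, cf. eq. (1).
x_s = np.array([0.0,  0.00, 0.0])   # shoulder
x_e = np.array([0.0, -0.25, 0.0])   # elbow
x_h = np.array([0.2, -0.25, 0.0])   # hand

v_f = x_e - x_s   # forearm vector, eq. (2)
v_a = x_h - x_e   # arm vector, eq. (3)

def angle_between(u, v):
    """Unsigned angle between two vectors, in degrees."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

# Assumed torso direction (pointing down in the sensor frame).
torso = np.array([0.0, -1.0, 0.0])

alpha = angle_between(torso, v_f)   # torso-forearm angle of Fig. 2
beta = angle_between(v_f, v_a)      # per the text, the elbow angle is alpha + beta
print(alpha, beta)  # -> 0.0 90.0
```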
Fig. 5: Example of the results of the three-dimensional telepresence system under three different positions.

Similarly, it takes the measured joint angles q_measured coming from NAOqi and feeds them back by publishing them to a ROS topic that the controller can read. NAOqi is the ultimate interface between the robot and the host computer, which is normally used to control the NAO robot. The results of the three dimensional telepresence are shown in Fig. 5. The software we developed for this work is freely available [1].

In our experience, the Kinect v1 presented problems when some joints start overlapping with other joints. To analyze these problems, we chose a movement parallel to the ground where the arm starts fully extended and moves towards the torso. Fig. 6 shows the resulting path: we can see that the v2 maintains a smoother track than the v1. Therefore, we used the Kinect v1 only for the two dimensional approach, but we used the Kinect v2 for the three dimensional motion.

IV. CONCLUSIONS

A method for the human teleoperation of a humanoid robot for motion in two dimensions and three dimensions was presented in this paper. The proposed approach was validated using the NAO robot and the Kinect sensor. There exist different applications for telepresence. In particular, this work aims to use telepresence as a method for long-distance teaching with active participation of the users. Moreover, this work could be used as an interactive tool for teaching children with autism in order to improve their social abilities. This is a first step towards achieving full telepresence. Future work will involve audio and video transmission, since the humanoid platform (NAO) possesses these sensors.

[1] https://github.com/utecrobotics/nao_kinect_teleop

REFERENCES

[1] W. Suleiman, E. Yoshida, F. Kanehiro, J.-P. Laumond, and A. Monin, "On human motion imitation by humanoid robot," in IEEE International Conference on Robotics and Automation (ICRA), 2008, pp. 2697-2704.
[2] A. Cela, J. J. Yebes, R. Arroyo, L. M. Bergasa, R. Barea, and E. Lopez, "Complete low-cost implementation of a teleoperated control system for a humanoid robot," Sensors, vol. 13, no. 2, pp. 1385-1401, 2013.
[3] S. Filiatrault and A.-M. Cretu, "Human arm motion imitation by a humanoid robot," in IEEE International Symposium on Robotic and Sensors Environments (ROSE), 2014, pp. 31-36.
[4] Y. Yang, H. Yan, M. Dehghan, and M. H. Ang, "Real-time human-robot interaction in complex environment using Kinect v2 image recognition," in IEEE International Conference on Cybernetics and Intelligent Systems (CIS), 2015, pp. 112-117.
[5] H. Hisahara, S. Hane, H. Takemura, Y. Chin, T. Ogitsu, and H. Mizoguchi, "3D point cloud-based virtual environment for safe testing of robot control programs," in International Conference on Intelligent Systems, Modelling and Simulation (ISMS), 2015, pp. 24-27.
[6] H. Furntratt and H. Neuschmied, "Evaluating pointing accuracy on Kinect v2 sensor," in International Conference on Multimedia and Human-Computer Interaction (MHCI), 2014, pp. 124.1-124.5.
[7] S. Shamsuddin, L. I. Ismail, H. Yussof, N. I. Zahari, S. Bahari, H. Hashim, and A. Jaffar, "Humanoid robot NAO: Review of control and motion exploration," in IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2011, pp. 511-516.
[8] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, and R. Moore, "Real-time human pose recognition in parts from single depth images," Communications of the ACM, vol. 56, no. 1, pp. 116-124, 2013.
[9] D. Raunhardt and R. Boulic, "Progressive clamping," in IEEE International Conference on Robotics and Automation (ICRA), 2007, pp. 4414-4419.
[10] Aldebaran Robotics, "NAO software 1.14.5 documentation," URL http://www.aldebaran-robotics.com, 2014.