I. INTRODUCTION
Most real-world tasks involve interactions with complex,
non-linear dynamics. Although practiced humans are able to
control these interactions intuitively, developing robotic controllers for them is very difficult. Several common household
activities fall into this category, including scrubbing surfaces,
folding clothes, interacting with appliances, and cutting food.
Other applications include surgery, assembly, and locomotion.
These interactions are characterized by hard-to-model effects,
involving friction, deformation, and hysteresis. The compound
interaction of materials, tools, environments, and manipulators
further alters these effects. Consequently, the design of controllers for such tasks is highly challenging.
In recent years, feed-forward model-predictive control
(MPC) has proven effective for many complex tasks, including
quad-rotor control [36], mobile robot maneuvering [20], full-body control of humanoid robots [14], and many others
[26, 18, 11]. The key insight of MPC is that an accurate
predictive model allows us to optimize control inputs for some
cost over both inputs and predicted future outputs. Such a
cost function is often easier and more intuitive to design than
completely hand-designing a controller. The chief difficulty in
MPC lies instead in designing an accurate dynamics model.
Let us consider the dynamics involved in cutting food items,
as shown in Fig. 1 for the wide range of materials shown
in Fig. 2. An effective cutting strategy depends heavily on
properties of the food, including its coefficient of friction
with the knife, elastic modulus, fracture effects, and hysteretic
effects such as plastic deformation [29]. These parameters
lead humans to such diverse cutting strategies as slicing,
sawing, and chopping. In fact, they can even vary within a
single material; compare cutting through the skin of a lemon
to cutting its flesh. Thus, a major challenge of this work
Fig. 1: Cutting food: Our PR2 robot uses our algorithms to perform
complex, precise food-cutting operations. Given the large variety of
material properties, it is challenging to design appropriate controllers.
[Fig. 3 plots: panels for Butter, Lemon, and Lemon (faster); each plots Position (m) vs. Time (s) for the Sawing Axis and Vertical Axis.]
Fig. 3: Variation in cutting dynamics: plots showing desired (green) and actual (blue) trajectories, along with error (red) obtained using a
stiffness controller while cutting butter (left) and a lemon at low (middle) and high (right) rates of vertical motion.
Fig. 4: Gripper axes: PR2's gripper with knife grasped, showing the
axes used in this paper. The X (sawing) axis points along the blade
of the knife, Y points normal to the blade, and Z points vertically.
increases error along the sawing axis, even though the same
controls are used for that axis.
In our approach, we fix the orientation of the end-effector,
as well as the position of the knife along its Y axis, using
stiffness control to stabilize these. However, even though our
primary goal is to move the knife along its Z axis, as shown
in Fig. 3, the X and Z axes are strongly coupled for this
problem. Thus, our algorithm performs control along both the
X and Z axes. This allows sawing and slicing motions
in which movement along the X axis is used to break static
friction along the Z axis and enable forward progress. We use
a nonlinear function f to predict future states:
\hat{x}^{(t+1)} = f\left(x^{(t)}, u^{(t+1)}\right) \qquad (1)

U^*_{t+1:t+T} = \arg\min_{U_{t+1:t+T}} C\left(\hat{X}_{t+1:t+T}, U_{t+1:t+T}\right) \qquad (2)
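As a concrete illustration of this optimization, the following is a minimal sketch of receding-horizon control by random shooting: candidate control sequences are rolled out through a learned model f as in eq. (1), and the lowest-cost sequence under C as in eq. (2) is kept. The sampling scheme, horizon, and function names are our illustrative assumptions, not the optimizer used in this work.

```python
import numpy as np

def mpc_random_shooting(f, x0, cost, T=5, n_samples=256, u_dim=2, seed=0):
    """Pick a control sequence U_{t+1:t+T} minimizing cost over predicted
    states, by sampling candidates and rolling each out through the model f."""
    rng = np.random.default_rng(seed)
    best_U, best_c = None, np.inf
    for _ in range(n_samples):
        U = rng.uniform(-1.0, 1.0, size=(T, u_dim))  # candidate control sequence
        x, X = x0, []
        for u in U:                # eq. (1): next state from current state and control
            x = f(x, u)
            X.append(x)
        c = cost(np.array(X), U)   # eq. (2): cost over predicted states and controls
        if c < best_c:
            best_c, best_U = c, U
    return best_U, best_c
```

In a receding-horizon loop, only the first control of the returned sequence would be executed before re-planning.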
[Fig. 5 diagram: inputs v^(b-1), v^(b), u^(b+1); latent features l^(b-1), l^(b); hidden units h^[lp], h^[lc], h^[l], h^[f], h^[c]; output x̂^(b+1).]
Fig. 5: Deep predictive model: Architecture of our recurrent conditional deep predictive dynamics model.
h^{[f](b)}_j = \sum_{i=1}^{N_u} W^{[f]}_{i,j}\, u^{(b+1)}_i \qquad (4)

h^{[l](b)}_j = \sum_{i=1}^{N_l} W^{[l]}_{i,j}\, l^{(b)}_i \qquad (5)

\hat{x}^{(b+1)}_j = \sum_{i=1}^{N_{oh}} W_{i,j}\, h_i \qquad (6)
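A minimal sketch of this feed-forward prediction step: the next control and the latent features are each projected to hidden features as in eqs. (4) and (5), which are then mapped to the predicted next state as in eq. (6). Concatenating the two hidden feature sets before the output projection, the shapes, and the helper name are all our illustrative assumptions.

```python
import numpy as np

def predict_next_state(u_next, l_b, W_f, W_l, W_o):
    """Illustrative forward pass: controls and latent features -> hidden
    features -> predicted next state (linear maps, as in eqs. 4-6)."""
    h_f = W_f.T @ u_next                     # eq. (4): control-conditioned hidden features
    h_l = W_l.T @ l_b                        # eq. (5): latent-feature hidden features
    h_all = np.concatenate([h_f, h_l])       # assumption: output layer sees both sets
    return W_o.T @ h_all                     # eq. (6): predicted next state x_hat
```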
Long-Term Recurrent Latent Features: Another major challenge in modeling time-dependent dynamics is integrating
long-term information while still allowing for transitions in dynamics, such as moving from the skin to the flesh of a lemon.
To this end, we introduce transforming recurrent units (TRUs).
To retain state information from previous observations, our
TRUs use temporal recurrence, where each latent unit has
weights to the previous timestep's latent features. To allow this
state to transition based on locally observed transformations in
dynamics, they use the paired-filter behavior of multiplicative
interactions to detect transitions in the dynamic response of
the system and update the latent state accordingly. In previous work, multiplicative factored conditional units have been
shown to work well in modeling transformations in images
[27] and physical states [38], making them a good choice
here. Each TRU thus takes input from the previous TRU's output and the short-term response features for the current and previous time-blocks. With [ll] denoting recurrent weights for the latent features, [lc] current-step weights, [lp] previous-step weights, [lo] output weights, and N_lh the number of TRU hidden
h^{[lp](b)}_j = \sum_{i=1}^{N_v} W^{[lp]}_{i,j}\, v^{(b-1)}_i \qquad (7)

h^{[lc](b)}_j = \sum_{i=1}^{N_v} W^{[lc]}_{i,j}\, v^{(b)}_i \qquad (8)

l^{(b)}_j = \sigma\left( \sum_{i=1}^{N_{lh}} W^{[lo]}_{i,j}\, h^{[lp](b)}_i h^{[lc](b)}_i + \sum_{k=1}^{N_l} W^{[ll]}_{k,j}\, l^{(b-1)}_k \right) \qquad (9)
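One TRU step might be sketched as follows: paired filter responses from consecutive time-blocks are multiplied to detect transformations in the dynamics, then combined with the recurrent latent state. The sigmoid nonlinearity, the weight shapes, and the helper name are our assumptions.

```python
import numpy as np

def tru_step(v_prev, v_cur, l_prev, W_lp, W_lc, W_lo, W_ll):
    """Sketch of a transforming recurrent unit (TRU) step."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h_lp = W_lp.T @ v_prev       # filter responses for the previous time-block
    h_lc = W_lc.T @ v_cur        # filter responses for the current time-block
    pair = h_lp * h_lc           # multiplicative interaction: transformation detector
    # combine detected transformations with the recurrent latent state
    return sigmoid(W_lo.T @ pair + W_ll.T @ l_prev)
```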
to future timesteps. This could cause huge errors at time-horizon T, which will in turn cause large gradients to be back-propagated, resulting in instability in learning and overfitting to the training data. To remedy this, we propose a multi-stage pre-training approach which first optimizes some subsets
of the weights, leading to much more accurate predictions and
less instability when optimizing the final recurrent network.
We show in Fig. 7 that our learning algorithm significantly
outperforms random initialization.
Phase 1: Unsupervised Pre-Training: In order to obtain
a good initial set of features for l, we apply an unsupervised
learning algorithm similar to the sparse auto-encoder algorithm
[16] to train the non-recurrent parameters of the TRUs. This
algorithm first projects from the TRU inputs up to l, then uses
the projected l to reconstruct these inputs. The TRU weights
are optimized for a combination of reconstruction error and
sparsity in the outputs of l.
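A minimal sketch of such a reconstruction-plus-sparsity objective, assuming tied encoder/decoder weights and a KL-divergence sparsity penalty as in common sparse auto-encoder formulations; the hyperparameters and helper name are illustrative, not the paper's settings.

```python
import numpy as np

def sparse_autoencoder_loss(V, W, rho=0.05, beta=3.0):
    """Phase-1 style objective: project inputs V up to latent features,
    reconstruct the inputs, and penalize non-sparse latent activations."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    L = sigmoid(V @ W)                      # project TRU inputs up to latent features l
    V_hat = L @ W.T                         # reconstruct inputs (tied weights: assumption)
    recon = np.mean((V - V_hat) ** 2)       # reconstruction error
    rho_hat = np.clip(L.mean(axis=0), 1e-8, 1 - 1e-8)  # mean activation per latent unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # sparsity penalty
    return recon + beta * kl
```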
Phase 2: Short-term Prediction Training: While we could
now use these parameters as a starting point to optimize a
fully recurrent multi-step prediction system, we found that in
practice, this led to instability in the predicted values, since
inaccuracies in initial predictions might blow up and cause
huge deviations in future timesteps.
Instead, we include a second pre-training phase, where we
train the model to predict a single timestep into the future. This
allows the model to adjust from the task of reconstruction to
that of physical prediction, without risking the aforementioned
instability. For this stage, we remove the recurrent weights
from the TRUs, effectively setting all W [ll] to zero and
ignoring them for this phase of optimization.
Taking x^{(m,k)} as the state for the k-th time-block of training case m, M as the number of training cases, and B_m as the number of time-blocks for case m, this stage optimizes:

W^* = \arg\min_W \sum_{m=1}^{M} \sum_{b=2}^{B_m - 1} \left\| \hat{x}^{(m,b+1)} - x^{(m,b+1)} \right\|_2^2 \qquad (10)
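The summed one-step objective of eq. (10) can be sketched as a simple loop over training sequences; the one-argument model signature is an illustrative simplification of the full predictive model.

```python
import numpy as np

def one_step_loss(model, X):
    """Phase-2 style objective: summed squared error of one-step-ahead
    predictions over each training sequence, starting from time-block b = 2."""
    total = 0.0
    for x_seq in X:                          # one sequence per training case m
        for b in range(1, len(x_seq) - 1):   # b = 2 .. B_m - 1 (0-indexed here)
            x_pred = model(x_seq[b])         # predict x^(m,b+1) from x^(m,b)
            total += float(np.sum((x_pred - x_seq[b + 1]) ** 2))
    return total
```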
The second term, C_saw, keeps the tip of the knife inside some reasonable sawing range along the X axis, ensuring that it actually cuts through the food. Since any valid position is acceptable, this term is zero within some margin of this range, then becomes quadratic once it passes those margins. Taking P_x as the center point of the sawing range, d_s as the range, and \epsilon as the margin, we define this term as:

C_{saw}(\hat{X}) = \sum_{k=t}^{t+T} \max\left\{ 0,\ \left| P_x^{(k)} - P_x \right| - d_s + \epsilon \right\}^2 \qquad (13)
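A direct transcription of this hinge-then-quadratic penalty might look as follows; the function and argument names are ours.

```python
import numpy as np

def c_saw(Px_traj, Px_center, d_s, margin):
    """Sawing-range cost as in eq. (13): zero while the knife tip stays within
    the sawing range (minus a margin), quadratic once it moves past it."""
    Px_traj = np.asarray(Px_traj)
    # hinge: positive only once |P_x^(k) - P_x| exceeds d_s - margin
    excess = np.maximum(0.0, np.abs(Px_traj - Px_center) - d_s + margin)
    return float(np.sum(excess ** 2))        # quadratic penalty past the margin
```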
VII. EXPERIMENTS
In order to evaluate our algorithm as compared to other
baseline and state-of-the-art approaches, we performed a series
of experiments, both offline on our large dataset of material
interactions, and online on our PR2 robot.
[Fig. 7: mean prediction error vs. prediction time (0 to 0.5 s) for Linear SS, GMM-Linear, ARMAX, Simple Deep, 5-NN, Ours (Non-Recur.), Gaussian Process, Recur. Deep, Ours (Rand. Init.), and Ours.]
A. Dataset
Our material interaction dataset contains 1488 examples of
robotic food-cutting for 20 different materials (Fig. 2). We
collected data from three different settings. First, a fixed-parameter setting in which trajectories as shown in the leftmost two columns of Fig. 3 were used with a stiffness controller.
Second, for 8 of the 20 materials in question, data was
collected while a human tuned a stiffness controller to improve
cutting rate. This data was not collected for all materials to
avoid giving the learning algorithm and controller near-optimal
cases for all materials. Third, a randomized setting in which most parameters of the controller, including cutting and sawing rates and stiffnesses, but excluding sawing range (still fixed at 4 cm), were randomized for each stroke of the knife. This helped to
obtain data spanning the entire range of possible interactions.
B. Prediction Experiments
Setting: In order to test our model, we examine its predictive
accuracy compared to several other approaches. Each model
was given 0.7s worth of past trajectory information (forces
and known poses) and 0.5s of future forces and then asked to
predict the future end-effector trajectory. For this experiment,
we used 70% of our data for training, 10% for validation, and
20% for testing, sampling to keep each set class-balanced.
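The per-class split could be sketched as follows; the helper and its fractions argument are our own illustration of the 70/10/20 class-balanced protocol described above.

```python
import numpy as np

def class_balanced_split(labels, fracs=(0.7, 0.1, 0.2), seed=0):
    """Split example indices into train/validation/test sets per class, so
    each set keeps roughly the same class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    splits = {"train": [], "val": [], "test": []}
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])  # shuffle within class
        n = len(idx)
        n_tr, n_va = int(fracs[0] * n), int(fracs[1] * n)
        splits["train"].extend(idx[:n_tr])
        splits["val"].extend(idx[n_tr:n_tr + n_va])
        splits["test"].extend(idx[n_tr + n_va:])
    return splits
```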
Baselines: For prediction, we include a linear state-space
model, an ARMAX model which also weights a range of past
states, and a K-Nearest Neighbors (K-NN) model (5-NN gave
the best results) as baseline algorithms. We also include a
GMM-linear model [25], and a Gaussian process (GP) model
[10], trained using the GPML package [32]. Additionally, we
compare to standard recurrent and non-recurrent two-layer
deep networks, and versions of our approach without recurrence and without pre-training (randomly initializing weights,
then training for recurrent prediction).
Results: Fig. 7 shows performance for each model as mean
L2 distance from predicted to ground truth trajectory vs.
prediction time in the future. Temporally-local (piecewise-)
linear methods (linear SS, GMM-linear, and ARMAX) gave
weak performance for this problem, each yielding an average
error of over 8mm at 0.5s. This shows, as expected, that linear
models are a poor fit for our highly non-linear problem.
Instance-based learning methods K-NN and Gaussian processes gave better performance, at an average of 4.25mm and
3.56mm, respectively. Both outperformed the baseline two-layer non-recurrent deep network, which gave 4.90mm error.
The GP gave the best performance of any temporally-local
model, although this came at the cost of extreme inference
time, taking an average of 3361s (56 minutes) to predict 0.5s
into the future, 1.18x10^6 times slower than our algorithm,
whose MATLAB implementation took only 3.1ms.
The relatively unimpressive performance of a standard two-layer deep network for this problem underscores the need for task-appropriate architectures and learning algorithms. Including conditional structures, as in the non-recurrent version of our model, and temporal recurrence reduced this error
to 3.80mm and 3.27mm, respectively. Even when randomly
initialized, our model outperformed the baseline recurrent
network, giving only 2.78mm error, showing the importance of
using an appropriate architecture. Using our learning algorithm
further reduced error to 1.78mm, 36% less than the randomly-initialized version and 46% less than the baseline recurrent
model, demonstrating the need for careful pre-training.
Our approach also gave a tighter and lower 95% confidence
interval of prediction error, from 0.17-7.23mm at 0.5s, a width
of 7.06mm, compared to the baseline recurrent net's interval of 0.26-10.71mm, a width of 10.45mm, and the GP's interval of 0.34-15.22mm, a width of 14.88mm.
C. Robotic Experiments
Setting: To examine the actual control performance of our
system, we also performed a series of online robotic experiments. In these experiments, the robot's goal was to make a
linear cut between two given points. We selected a subset of
10 of the 20 materials in our dataset for these experiments,
aiming to span the range of variation in material properties. We
evaluated these experiments in terms of cutting rate, i.e. the
vertical distance traveled by the knife divided by time taken.
Baselines: For control, we validate our algorithm against
several other control methods with varying levels of class
information. First, we compare to a class-generic stiffness
controller, using the same controller for all classes. This
controller was tuned to give a good (>90%) rate of success
for all classes, meaning that it is much slower than necessary
for easy-to-cut classes such as tofu. We also validate against
controllers tuned separately for each of the test classes, using
the same criteria as above, showing the maximum cutting rate
that can be expected from fixed-trajectory stiffness control.
As a middle ground, we compare to an algorithm similar to
that of Gemici and Saxena [15], where a set of class-specific
material properties are mapped to learned haptic categories.
We found that learning five such categories assigned all data
for each class to exactly one cluster, and thus used the same
controller for all classes assigned to each cluster. In cases
where this controller was the same as used for the class-tuned
case, we used the same results for both.
Results: Figure 8 shows the results of our series of over 450
robotic experiments. For all materials except butter and tofu,
[Fig. 8: cutting rates by material for the class-general stiffness controller, clustered stiffness (Gemici), class-specific stiffness, and our approach, including butter (But.) and tofu.]
REFERENCES
[1] C. G. Atkeson, C. H. An, and J. M. Hollerbach. Estimation of inertial parameters of manipulator loads and links. Int. J. Rob. Res., 5(3):101-119, Sept. 1986.
[2] M. Beetz, U. Klank, I. Kresse, A. Maldonado,
L. Mosenlechner, D. Pangercic, T. Ruhr, and M. Tenorth.
Robotic Roommates Making Pancakes. In Humanoids,
2011.
[3] Y. Bengio. Learning deep architectures for AI. FTML, 2(1):1-127, 2009.
[4] S. Bennett. A brief history of automatic control. Control Systems, IEEE, 16(3):17-25, Jun 1996. ISSN 1066-033X.
[5] M. Bollini, J. Barry, and D. Rus. Bakebot: Baking cookies with the PR2. In IROS PR2 Workshop, 2011.
[6] M. V. Butz, O. Herbort, and J. Hoffmann. Exploiting redundancy for flexible behavior: Unsupervised learning in a modular sensorimotor control architecture. Psychological Review, 114:1015-1046, 2007.
[7] S. Chernova and M. Veloso. Confidence-based policy learning from demonstration using Gaussian mixture models. In AAMAS, 2007.
[8] C.-M. Chow, A. G. Kuznetsov, and D. W. Clarke. Successive one-step-ahead predictions in multiple model predictive control. Int. J. Systems Science, 29(9):971-979, 1998.
[9] R. Collobert and J. Weston. A unified architecture for
natural language processing: deep neural networks with
multitask learning. In ICML, 2008.
[10] M. P. Deisenroth and C. E. Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In ICML, 2011.
[11] M. Dominici and R. Cortesao. Model predictive control architectures with force feedback for robotic-assisted beating
heart surgery. In ICRA, 2014.
[12] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121-2159, July 2011.
[13] Eigen Matrix Library. http://eigen.tuxfamily.org.
[14] T. Erez, K. Lowrey, Y. Tassa, V. Kumar, S. Kolev, and E. Todorov. An integrated system for real-time model-predictive control of humanoid robots. In ICHR, 2013.
[15] M. Gemici and A. Saxena. Learning haptic representation
for manipulating deformable food objects. In IROS, 2014.
[16] I. Goodfellow, Q. Le, A. Saxe, H. Lee, and A. Y. Ng.
Measuring invariances in deep networks. In NIPS, 2009.
[17] A. Graves, A. Mohamed, and G. E. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP,
2013.
[18] S. Heshmati-alamdari, G. K. Karavas, A. Eqtami,
M. Drossakis, and K. Kyriakopoulos. Robustness analysis
of model predictive control for constrained image-based
visual servoing. In ICRA, 2014.
[19] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[20] T. Howard, C. Green, and A. Kelly. Receding horizon
model-predictive control for mobile robot navigation of
intricate paths. In International Conference on Field and