DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands

Fengbo Lan*   Shengjie Wang*   Yunzhe Zhang   Haotian Xu   Oluwatosin Oseni   Yang Gao+   Tao Zhang+
* Equal Contribution    + Corresponding Authors

Abstract

Achieving human-like dexterous manipulation remains a crucial area of research in robotics. Current research focuses on improving the success rate of pick-and-place tasks. Compared with pick-and-place, throw-catching behavior has the potential to increase picking speed because objects do not have to be transported all the way to their destination. However, dynamic dexterous manipulation poses a major challenge for stable control due to the large number of dynamic contacts. In this paper, we propose a Stability-Constrained Reinforcement Learning (SCRL) algorithm that learns to catch diverse objects with dexterous hands. The SCRL algorithm outperforms baselines by a large margin, and the learned policies show strong zero-shot transfer performance on unseen objects. Remarkably, even though an object held in a sideways-facing hand is extremely unstable due to the lack of support from the palm, our method still achieves a high success rate in this most challenging task.

We Consider 5 Catching Scenarios

Overarm to Overarm Catch

Abreast to Abreast Catch

Underarm to Underarm Catch

Overarm to Abreast Catch

Underarm to Overarm Catch

Objects of daily life

We categorize objects by their point cloud distribution into three groups: Elongated objects (e.g., bananas, pens), Flat objects (e.g., keyboards), and Regular objects that fit neither of the other two categories (e.g., apples, cups). Our algorithm generalizes across all three categories.
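The page does not state the exact criterion used to split point clouds into the three categories. As a rough illustration only, the sketch below labels a cloud by comparing its axis-aligned bounding-box extents. The function name `classify_shape`, the `ratio` threshold, and the bounding-box heuristic itself are assumptions made for this example, not the authors' procedure.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def classify_shape(points: List[Point], ratio: float = 3.0) -> str:
    """Label a point cloud as 'elongated', 'flat', or 'regular'.

    Hypothetical heuristic (not from the paper): compare the three
    axis-aligned bounding-box extents, sorted from longest to shortest.
    """
    if not points:
        raise ValueError("point cloud is empty")

    # Extent of the cloud along x, y, and z, longest first.
    extents = sorted(
        (max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)),
        reverse=True,
    )
    longest, middle, shortest = extents
    eps = 1e-9  # guards against division by zero for degenerate clouds

    if longest / (middle + eps) >= ratio:
        return "elongated"  # one dominant axis, e.g. a pen
    if middle / (shortest + eps) >= ratio:
        return "flat"  # two comparable axes and one thin axis, e.g. a keyboard
    return "regular"  # no axis dominates, e.g. an apple


# Toy clouds given only by two opposite bounding-box corners.
pen = [(0.0, 0.0, 0.0), (10.0, 1.0, 1.0)]
keyboard = [(0.0, 0.0, 0.0), (10.0, 8.0, 1.0)]
apple = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
```

Because the extents are axis-aligned, this sketch depends on object orientation; aligning the cloud to its principal axes first would be one way to remove that dependence.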

Elongated (Pen)

Flat (Pie)

Regular (Apple)

Method

Description

Algorithm Overview

The SCRL algorithm takes the environment observation and the point cloud feature of the object as input, and learns the catching policy for dexterous hands through an Actor-Critic structure. The Lyapunov function, the policy function, and the value function are each estimated by a neural network. The three networks share a similar structure and differ only in the dimension of the output. For example, the Lyapunov network contains a linear layer, a batch normalization layer, and a penultimate normalization layer, followed by a final linear layer that outputs a scalar.
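To make the shared network structure concrete, here is a minimal pure-Python sketch of one such head: a linear layer, a batch normalization step, a second normalization step, and a final linear layer. Several details are assumptions for illustration: the "penultimate normalization layer" is taken to be a layer normalization, no activation functions or learned normalization parameters are included, the value head is assumed to output a scalar, and the sizes `INPUT_DIM`, `HIDDEN_DIM`, and `ACTION_DIM` are hypothetical. It is not the authors' implementation.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]  # batch of row vectors


def init_linear(in_dim: int, out_dim: int, rng: random.Random) -> Dict:
    """Create weights (out_dim x in_dim) and zero biases for a linear layer."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return {"w": weights, "b": [0.0] * out_dim}


def linear(layer: Dict, x: Matrix) -> Matrix:
    """Apply y = W x + b to every sample in the batch."""
    return [[sum(w * v for w, v in zip(row, sample)) + b
             for row, b in zip(layer["w"], layer["b"])]
            for sample in x]


def batch_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each feature across the batch (no learned scale or shift)."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [[(v - m) / math.sqrt(var + eps)
             for v, m, var in zip(row, means, variances)]
            for row in x]


def layer_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each sample across its features (assumed normalization type)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def make_head(in_dim: int, hidden_dim: int, out_dim: int, seed: int = 0) -> Dict:
    """Build one head; the three heads differ only in out_dim."""
    rng = random.Random(seed)
    return {"fc1": init_linear(in_dim, hidden_dim, rng),
            "fc2": init_linear(hidden_dim, out_dim, rng)}


def forward(head: Dict, x: Matrix) -> Matrix:
    """Linear -> batch norm -> layer norm -> final linear."""
    h = linear(head["fc1"], x)
    h = batch_norm(h)
    h = layer_norm(h)
    return linear(head["fc2"], h)


# Hypothetical sizes: observation and point cloud feature concatenated.
INPUT_DIM, HIDDEN_DIM, ACTION_DIM = 12, 16, 5
lyapunov_head = make_head(INPUT_DIM, HIDDEN_DIM, out_dim=1, seed=0)
policy_head = make_head(INPUT_DIM, HIDDEN_DIM, out_dim=ACTION_DIM, seed=1)
value_head = make_head(INPUT_DIM, HIDDEN_DIM, out_dim=1, seed=2)

data_rng = random.Random(42)
batch = [[data_rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)] for _ in range(4)]
lyapunov_out = forward(lyapunov_head, batch)  # one scalar per sample
policy_out = forward(policy_head, batch)      # ACTION_DIM values per sample
value_out = forward(value_head, batch)        # one scalar per sample
```

Building all three heads from the same `make_head` function mirrors the statement that the networks differ only in output dimension; in practice a deep learning framework would replace these hand-written layers.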

BibTeX

@article{DexCatch,
  author    = {Fengbo Lan and Shengjie Wang and Yunzhe Zhang and Haotian Xu and Oluwatosin Oseni and Yang Gao and Tao Zhang},
  title     = {DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands},
  journal   = {arXiv preprint arXiv:2310.08809},
  year      = {2023},
}