Deformable object manipulation has been an area of high interest in the robotics community due to its myriad applications in manufacturing, supply chains, and hospitality services. However, teaching robots to manipulate deformable objects has proven to be a long-standing challenge due to their infinite-dimensional configuration space, tendency to self-occlude, and complex dynamics. Fortunately, recent advances in deep learning, computer vision, and simulation-to-reality transfer have opened up exciting new directions for tackling these challenges. In this work, we study the problem of fabric manipulation through a variety of methods, ranging from learning-based perception combined with control to fully end-to-end learning techniques.

First, we study the problem of general-purpose fabric smoothing and folding. While there has been significant prior work on learning policies for specific fabric manipulation tasks, less focus has been given to algorithms that can perform many different tasks. We take a step towards this goal by learning point-pair correspondences across different fabric configurations in simulation. Then, given a single demonstration of a new task from an initial fabric configuration, these correspondences can be used to compute geometrically equivalent actions in a new fabric configuration. This makes it possible to define policies that robustly imitate a broad set of multi-step fabric smoothing and folding tasks. The resulting policies achieve an 80.3% average task success rate across 10 fabric manipulation tasks on two different physical robotic systems. Results also suggest robustness to fabrics of various colors, sizes, and shapes. We also propose the Multi-Modal Gaussian Shape Descriptor (MMGSD), a new visual representation of deformable objects which extends ideas from dense object descriptors to predict all symmetric correspondences between different object configurations.
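The core correspondence-transfer step can be sketched as a nearest-neighbor lookup in descriptor space: a learned network maps each fabric image to a per-pixel descriptor image, and a demonstrated action pixel is transferred to a new configuration by finding the pixel whose descriptor best matches. The function below is a minimal illustration of that idea, assuming descriptor images are already computed; the function name and array shapes are illustrative, not the thesis's actual API.

```python
import numpy as np

def transfer_action(demo_descriptors, demo_pixel, new_descriptors):
    """Map a demonstrated action pixel to a new fabric configuration
    via nearest-neighbor matching in descriptor space.

    demo_descriptors: (H, W, D) descriptor image of the demo configuration
    demo_pixel:       (row, col) of the demonstrated pick point
    new_descriptors:  (H, W, D) descriptor image of the new configuration
    """
    target = demo_descriptors[demo_pixel]                       # (D,) descriptor at demo pick
    dists = np.linalg.norm(new_descriptors - target, axis=-1)   # (H, W) distance map
    return np.unravel_index(np.argmin(dists), dists.shape)      # best-matching pixel
```

A multi-step folding policy then replays each demonstrated pick-and-place by transferring both the pick and place pixels this way before executing them on the robot.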

Next, we present the first systematic benchmarking of fabric manipulation algorithms on physical hardware, using Reach, a cloud robotics platform that enables low-latency remote execution of control policies on physical robots. We develop 4 novel learning-based algorithms that model expert actions, keypoints, reward functions, and dynamic motions, and we compare these against 4 learning-free and inverse-dynamics algorithms on the task of folding a crumpled T-shirt with a single robot arm. The entire lifecycle of data collection, model training, and policy evaluation is performed remotely, without physical access to the robot workcell. Results suggest that a new algorithm combining imitation learning with analytic methods achieves 84% of human-level performance on the folding task.
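One way to combine learning with analytic methods, in the spirit described above, is to use a learned detector only for perception (predicting garment keypoints) and derive the folding motions analytically from those keypoints. The sketch below is a hypothetical illustration of that split; the keypoint names and the particular fold sequence are assumptions for the example, not the benchmarked algorithm.

```python
import numpy as np

def fold_actions_from_keypoints(keypoints):
    """Hypothetical analytic folding primitive: given 2D T-shirt keypoints
    (e.g., from a learned detector), emit a sequence of pick-and-place
    actions, each a (pick_xy, place_xy) pair.

    keypoints: dict with 2D coordinates for 'left_sleeve', 'right_sleeve',
               'bottom_left', 'bottom_right', and 'collar'.
    """
    kp = {k: np.asarray(v, dtype=float) for k, v in keypoints.items()}
    center = np.mean(list(kp.values()), axis=0)  # rough garment center
    return [
        (kp["left_sleeve"], center),             # fold left sleeve inward
        (kp["right_sleeve"], center),            # fold right sleeve inward
        (kp["bottom_left"], kp["collar"]),       # fold bottom hem up
        (kp["bottom_right"], kp["collar"]),
    ]
```

The appeal of this division of labor is that the learned component handles the hard perception problem (a crumpled shirt's appearance), while the action sequence stays interpretable and easy to debug.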
