We propose a method to estimate and track the 3D posture and the 3D shape of the human body from a single RGB-D image. We estimate the full 3D mesh of the body and show that 2D joint positions greatly improve 3D estimation and tracking accuracy. The problem is inherently very challenging owing to the complexity of the human body, lighting, clothing, and occlusion. To solve the problem, we leverage a custom MobileNet implementation of the OpenPose CNN to construct a 2D skeletal model of the human body. We then fit a low-dimensional deformable body model called SMPL to the observed point cloud, using the 2D skeletal model for initialization. We do so by minimizing a cost function that penalizes the error between the estimated SMPL model points and the observed real-world point cloud. We further impose a pose prior, defined by a pre-trained Gaussian mixture model, to penalize unlikely poses. We evaluated our method on the Cambridge-Imperial APE (Action Pose Estimation) dataset and obtained results comparable to those of non-real-time solutions.
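
As a rough illustration of the kind of objective described above, the following sketch combines a data term between model points and the observed point cloud with a Gaussian mixture pose prior. It is not the implementation used in this work: the one-directional nearest-neighbor distance, the diagonal covariances, and the names `data_term`, `pose_prior`, `total_cost`, and `prior_weight` are all assumptions made for the example.

```python
import math


def data_term(model_pts, cloud_pts):
    """Mean squared distance from each model point to its nearest observed point.

    Assumption: a one-directional nearest-neighbor distance; the exact form
    of the data term is not specified in the text above.
    """
    total = 0.0
    for p in model_pts:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud_pts)
    return total / len(model_pts)


def pose_prior(pose, weights, means, variances):
    """Negative log-likelihood of a pose vector under a Gaussian mixture.

    Assumption: diagonal covariances, given as per-dimension variances.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(pose, mu, var):
            log_p -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


def total_cost(model_pts, cloud_pts, pose, weights, means, variances,
               prior_weight=1.0):
    """Data term plus weighted pose prior (prior_weight is hypothetical)."""
    return (data_term(model_pts, cloud_pts)
            + prior_weight * pose_prior(pose, weights, means, variances))
```

In a full pipeline the model points would come from the SMPL mesh evaluated at the current pose and shape parameters, and an optimizer would minimize `total_cost` over those parameters starting from the 2D-skeleton initialization.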



