Human motion capture has been an active area of research for many years, with applications in fields such as gaming, entertainment, physical therapy, and ergonomics. Most commercially available motion capture systems require numerous markers to be placed on the subject's body, resulting in significant setup time. In this dissertation, we develop the architecture and algorithms for markerless motion capture with a multi-view structured light system. In contrast to existing markerless approaches that use multiple camera streams, we reconstruct the scene by combining the views from three structured light stations using sinusoidal phase shift patterns, each equipped with one projector, a stereo pair of cameras for phase unwrapping, and a color camera. The three stations surround the subject and are time-multiplexed to avoid interference. Phase-shifted sinusoidal patterns offer low decoding complexity, require as few as three projected frames per reconstruction, and are well suited to capturing dynamic scenes. In these systems, depth is reconstructed by determining the phase projected onto each camera pixel and establishing correspondences between camera and projector pixels. Typically, multiple periods are projected within the set of sinusoidal patterns, so phase unwrapping must be performed on the phase image before correspondences can be established.
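As a concrete illustration of the low decoding complexity, the wrapped phase at each camera pixel can be recovered in closed form from three sinusoidal patterns shifted by 2π/3. The following is a minimal sketch of standard three-step phase shifting, not the dissertation's exact pipeline; the function name and pattern parameters are illustrative.

```python
import numpy as np

def wrapped_phase(i1, i2, i3):
    """Wrapped phase from three intensity images of sinusoidal patterns
    shifted by -2*pi/3, 0, and +2*pi/3 (standard three-step phase shifting).
    The result lies in (-pi, pi]; when multiple periods are projected,
    phase unwrapping is still required before camera-projector
    correspondences can be established."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

Because only an arctangent per pixel is needed, decoding is cheap enough for dynamic scenes captured at camera frame rate.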

This dissertation makes three novel contributions. First, we present a phase unwrapping algorithm that operates across both space and time in order to generate a temporally consistent point cloud. Specifically, we combine a quality-guided phase unwrapping approach with absolute phase estimates from the stereo cameras to solve for the absolute phase of connected regions. Second, we develop a calibration method for multi-camera-projector systems in which sensors face each other as well as share a common viewpoint. We use a translucent planar sheet framed in PVC piping as a calibration target, which is placed at multiple positions and orientations within the scene. At each position, the target is captured by the cameras while being illuminated by a set of patterns from the various projectors. The translucent sheet allows the projected patterns to be visible from both sides, enabling correspondences between devices that face each other. The correspondences generated between the devices using this target are input to a bundle adjustment framework to estimate the calibration parameters. Third, we develop algorithms to reconstruct the dynamic geometry of a human subject using a template generated by the system itself. Specifically, we deform the template to each frame of the captured geometry by iteratively aligning each bone of the skeleton. This is done by searching for correspondences between the source template and the captured geometry, solving for the rotation of each bone, and enforcing constraints on each rotation to prevent the template from taking on anatomically unnatural poses. Once the geometry of the dynamic mesh is reconstructed, the template is textured using the color cameras of the multi-view structured light system. We demonstrate the effectiveness of our approach both qualitatively and quantitatively on an actual sequence of a moving human subject by synthesizing arbitrary views of the dynamic scene.
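The need for unwrapping addressed by the first contribution can be seen in one dimension: the measured phase is known only modulo 2π, and the correct multiple of 2π must be restored wherever neighboring samples jump by more than π. The sketch below shows only this core correction step in 1D (Itoh-style unwrapping); the dissertation's algorithm is quality-guided over the 2D phase image and additionally anchors regions with absolute phase estimates from the stereo cameras.

```python
import numpy as np

def unwrap_1d(wrapped):
    """Itoh-style 1D phase unwrapping: whenever consecutive samples
    differ by more than pi, assume a 2*pi wrap occurred and add the
    compensating multiple of 2*pi to all subsequent samples.
    Assumes the true phase changes by less than pi between samples."""
    jumps = np.round(np.diff(wrapped) / (2.0 * np.pi))
    correction = np.concatenate(([0.0], np.cumsum(-jumps))) * 2.0 * np.pi
    return wrapped + correction
```

The assumption that the true phase varies slowly between neighbors is what fails at depth discontinuities, which is why a quality-guided ordering and independent absolute phase estimates are needed in practice.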



