In recent years, the field of computer vision has made great progress in recognizing and tracking people and their activities in videos. However, for systems designed to interact dynamically with humans, tracking and recognition are insufficient; the ability to predict behavior is required. In this thesis, we introduce general frameworks for predicting human behavior at three levels of granularity: events, motion, and dynamics. In Chapter 2, we present a system capable of predicting future events. In Chapter 3, we present a system capable of personalized prediction of future motion in multi-agent, adversarial interactions. Finally, in Chapter 4, we present a framework for learning a representation of human dynamics that we can use to 1) estimate the 3D pose and shape of people moving in videos, and 2) hallucinate the motion surrounding a single-frame snapshot. We conclude with several promising future directions for learning to predict human behavior from video.



