Humans are avid consumers of visual content. Every day, people watch videos, play digital games and share photos on social media. However, there is an asymmetry: while everybody is able to consume visual data, only a chosen few are talented enough to effectively express themselves visually. For the rest of us, most attempts at creating or manipulating realistic visual content quickly end up "falling off" the manifold of natural images. In this thesis, we investigate a number of data-driven approaches for preserving visual realism while creating and manipulating photographs. We use these methods as training wheels for visual content creation. We first propose to model visual realism directly from large-scale natural images. We then define a class of image synthesis and manipulation operations, constraining their outputs to look realistic according to the learned models. The presented methods not only help users easily synthesize more visually appealing photos but also enable new visual effects not possible before this work.
Part I describes discriminative methods for modeling visual realism and photograph aesthetics. Directly training these models requires expensive human judgments. To address this, we adopt active and unsupervised learning methods to reduce annotation costs. We then apply the learned model to various graphics tasks, such as automatically generating image composites and choosing the best-looking portraits from a photo album.
Part II presents approaches that directly model the natural image manifold via generative models and constrain the output of a photo editing tool to lie on this manifold. We build real-time data-driven exploration and editing interfaces based on both simpler image averaging models and more recent deep models.
Part III combines discriminative learning and generative modeling into an end-to-end image-to-image translation framework, where a network is trained to map inputs (such as user sketches) directly to natural-looking results. We present a new algorithm that can learn the translation in the absence of paired training data, as well as a method for producing diverse outputs given the same input image. These methods enable many new applications, such as turning user sketches into photos, season transfer, object transfiguration, photo style transfer, and generating real photographs from paintings and computer graphics renderings.
Title
Learning to Synthesize and Manipulate Natural Images
Published
2017-12-14
Full Collection Name
Electrical Engineering & Computer Sciences Technical Reports
Other Identifiers
EECS-2017-214
Type
Text
Extent
183 p
Archive
The Engineering Library